Closed xvazquezc closed 2 years ago
Yes, absolutely. I don't think it should matter much whether a STOP codon is there or not, given that Prodigal/pyrodigal is going to emit one in the default ORF finder. I've made a patch for the next release so that GECCO trusts the GenBank annotations and doesn't try to fiddle with them so much.
The --locus-tag flag is doing exactly its job though: by default it uses locus_tag qualifiers to extract the genes, so if you have several CDS in a gene then you're bound to have a problem. However, I'd rather detect that early, while loading the data, than have duplicate IDs cause random bugs later in the pipeline :yum:
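To illustrate the kind of early check meant here, below is a minimal sketch that assumes the gene IDs have already been extracted into a plain list. The function names are hypothetical and this is not GECCO's actual code; it only shows the idea of failing at load time instead of later in the pipeline.

```python
from collections import Counter


def find_duplicate_ids(gene_ids):
    """Return the IDs that occur more than once, in first-seen order."""
    counts = Counter(gene_ids)
    return [gene_id for gene_id, n in counts.items() if n > 1]


def check_unique_ids(gene_ids):
    """Raise early if any gene ID is shared by several features.

    Hypothetical helper: a real loader would run this right after reading
    the qualifiers (e.g. locus_tag or protein_id) from the GenBank records.
    """
    duplicates = find_duplicate_ids(gene_ids)
    if duplicates:
        raise ValueError(f"duplicate gene IDs: {', '.join(duplicates)}")
```

With a check like this, a genome where several CDS share one locus_tag would be rejected with an explicit message, and the user could switch to a unique qualifier such as protein_id.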
Hi there,
I've been running GECCO on a few annotated genomes with --cds-feature CDS and sideloaded them in antiSMASH without problems. However, there is one genome with multiple CDS in several genes, so I had to add --locus-tag protein_id so GECCO doesn't crash with something like this:

Once I add --locus-tag protein_id, it runs without problems. However, it made antiSMASH crash while sideloading:

I took a look at the GECCO output and those are indeed the coordinates reported in the *.sideload.json and the *.features.tsv. I then inspected the source gbk file and found that the coordinates for the reported proteins are actually 3 nt before the actual protein coordinates, i.e. 53301 instead of 53304.

These are the features reported in the range that antiSMASH complains about:
And these are the corresponding annotations from the source gbk:

My guess is that it is an issue with L195 in orf.py, which assumes the presence of a stop codon and removes it.

Regards
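For reference, this is roughly the behaviour I would expect instead: a minimal sketch that trims the last three nucleotides only when they really are a stop codon. It assumes the standard stop codons (TAA, TAG, TGA) and a CDS sequence already oriented 5' to 3'. The function name is hypothetical and this is not the code in orf.py.

```python
# Stop codons of the standard genetic code (an assumption: other
# translation tables may use a different set).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def strip_stop_codon(cds_seq):
    """Return the CDS without its trailing stop codon, if it has one.

    The sequence is left untouched when it does not end in a stop codon,
    so annotations that already exclude the stop are not shortened.
    """
    if len(cds_seq) >= 3 and cds_seq[-3:].upper() in STOP_CODONS:
        return cds_seq[:-3]
    return cds_seq
```

The same test could guard the coordinate adjustment, so that a feature annotated without a stop codon keeps its original start and end.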