oschwengers / bakta

Rapid & standardized annotation of bacterial genomes, MAGs & plasmids
GNU General Public License v3.0

Error: sequence should be at least 100000 characters #337

Open Jigyasa3 opened 6 days ago

Jigyasa3 commented 6 days ago

Dear Bakta team,

Thank you again for a great tool! I am annotating plasmid genomes of interest, and some of them are shorter than 100,000 characters, which gives the following error:

/opt/conda/lib/python3.8/site-packages/bakta/features/cds.py:42: UserWarning: sequence should be at least 100000 characters (29439 found)

But if I annotate a short plasmid genome using the Bakta web tool, it works. Is there a length cutoff in the standalone version that we can change in order to annotate small plasmid genomes?

Looking forward to your reply, Regards Jigyasa

oschwengers commented 5 days ago

Hi @Jigyasa3, thanks for asking. If I'm not missing anything, this is actually not a proper error, but just a warning coming from Pyrodigal. If this hasn't been changed in the meantime, there is a hard-coded value of 20,000 bp (https://github.com/althonos/pyrodigal/blob/5508c99e5618b526124ce197273cb7f019781b91/pyrodigal/lib.pyx#L182) as an absolute lower sequence length limit. Below that limit, Pyrodigal throws a proper error, so Bakta executes Pyrodigal in meta mode for sequences (plasmids) shorter than this threshold. For sequences between 20 kbp and 100 kbp, Pyrodigal is able to run in normal mode, but prints this warning because there might be too little sequence information to train its internal models. Hence, in some cases, you might end up with better gene predictions using Bakta's meta mode, which forwards this to Pyrodigal internally. But it is very hard to know a priori which one works best.

For sequences larger than 100 kbp, Pyrodigal can train its internal models sufficiently.
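
To make the three length ranges concrete, here is a minimal Python sketch of the decision described above. It is not Bakta's or Pyrodigal's actual code: the constant names and the `describe_mode` function are hypothetical, the two thresholds are simply the values quoted in this thread (and may differ between Pyrodigal versions), and the handling of a sequence of exactly 20,000 bp is an assumption.

```python
# Thresholds as quoted in this thread. They are hard-coded in Pyrodigal and
# may change between versions, so treat both values as assumptions.
MIN_SINGLE_MODE_BP = 20_000   # below this, normal (single) mode raises an error
MIN_TRAINING_BP = 100_000     # below this, normal mode only emits a warning


def describe_mode(seq_length: int) -> str:
    """Return which gene prediction mode applies to a sequence of this length.

    Hypothetical helper for illustration only; not part of Bakta or Pyrodigal.
    """
    if seq_length < MIN_SINGLE_MODE_BP:
        # Too short to train on: meta mode is used instead.
        return "meta"
    if seq_length < MIN_TRAINING_BP:
        # Normal mode runs, but warns about limited training data.
        return "single (with warning)"
    # Enough sequence to train the internal models.
    return "single"


if __name__ == "__main__":
    for length in (5_000, 29_439, 250_000):
        print(length, describe_mode(length))
```

The 29,439 bp plasmid from the original post falls into the middle range, which is why the warning appears while the annotation still completes.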

Coming back to the actual question: No, there is no parameter in Bakta to adjust this threshold, since this is hard-coded and cannot be set in Pyrodigal. BUT, you can just execute Bakta and ignore this warning. The web version is a mere wrapper around the CLI version but does not show this warning.
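
If the warning is distracting when calling such code from your own Python script, the standard library `warnings` module can filter it by message. The sketch below assumes the message text shown in the original post. `predict_stub` is a hypothetical stand-in that only mimics the warning, so that the example runs without Pyrodigal installed, and `run_quietly` is likewise a made-up helper name.

```python
import warnings


def predict_stub(seq_length: int) -> int:
    """Hypothetical stand-in that emits the same warning text as in the issue."""
    if seq_length < 100_000:
        warnings.warn(
            f"sequence should be at least 100000 characters ({seq_length} found)",
            UserWarning,
        )
    return seq_length


def run_quietly(seq_length: int) -> list:
    """Run the stub with the length warning filtered out.

    Returns the warnings that were still recorded, which should be none.
    """
    with warnings.catch_warnings(record=True) as caught:
        # Record everything by default ...
        warnings.simplefilter("always")
        # ... except the specific length warning, matched from the start of
        # its message. Filters added later take precedence over earlier ones.
        warnings.filterwarnings(
            "ignore",
            message=r"sequence should be at least \d+ characters",
            category=UserWarning,
        )
        predict_stub(seq_length)
    return list(caught)


if __name__ == "__main__":
    print(len(run_quietly(29_439)))
```

Matching on the message rather than silencing every `UserWarning` keeps unrelated warnings visible.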

Best regards!