oschwengers / bakta

Rapid & standardized annotation of bacterial genomes, MAGs & plasmids
GNU General Public License v3.0

Crispr and AMR errors #232

Sumsarium closed 1 year ago

Sumsarium commented 1 year ago

I am running Bakta v1.8.1 (installed via conda) and have run into two bugs. I believe one (CRISPR detection) is tied to Bakta, but I am not sure about the other (AMR).

I routinely run Bakta on hundreds of MAGs, and about 1% of the annotations fail at the CRISPR or AMR step. I initially disabled CRISPR detection because it was not essential to my workflow and consistently aborted the annotation. The AMR error also shows up in about 1% of the MAGs. I run the jobs with GNU parallel (hundreds of MAGs); if I manually re-run the annotation for a failed MAG, the AMR step works, but the CRISPR step still fails. Has anyone run into the same errors?

Example command: bakta $input --db $db --compliant --skip-crispr --output $output --threads $threads
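Because the AMR failure is sporadic and goes away on a manual re-run, a batch driver could retry failed runs automatically. The following is a minimal sketch, not part of Bakta: `build_bakta_cmd` and `run_with_retry` are hypothetical helper names, the flags are the ones from the example command above, and it assumes that a failed run exits with a non-zero status, which the thread does not confirm.

```python
import subprocess
import time


def build_bakta_cmd(genome, db, output, threads):
    """Assemble the command quoted in this thread (flags copied from the post)."""
    return [
        "bakta", str(genome),
        "--db", str(db),
        "--compliant",
        "--skip-crispr",
        "--output", str(output),
        "--threads", str(threads),
    ]


def run_with_retry(cmd, attempts=3, delay=0.0, runner=subprocess.run):
    """Run cmd up to `attempts` times.

    Returns the 1-based number of the attempt that succeeded, or 0 if every
    attempt returned a non-zero exit status. `runner` is injectable so the
    retry logic can be exercised without Bakta installed.
    """
    for attempt in range(1, attempts + 1):
        result = runner(cmd)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            time.sleep(delay)  # brief pause before the next try
    return 0
```

A retry only helps with the sporadic AMR failure; the CRISPR failure described here is deterministic for a given assembly, so retrying it would fail every time. GNU parallel also has a `--retries` option that may serve the same purpose without extra code.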

AMR-error:
05:07:12.622 - INFO - CDS - predicted=2232
05:07:12.666 - INFO - ORF - write internal aa seqs: # seqs=2232, path=/tmp/tmpsjgdgahc/cds.spurious.faa
05:07:19.452 - INFO - ORF - discard spurious: contig=contig_20, start=17038, stop=17496, strand=-, homology=Spurious_ORF_05, evalue=2.2e-60, bitscore=198.300000
05:07:19.453 - INFO - ORF - discarded=1
05:07:20.321 - INFO - UPS - looked-up=1
05:07:20.341 - INFO - IPS - looked-up=1
05:07:20.342 - INFO - ORF - write internal aa seqs: # seqs=2230, path=/tmp/tmpsjgdgahc/cds.psc.faa
05:39:32.540 - INFO - PSC - found: PSC=17, PSCC=860
05:39:33.487 - INFO - PSC - looked-up=878
05:39:34.322 - INFO - PSCC - looked-up=766
05:39:34.360 - INFO - ORF - write internal aa seqs: # seqs=2231, path=/tmp/tmpsjgdgahc/cds.expert.faa
05:40:00.855 - WARNING - EXPERT-AMRFINDER - AMR expert system failed! amrfinder-error-code=1
05:40:00.873 - INFO - MAIN - removed tmp dir: /tmp/tmpsjgdgahc
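When hundreds of runs are involved, it can help to scan the logs for the failing step instead of reading them by hand. The sketch below assumes the log layout seen in this thread (`time - LEVEL - MODULE - message`, with failures reported as `<tool>-error-code=<n>` on WARNING lines); the function name `find_tool_failures` is hypothetical, and other Bakta versions may format these messages differently.

```python
import re

# Matches WARNING entries that carry a "<tool>-error-code=<n>" suffix,
# following the log layout quoted in this thread.
FAILURE_RE = re.compile(
    r"(\d{2}:\d{2}:\d{2}\.\d{3}) - WARNING - ([A-Z-]+) - .*?(\w+)-error-code=(-?\d+)"
)


def find_tool_failures(log_text):
    """Return a list of (timestamp, module, tool, code) tuples found in log_text."""
    return [
        (m.group(1), m.group(2), m.group(3), int(m.group(4)))
        for m in FAILURE_RE.finditer(log_text)
    ]
```

Grouping the results by `tool` across all logs would show directly how many MAGs failed in AMRFinder versus PILER-CR.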

Thanks in advance! And thanks for an otherwise great tool.

oschwengers commented 1 year ago

Hi @Sumsarium, thanks for reaching out and reporting. So far, I haven't heard of any CRISPR-related errors, so that's new and interesting. Could you provide a reproducible example, maybe a sample MAG for which the CRISPR detection step consistently fails? Then I could take a closer look at it.

Regarding the AMR error: could you please update AMRFinder to the latest version, 3.11.18 (https://github.com/ncbi/amr/releases), and check whether the error persists? We regularly see sporadic errors with AMRFinder, and I'm fairly sure that this is related to AMRFinder itself and not a bug within Bakta. Of course, if you can provide an example showing that this is caused by Bakta, I'll take a closer look at it, too.

Best regards!

Sumsarium commented 1 year ago

Hi @oschwengers, I think I got rid of the AMRFinder error by force-updating the database.

Regarding the CRISPR-related error:

14:31:59.904 - WARNING - CRISPR - CRISPRs failed! pilercr-error-code=-6

I have found a MAG that reproduces the error. I can send it if you provide me with an email address.

oschwengers commented 1 year ago

OK, could you please upload your genome here and send me a short notification (here)? Thanks

Sumsarium commented 1 year ago

@oschwengers I have uploaded the MAG. Thanks

oschwengers commented 1 year ago

Hi @Sumsarium, thanks for your patience.

I've now managed to take a deeper look into this. This is not a bug within Bakta itself, but within PILER-CR.

Using your genome, PILER-CR causes the following error within Python: Exception: PILER-CR error! error code: -6. A deeper look into the logs reveals the following underlying error within PILER-CR itself: *** FATAL ERROR *** GlobalPosToContig(911878): returned contig 22, out of range.
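A note on the negative code: if the tool is launched through Python's `subprocess` module (the thread reports a Python-side exception but does not show the call, so this is an assumption), a return code of `-N` on POSIX systems means the child process was terminated by signal `N`. On Linux, signal 6 is SIGABRT, which would be consistent with a program aborting after a fatal internal error. The helper name `describe_returncode` below is hypothetical.

```python
import signal


def describe_returncode(code):
    """Turn a subprocess-style return code into a readable description."""
    if code == 0:
        return "exited normally"
    if code > 0:
        return f"exited with status {code}"
    # Negative values: on POSIX, subprocess reports "-N" when the child
    # was terminated by signal N.
    try:
        name = signal.Signals(-code).name
    except ValueError:
        name = "unknown signal"
    return f"terminated by signal {-code} ({name})"
```

By the same reading, `amrfinder-error-code=1` is an ordinary non-zero exit status and not a signal.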

I get exactly the same error when executing PILER-CR on its own. However, the error disappears if I execute Bakta or PILER-CR on only the contig mentioned in the error message (22, which is contig id=55). The error also disappears if I shuffle the contigs in your assembly, i.e. if I put contigs 52, 55 and 56 in front of contig 1.

Hence, this is certainly an error within PILER-CR, so unfortunately I cannot do anything about it here.

If the CRISPR detection is not important for your analysis, you could simply deactivate it via --skip-crispr.
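The contig reordering described above can be sketched in plain Python. The thread does not say how the contigs were shuffled, so this is only one possible approach: the function names are hypothetical, a contig ID is assumed to be the first whitespace-delimited token of the FASTA header, and nothing here guarantees that reordering avoids the PILER-CR error for other assemblies.

```python
def record_id(header):
    """First whitespace-delimited token of a FASTA header (empty if none)."""
    parts = header.split()
    return parts[0] if parts else ""


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def move_to_front(records, front_ids):
    """Return records with the contigs named in front_ids placed first."""
    by_id = {record_id(h): (h, s) for h, s in records}
    front = [by_id[i] for i in front_ids if i in by_id]
    chosen = {record_id(h) for h, _ in front}
    rest = [r for r in records if record_id(r[0]) not in chosen]
    return front + rest


def format_fasta(records, width=60):
    """Serialize records back to FASTA text, wrapping sequences at `width`."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

For the case in this thread, the call would be something like `move_to_front(records, ["52", "55", "56"])`, with the IDs adjusted to whatever the assembly's headers actually contain.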

Sumsarium commented 1 year ago

@oschwengers Thanks for looking into it. I also just deactivated the CRISPR search. Hopefully, they will fix the PILER-CR bug at some point.