oschwengers / bakta

Rapid & standardized annotation of bacterial genomes, MAGs & plasmids
GNU General Public License v3.0

predict & annotate CDSs crashes #154

Closed: loalon closed this issue 1 year ago

loalon commented 1 year ago

Hi @oschwengers!

I have been using Bakta on hundreds of samples without any issue, but 3 of them crashed at the "predict & annotate CDSs" step.

Information about my setup:

This is the command I ran:
bakta --threads 8 --verbose --debug --db /bakta_db/db/ --prefix /NGS_bakta --output /NGS_bakta NGS.scaffolds.fasta

And this is the output I get:

Bakta v1.5.1
Options and arguments:
        input: /samples/nf_bacteria_bakta_all/NFX-ARRAKIS--ID22-2P0A.6226.VTTM-JEXA.15-386--20220926.222241------KFK220923-ADD0.196227.3747B.FB5AAEC2C/output/spades/NGS.scaffolds.fasta
        db: /bakta_db/db, version 4.0
        output: /NGS_bakta
        prefix: /NGS_bakta
        tmp directory: /tmp/tmp0j2vnbgy
        # threads: 8
        debug: True
        translation table: 11

Bakta runs in DEBUG mode! Temporary data will not be destroyed at: /tmp/tmp0j2vnbgy

parse genome sequences...
        imported: 5440
        filtered & revised: 5440
        contigs: 5440

start annotation...
predict tRNAs...
        found: 68
predict tmRNAs...
        found: 1
predict rRNAs...
        found: 6
predict ncRNAs...
        found: 40
predict ncRNA regions...
        found: 35
predict CRISPR arrays...
        found: 0
predict & annotate CDSs...
        predicted: 3231
        discarded spurious: 1
        revised translational exceptions: 0
        detected IPSs: 2154
        found PSCs: 314
        found PSCCs: 228
        lookup annotations...
        conduct expert systems...
Traceback (most recent call last):
  File "/opt/conda/envs/bakta/bin/bakta", line 10, in <module>
    sys.exit(main())
  File "/opt/conda/envs/bakta/lib/python3.10/site-packages/bakta/main.py", line 261, in main
    expert_amr_found = exp_amr.search(cdss, cds_aa_path)
  File "/opt/conda/envs/bakta/lib/python3.10/site-packages/bakta/expert/amrfinder.py", line 47, in search
    raise Exception(f"amrfinder error! error code: {proc.returncode}. Please, try 'amrfinder_update --force_update --database {amrfinderplus_db_path}' to update AMRFinderPlus's internal database.")
Exception: amrfinder error! error code: 1. Please, try 'amrfinder_update --force_update --database /bakta_db/db/amrfinderplus-db' to update AMRFinderPlus's internal database.

I did run amrfinder_update --force_update and still got the same result, but when I looked at the log generated by AMRFinder, I found this:

cat /tmp/tmp0j2vnbgy/amrfinderplus/amrfinder.iCwB7P/log
*** ERROR ***
"amr_report.cpp", line 1110: inFam ()
Stack:
/opt/conda/envs/bakta/bin/amr_report(+0x1e258) [0x55b773073258]
/opt/conda/envs/bakta/bin/amr_report(+0x2a451) [0x55b77307f451]
/opt/conda/envs/bakta/bin/amr_report(+0x3a013) [0x55b77308f013]
/opt/conda/envs/bakta/bin/amr_report(+0x4cacb) [0x55b7730a1acb]
/opt/conda/envs/bakta/bin/amr_report(+0xf55f) [0x55b77306455f]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea) [0x7ff4e7b38d0a]
/opt/conda/envs/bakta/bin/amr_report(+0xfa39) [0x55b773064a39]
Use: addr2line -f -C -e /opt/conda/envs/bakta/bin/amr_report  -a <address>

HOSTNAME: 63d19fd5ad64
SHELL: ?
PWD: /tmp/tmp0j2vnbgy
PATH: /opt/conda/envs/bakta/bin:/opt/conda/condabin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
Progam name:  amr_report
Command line: /opt/conda/envs/bakta/bin/amr_report -fam /bakta_db/db/amrfinderplus-db/latest/fam.tab -blastp /tmp/tmp0j2vnbgy/amrfinderplus/amrfinder.iCwB7P/blastp 
-hmmsearch /tmp/tmp0j2vnbgy/amrfinderplus/amrfinder.iCwB7P/hmmsearch -hmmdom /tmp/tmp0j2vnbgy/amrfinderplus/amrfinder.iCwB7P/dom -organism '' 
-mutation /bakta_db/db/amrfinderplus-db/latest/AMRProt-mutation.tab -susceptible /bakta_db/db/amrfinderplus-db/latest/AMRProt-susceptible.tab -pseudo 
-coverage_min 0.5 -log /tmp/tmp0j2vnbgy/amrfinderplus/amrfinder.iCwB7P/log
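Because Bakta ran with `--debug`, the AMRFinderPlus working directory survives, and its log prints a raw stack trace together with an `addr2line` hint. The following is a minimal Python sketch, assuming the tmp layout seen in this issue (`<tmp>/amrfinderplus/amrfinder.<random>/log`), that finds such logs and turns the `amr_report(+0x...)` frames into the suggested `addr2line` calls. The script and helper names are hypothetical, and resolving the offsets will likely only yield useful function names and line numbers if the `amr_report` binary still carries symbols.

```python
import re
import sys
from pathlib import Path

# One stack frame from a glibc-style backtrace, e.g.
# /opt/conda/envs/bakta/bin/amr_report(+0x1e258) [0x55b773073258]
# Frames named by symbol, like libc.so.6(__libc_start_main+0xea), do not match.
FRAME_RE = re.compile(
    r"^(?P<binary>[^\s(]+)\(\+(?P<offset>0x[0-9a-fA-F]+)\)\s+\[0x[0-9a-fA-F]+\]"
)


def find_amrfinder_logs(tmp_dir: Path) -> list[Path]:
    """Return AMRFinderPlus logs kept in a Bakta debug tmp directory.

    Assumes the layout seen in this issue: <tmp>/amrfinderplus/amrfinder.<random>/log
    """
    return sorted(tmp_dir.glob("amrfinderplus/amrfinder.*/log"))


def has_error(log_text: str) -> bool:
    """The failing log in this issue starts with an '*** ERROR ***' banner."""
    return "*** ERROR ***" in log_text


def addr2line_commands(log_text: str, binary_name: str = "amr_report") -> list[list[str]]:
    """Build one addr2line call per stack frame that belongs to binary_name."""
    commands = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match and Path(match["binary"]).name == binary_name:
            # -f: print function names, -C: demangle C++ names,
            # -e: the executable to read, -a: also print the address itself.
            commands.append(
                ["addr2line", "-f", "-C", "-e", match["binary"], "-a", match["offset"]]
            )
    return commands


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python amr_logs.py /tmp/tmp0j2vnbgy
    for log in find_amrfinder_logs(Path(sys.argv[1])):
        text = log.read_text()
        if has_error(text):
            print(f"{log}:")
            for cmd in addr2line_commands(text):
                print("  " + " ".join(cmd))
```

Matching on the `(+0x...)` form keeps only frames that carry a module-relative offset, which is what `addr2line -e <binary>` expects for a position-independent executable.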

Do you have any insights on why this happens?

If I add the --skip-cds option, Bakta runs without any issue, but my analysis needs the CDS output.

Thank you for your time.

Alonso

oschwengers commented 1 year ago

Hi @loalon, thanks a lot for the detailed report! This is a very strange bug that only recently started to occur, and only irregularly. Other users have run into it as well (#151), and we see it from time to time. Unfortunately, we still don't know its cause. Could you please also provide the Bakta log file? Maybe it gives us some hints. I wonder whether this bug is related to Bakta or actually to AMRFinder itself.

loalon commented 1 year ago

Here it is: NGS_bakta.log. Thanks for the help!

oschwengers commented 1 year ago

Thanks. Indeed, this seems to be related to AMRFinder and not Bakta itself.

Is this error reproducible with this specific genome, or does it happen irregularly? If it is reproducible, could you run AMRFinder directly on the predicted amino acid sequences and check whether the issue occurs there as well?

prodigal -i NGS.scaffolds.fasta -g 11 -c -a cds.faa
amrfinder --database /bakta_db/db/amrfinderplus-db/latest --protein cds.faa --plus --translation_table 11 --output amrfinder.tsv --threads 8
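The two commands above can be chained into a small reproduction script that stops at the first failing step and reports its exit code. This is a minimal Python sketch that reuses exactly the Prodigal and AMRFinder flags from the commands above; the script name, the helper names, and the fixed output file names (`cds.faa`, `amrfinder.tsv`) are assumptions for illustration.

```python
import subprocess
import sys
from pathlib import Path


def prodigal_cmd(genome: Path, proteins: Path) -> list[str]:
    # Translation table 11 (-g 11), closed ends (-c),
    # translated CDSs written as protein FASTA (-a).
    return ["prodigal", "-i", str(genome), "-g", "11", "-c", "-a", str(proteins)]


def amrfinder_cmd(db: Path, proteins: Path, output: Path, threads: int = 8) -> list[str]:
    # Same options as the amrfinder command suggested in this thread.
    return [
        "amrfinder", "--database", str(db), "--protein", str(proteins), "--plus",
        "--translation_table", "11", "--output", str(output), "--threads", str(threads),
    ]


def run(cmd: list[str]) -> int:
    """Run a command, report a non-zero exit code with its stderr, return the code."""
    # Prodigal writes gene coordinates to stdout when -o is not given; discard them.
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.PIPE, text=True)
    if result.returncode != 0:
        print(f"{cmd[0]} exited with code {result.returncode}", file=sys.stderr)
        print(result.stderr, file=sys.stderr)
    return result.returncode


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage:
    # python repro.py NGS.scaffolds.fasta /bakta_db/db/amrfinderplus-db/latest
    genome, db = Path(sys.argv[1]), Path(sys.argv[2])
    proteins = Path("cds.faa")
    code = run(prodigal_cmd(genome, proteins))
    if code == 0:
        code = run(amrfinder_cmd(db, proteins, Path("amrfinder.tsv")))
    sys.exit(code)
```

If the second step fails with exit code 1 outside of Bakta, the problem is reproducible in AMRFinder alone, which is the distinction asked about here.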

loalon commented 1 year ago

Yes, this error is reproducible: running AMRFinder directly gives the same error as before. I'll report this bug to the AMRFinder developers, and if they find a solution, I'll post a link here too. Thanks for your time and your help!

oschwengers commented 1 year ago

Indeed, this issue was caused by a rarely occurring bug within AMRFinder. Thanks @loalon for reporting it upstream: https://github.com/ncbi/amr/issues/99

It should be fixed by updating to AMRFinder v3.10.42.
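To check whether an installation already includes that release, the installed version can be compared against 3.10.42. This is a minimal Python sketch; it assumes that `amrfinder --version` prints the software version (for example `3.10.42`) to stdout, and the helper names are hypothetical.

```python
import re
import shutil
import subprocess

# Version named in this thread as containing the upstream fix.
FIXED_VERSION = (3, 10, 42)


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first MAJOR.MINOR.PATCH number from a version string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def has_fix(version_text: str) -> bool:
    # Compare integer tuples: as plain strings, "3.10.100" would sort before "3.10.42".
    return parse_version(version_text) >= FIXED_VERSION


if __name__ == "__main__":
    if shutil.which("amrfinder") is None:
        print("amrfinder not found on PATH")
    else:
        # Assumes `amrfinder --version` prints the version number to stdout.
        out = subprocess.run(["amrfinder", "--version"], capture_output=True, text=True).stdout
        print(out.strip(), "includes the fix" if has_fix(out) else "is older than 3.10.42")
```

Comparing tuples rather than strings keeps the check correct once any version component reaches more digits than the reference version.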