ncbi / amr

AMRFinderPlus - Identify AMR genes and point mutations, and virulence and stress resistance genes in assembled bacterial nucleotide and protein sequence.
https://www.ncbi.nlm.nih.gov/pathogens/antimicrobial-resistance/AMRFinder/

Core dump when running multi-threaded on certain fa inputs #145

Open jchorl opened 4 months ago

jchorl commented 4 months ago

Hi,

Thanks for the hard work on this tool.

I hit a core dump when amrfinder runs blastn in multi-threaded mode.

Here are the logs:

(amrtest) root@8a9053f3c7ce:/work# amrfinder -n Klebsiella+oxytoca.fna --threads 4 --organism Klebsiella_oxytoca
Running: amrfinder -n Klebsiella+oxytoca.fna --threads 4 --organism Klebsiella_oxytoca
Software directory: '/opt/conda/envs/amrtest/bin/'
Software version: 3.12.8
Database directory: '/opt/conda/envs/amrtest/share/amrfinderplus/data/2024-05-02.2'
Database version: 2024-05-02.2
AMRFinder translated nucleotide and mutation search
Running blastx
Running blastn

*** ERROR ***
'/opt/conda/envs/amrtest/bin/blastn'  -query 'Klebsiella+oxytoca.fna' -db /tmp/amrfinder.SVXjTS/db/AMR_DNA-Klebsiella_oxytoca -evalue 1e-20  -dust no  -max_target_seqs 10000    -num_threads 2  -mt_mode 1 -outfmt '6 qseqid sseqid qstart qend qlen sstart send slen qseq sseq' -out /tmp/amrfinder.SVXjTS/blastn > /tmp/amrfinder.SVXjTS/log 2> /tmp/amrfinder.SVXjTS/blastn-err
status = 34304
terminate called after throwing an instance of 'ncbi::CCoreException'
terminate called recursively
Aborted (core dumped)

HOSTNAME: 8a9053f3c7ce
SHELL: ?
PWD: /work
PATH: /opt/conda/envs/amrtest/bin:/opt/conda/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
Progam name:  amrfinder
Command line: amrfinder -n Klebsiella+oxytoca.fna --threads 4 --organism Klebsiella_oxytoca
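
The `status = 34304` line in the log above can be decoded by hand. The following is a minimal sketch, assuming the number is a raw wait status (as returned by a `system()`-style call) whose high byte holds the shell's exit code, and that the shell reports a signal death as 128 plus the signal number. `decode_status` is a hypothetical helper name, not part of amrfinder.

```shell
# Decode a raw wait status into an exit code and, if applicable, a signal number.
# Assumption: the high byte carries the exit code, and exit codes above 128
# mean "killed by signal (code - 128)", which is the usual shell convention.
decode_status() {
    status=$1
    code=$((status >> 8))
    if [ "$code" -gt 128 ]; then
        echo "exit=$code signal=$((code - 128))"
    else
        echo "exit=$code"
    fi
}

decode_status 34304   # prints "exit=134 signal=6"
```

On Linux, signal 6 is SIGABRT, which matches the `Aborted (core dumped)` message after the uncaught `ncbi::CCoreException`.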

Interestingly, when running with --threads 1, this doesn't happen:

(amrtest) root@8a9053f3c7ce:/work# amrfinder -n Klebsiella+oxytoca.fna --threads 1 --organism Klebsiella_oxytoca
Running: amrfinder -n Klebsiella+oxytoca.fna --threads 1 --organism Klebsiella_oxytoca
Software directory: '/opt/conda/envs/amrtest/bin/'
Software version: 3.12.8
Database directory: '/opt/conda/envs/amrtest/share/amrfinderplus/data/2024-05-02.2'
Database version: 2024-05-02.2
AMRFinder translated nucleotide and mutation search
Running blastx
Running blastn
Making report
Protein identifier      Contig id       Start   Stop    Strand  Gene symbol     Sequence name   Scope   Element type    Element subtype Class   Subclass        Method  Target length    Reference sequence length       % Coverage of reference sequence        % Identity to reference sequence        Alignment length        Accession of closest sequence    Name of closest sequence        HMM id  HMM description
NA      k141_2730       1604    2473    -       blaOXY-2-6      extended-spectrum class A beta-lactamase OXY-2-6        core    AMR     AMR     BETA-LACTAM     CEPHALOSPORIN    ALLELEX 290     290     100.00  100.00  290     WP_063864552.1  extended-spectrum class A beta-lactamase OXY-2-6        NA      NA
AMRFinder took 215 seconds to complete

I can share the fasta causing this issue via email if helpful. I'm not sure if the issue is with blastn itself, or the way the inputs are structured.

To set up the environment, I installed amrfinder using docker/micromamba:

[josh@i-072f4817696381e6f debugamr]$ docker run -it --rm -u 0 -v $(pwd):/work -w /work mambaorg/micromamba:1.5.8 bash
(base) root@8a9053f3c7ce:/work# micromamba create -y -n amrtest ncbi-amrfinderplus==3.12.8 -c bioconda -c conda-forge -c defaults
(base) root@8a9053f3c7ce:/work# micromamba activate amrtest
(amrtest) root@8a9053f3c7ce:/work# amrfinder -u

evolarjun commented 4 months ago

Hi, you're the second person in the last few days to have issues with core-dumps from tools used by AMRFinderPlus. The other one was with HMMER, but I am guessing there's a problem of some sort with the version of blast installed by conda in your case. Since you're already running things in docker, maybe you could try our docker container instead? See https://github.com/ncbi/amr/wiki/Installing-AMRFinder#docker for a pointer to the dockerhub repo.

I will try to reproduce your issue and get back to you if I figure anything out.

vbrover commented 4 months ago

Is the 'ncbi::CCoreException' reproducible if you run just blastn?

'/opt/conda/envs/amrtest/bin/blastn'  -query 'Klebsiella+oxytoca.fna' -db /tmp/amrfinder.SVXjTS/db/AMR_DNA-Klebsiella_oxytoca -evalue 1e-20  -dust no  -max_target_seqs 10000    -num_threads 2  -mt_mode 1 -outfmt '6 qseqid sseqid qstart qend qlen sstart send slen qseq sseq' -out /tmp/amrfinder.SVXjTS/blastn

jchorl commented 4 months ago

Indeed it is reproducible with just running blastn, and it does appear to be restricted to a specific version.

Not working:

(amrtest) root@c8bbdaba3448:/work# blastn -version
blastn: 2.15.0+
 Package: blast 2.15.0, build Nov 10 2023 17:55:33

Working (from the ncbi/amr image):

root@9779272b65e8:/work# blastn -version
blastn: 2.12.0+
 Package: blast 2.12.0, build Mar  8 2022 16:19:08

I then tried pinning blast with conda, starting with the version immediately before the current one, 2.14.1.

(amrtest) root@c8bbdaba3448:/work# micromamba create -y -n amrtest2 ncbi-amrfinderplus==3.12.8 blast==2.14.1 -c bioconda -c conda-forge -c defaults
(amrtest) root@c8bbdaba3448:/work# micromamba activate amrtest2
(amrtest2) root@c8bbdaba3448:/work# amrfinder -u
(amrtest2) root@c8bbdaba3448:/work# amrfinder -n Klebsiella+oxytoca.fna --threads 4 --organism Klebsiella_oxytoca
Running: amrfinder -n Klebsiella+oxytoca.fna --threads 4 --organism Klebsiella_oxytoca
Software directory: '/opt/conda/envs/amrtest2/bin/'
Software version: 3.12.8
Database directory: '/opt/conda/envs/amrtest2/share/amrfinderplus/data/2024-05-02.2'
Database version: 2024-05-02.2
AMRFinder translated nucleotide and mutation search
Running blastx
Running blastn
Making report
Protein identifier      Contig id       Start   Stop    Strand  Gene symbol     Sequence name   Scope   Element type    Element subtype        Class   Subclass        Method  Target length   Reference sequence length       % Coverage of reference sequence       % Identity to reference sequence        Alignment length        Accession of closest sequence   Name of closest sequence       HMM id  HMM description
NA      k141_2730       1604    2473    -       blaOXY-2-6      extended-spectrum class A beta-lactamase OXY-2-6        core  AMR      AMR     BETA-LACTAM     CEPHALOSPORIN   ALLELEX 290     290     100.00  100.00  290     WP_063864552.1  extended-spectrum class A beta-lactamase OXY-2-6       NA      NA
AMRFinder took 59 seconds to complete

So I guess it is a versioning issue that can be worked around by pinning blast to an older version.
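
A small guard along these lines could flag the affected BLAST+ build before a run. It is only a sketch: it assumes the first line of `blastn -version` looks like `blastn: 2.15.0+`, as in the logs above, and it treats 2.15.0 as suspect purely on the basis of this report. Both function names are hypothetical.

```shell
# Extract the BLAST+ version (e.g. "2.15.0") from `blastn -version` output on stdin.
# Assumes the first line has the form "blastn: 2.15.0+", as seen in this thread.
blast_version_from_output() {
    sed -n '1s/^blastn: \([0-9][0-9.]*\).*$/\1/p'
}

# Return 0 if the version is the one that crashed in this thread.
# The list is an assumption based on this single report, not a known-bad registry.
is_suspect_blast_version() {
    case "$1" in
        2.15.0) return 0 ;;
        *) return 1 ;;
    esac
}

# Only query blastn if it is actually on PATH.
if command -v blastn >/dev/null 2>&1; then
    v=$(blastn -version | blast_version_from_output)
    if is_suspect_blast_version "$v"; then
        echo "warning: blast $v crashed with --threads > 1 in this report" >&2
    fi
fi
```

If the warning fires, pinning as shown above (`blast==2.14.1` in the `micromamba create` command) is the workaround that succeeded here.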

I'll capture the inputs to blastn and see if I can contact the BLAST team to get this resolved. If you have a good way of reaching them, or contacts there, that would be helpful.

evolarjun commented 4 months ago

blast issues should be reported to blast-help@ncbi.nlm.nih.gov.

I ran your mamba installation commands in a docker container and got blast version 2.15.0, which didn't appear to have the issue when run on a random FASTA file I happened to have lying around.

docker run -it --rm -u 0 -v $(pwd):/work -w /work mambaorg/micromamba:1.5.8 bash
micromamba create -y -n amrtest ncbi-amrfinderplus==3.12.8 -c bioconda -c conda-forge -c defaults
micromamba activate amrtest
amrfinder -u

The installation picked up BLAST+ version 2.15.0:

(amrtest) root@8b647a489550:/work# blastn -version
blastn: 2.15.0+
 Package: blast 2.15.0, build Nov 10 2023 17:55:33

It is possible that the problem you're having is a blast bug related to https://github.com/ncbi/amr/issues/118, possibly tied to the format of the FASTA header line. In that case the issue seemed to be fixed with a newer version of blast.

Conda versioning is a bit of magic to me (I don't always understand why it picks one version over another). BLAST+ version 2.15.0 may or may not have the issue. I wasn't able to reproduce it, but perhaps I could with your input files.