ncbi / amr

AMRFinderPlus - Identify AMR genes and point mutations, and virulence and stress resistance genes in assembled bacterial nucleotide and protein sequence.
https://www.ncbi.nlm.nih.gov/pathogens/antimicrobial-resistance/AMRFinder/

terminate called after throwing an instance of 'ncbi::CCoreException' #155

Open schorlton-bugseq opened 2 weeks ago

schorlton-bugseq commented 2 weeks ago

Thanks for your ongoing work on this tool! Hitting an odd issue:

docker run --rm -it -v $(pwd):$(pwd) -w $(pwd) quay.io/biocontainers/ncbi-amrfinderplus:4.0.3--h283d18e_0 

mkdir db_download

amrfinder_update -d db_download/

wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/240/185/GCF_000240185.1_ASM24018v2/GCF_000240185.1_ASM24018v2_genomic.fna.gz

gunzip GCF_000240185.1_ASM24018v2_genomic.fna.gz

amrfinder -n GCF_000240185.1_ASM24018v2_genomic.fna -O Klebsiella_pneumoniae -d db_download/2024-10-22.1/

Yields:

Running: amrfinder -n GCF_000240185.1_ASM24018v2_genomic.fna -O Klebsiella_pneumoniae -d db_download/2024-10-22.1/
Software directory: '/usr/local/bin/'
Software version: 4.0.3
Database directory: '/reproducible_example/db_download/2024-10-22.1'
Database version: 2024-10-22.1
AMRFinder translated nucleotide and mutation search
Running tblastn
Running blastn
*** ERROR ***
'/usr/local/bin/blastn'  -query 'GCF_000240185.1_ASM24018v2_genomic.fna' -db /tmp/amrfinder.fiUvgO/db/AMR_DNA-Klebsiella_pneumoniae.fa  -evalue 1e-20  -dust no  -max_target_seqs 10000    -num_threads 2  -mt_mode 1 -outfmt '6 qseqid sseqid qstart qend qlen sstart send slen qseq sseq' -out /tmp/amrfinder.fiUvgO/blastn > /tmp/amrfinder.fiUvgO/log 2> /tmp/amrfinder.fiUvgO/blastn-err
status = 34304
terminate called after throwing an instance of 'ncbi::CCoreException'
  what():  NCBI C++ Exception:
    T1 "/opt/conda/conda-bld/blast_1722950180657/work/c++/src/corelib/ncbiobj.cpp", line 1010: Critical: (CCoreException::eNullPtr) ncbi::CObject::ThrowNullPointerException() - Attempt to access NULL pointer.
     Stack trace:
      /usr/local/bin/../lib/ncbi-blast+/libxncbi.so ???:0 ncbi::CObject::ThrowNullPointerException() offset=0xAA addr=0x7e15f05ec24a
      /usr/local/bin/../lib/ncbi-blast+/libxblast.so ???:0 ncbi::blast::CBlastNode::~CBlastNode() offset=0x355 addr=0x7e15f22dafe5
      /usr/local/bin/blastn ???:0 ncbi::blast::CBlastnNode::~CBlastnNode() offset=0xA addr=0x5ec619d616aa
      /usr/local/bin/../lib/ncbi-blast+/libxncbi.so ???:0 ncbi::CThread::Wrapper(void*) offset=0x1B6 addr=0x7e15f0623486
      /lib/x86_64-linux-gnu/libc.so.6 ???:0  offset=0x89134 addr=0x7e15f0227134
      /lib/x86_64-linux-gnu/libc.so.6 ???:0 __clone offset=0x40 addr=0x7e15f02a6a40
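Note on the `status = 34304` line above: it appears to be a raw wait status (as returned by C `system()`), not an exit code. The minimal POSIX sh sketch below decodes it, assuming the usual Linux encoding (low 7 bits hold a terminating signal, the next byte holds the exit code). The helper name `decode_wait_status` is hypothetical. For 34304 it gives exit code 134, which a shell uses to report signal 6 (SIGABRT). That fits the `terminate called after throwing` message, which aborts the process.

```shell
#!/bin/sh
# Hypothetical helper: decode a raw wait status such as the "status = 34304"
# printed by amrfinder, assuming the usual Linux/glibc encoding.
decode_wait_status() {
  status=$1
  sig=$(( status & 127 ))              # low 7 bits: signal that killed the child
  if [ "$sig" -ne 0 ]; then
    echo "killed by signal $sig"
    return
  fi
  code=$(( (status >> 8) & 255 ))      # next byte: the child's exit code
  if [ "$code" -gt 128 ]; then
    # A shell (such as the one system() starts) reports "killed by signal N" as 128+N.
    echo "exit code $code (signal $(( code - 128 )) via shell)"
  else
    echo "exit code $code"
  fi
}

decode_wait_status 34304   # prints: exit code 134 (signal 6 via shell)
```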
vbrover commented 2 weeks ago

This command broke:

'/usr/local/bin/blastn'  -query 'GCF_000240185.1_ASM24018v2_genomic.fna' -db /tmp/amrfinder.fiUvgO/db/AMR_DNA-Klebsiella_pneumoniae.fa  -evalue 1e-20  -dust no  -max_target_seqs 10000    -num_threads 2  -mt_mode 1 -outfmt '6 qseqid sseqid qstart qend qlen sstart send slen qseq sseq' -out /tmp/amrfinder.fiUvgO/blastn > /tmp/amrfinder.fiUvgO/log 2> /tmp/amrfinder.fiUvgO/blastn-err

What is the result of these commands?

/usr/local/bin/blastn
ls -laF GCF_000240185.1_ASM24018v2_genomic.fna
ls -laF /tmp/amrfinder.fiUvgO/db/AMR_DNA-Klebsiella_pneumoniae.fa*
cat /tmp/amrfinder.fiUvgO/blastn 
cat /tmp/amrfinder.fiUvgO/log
cat /tmp/amrfinder.fiUvgO/blastn-err
vbrover commented 2 weeks ago

Is there enough space in /tmp/?
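One way to answer the /tmp question is with `df`, run inside the same container, because the container's /tmp is usually not the same filesystem as the host's. Below is a minimal sketch. It assumes a POSIX `df`, where `-P` keeps each filesystem on a single line and `-k` reports 1 KiB blocks. The helper name `free_kib` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the free space (KiB) on the filesystem holding a directory.
# POSIX `df -P -k` columns: Filesystem, 1024-blocks, Used, Available, Capacity, Mounted on.
free_kib() {
  df -P -k "$1" | awk 'NR == 2 { print $4 }'
}

avail=$(free_kib /tmp)
echo "Free space in /tmp: ${avail} KiB"
```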

evolarjun commented 2 weeks ago

Two additional comments on what you're running, though I'm not yet sure what the issue with the biocontainer is.

  1. We always include the latest database in our containers, so you wouldn't have to download the database separately. See https://hub.docker.com/r/ncbi/amr and for the image build scripts, https://github.com/ncbi/docker/tree/master/amr

This can be used like:

docker run --rm -v ${PWD}:/data ncbi/amr \
    amrfinder -n GCF_000240185.1_ASM24018v2_genomic.fna.gz -O Klebsiella_pneumoniae --plus --threads 8 > amrfinder.out
  2. AMRFinderPlus will automatically unzip files ending in .gz, so you don't need that step.
schorlton-bugseq commented 2 weeks ago

Thanks @evolarjun and @vbrover. To address your comments and questions:

We always include the latest database in our containers, so you wouldn't have to download the database separately. See https://hub.docker.com/r/ncbi/amr and for the image build scripts, https://github.com/ncbi/docker/tree/master/amr

I want to maintain version control of the database, so I download it and store it myself.
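If you manage the database yourself to control its version, a simple safeguard is to compare the `Database version:` line in amrfinder's run header (see the log above) against the version you expect. Below is a minimal sketch. It assumes that header has been saved to a file. The helper `db_version_from_log` and the temporary file are hypothetical, and the sample header is copied from the run in this issue.

```shell
#!/bin/sh
# Hypothetical helper: extract the value of the "Database version:" line
# from a saved amrfinder run header.
db_version_from_log() {
  sed -n 's/^Database version: *//p' "$1"
}

# Example header, copied from the run shown earlier in this issue.
log=$(mktemp)
cat > "$log" <<'EOF'
Software version: 4.0.3
Database directory: '/reproducible_example/db_download/2024-10-22.1'
Database version: 2024-10-22.1
EOF

expected=2024-10-22.1
found=$(db_version_from_log "$log")
if [ "$found" = "$expected" ]; then
  echo "database version OK: $found"
else
  echo "unexpected database version: '$found' (wanted $expected)" >&2
fi
rm -f "$log"
```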

What is the result of these commands?

sh-5.2# /usr/local/bin/blastn
BLAST query/options error: Either a BLAST database or subject sequence(s) must be specified
Please refer to the BLAST+ user manual.

ls -laF GCF_000240185.1_ASM24018v2_genomic.fna
-rw-r--r--    1 root     root       5754012 Nov  7 16:09 GCF_000240185.1_ASM24018v2_genomic.fna

sh-5.2# ls -laF /tmp/amrfinder.fiUvgO/db/AMR_DNA-Klebsiella_pneumoniae.fa*
-rw-r--r--    1 root     root           365 Nov  7 16:08 /tmp/amrfinder.fiUvgO/db/AMR_DNA-Klebsiella_pneumoniae.fa
-rw-r--r--    1 root     root         20480 Nov  7 16:08 /tmp/amrfinder.fiUvgO/db/AMR_DNA-Klebsiella_pneumoniae.fa.ndb
-rw-r--r--    1 root     root           122 Nov  7 16:08 /tmp/amrfinder.fiUvgO/db/AMR_DNA-Klebsiella_pneumoniae.fa.nhr
-rw-r--r--    1 root     root           184 Nov  7 16:08 /tmp/amrfinder.fiUvgO/db/AMR_DNA-Klebsiella_pneumoniae.fa.nin
-rw-r--r--    1 root     root           697 Nov  7 16:08 /tmp/amrfinder.fiUvgO/db/AMR_DNA-Klebsiella_pneumoniae.fa.njs
-rw-r--r--    1 root     root            20 Nov  7 16:08 /tmp/amrfinder.fiUvgO/db/AMR_DNA-Klebsiella_pneumoniae.fa.not
-rw-r--r--    1 root     root            77 Nov  7 16:08 /tmp/amrfinder.fiUvgO/db/AMR_DNA-Klebsiella_pneumoniae.fa.nsq
-rw-r--r--    1 root     root         16384 Nov  7 16:08 /tmp/amrfinder.fiUvgO/db/AMR_DNA-Klebsiella_pneumoniae.fa.ntf
-rw-r--r--    1 root     root             8 Nov  7 16:08 /tmp/amrfinder.fiUvgO/db/AMR_DNA-Klebsiella_pneumoniae.fa.nto

sh-5.2# cat /tmp/amrfinder.fiUvgO/blastn 
NC_016845.1 NZ_CP054063.1@blaSHV_promoter_region@blaSHV:2645215-2645514 2550162 2550461 5333942 300 1   300 GCTTTCGCTTTGTTTAATTTGCTCAAGCGGCTGCGGGCTGGCGTGTACCGCCAGCGGCAGGGTGGCTAACAGGGAGATAATACACAGGCGAATATAACGCATAACCACAATACATCCTTGAGTGAGGGCCGATAAAGGCGAGTAAAGAAGCGACAAATAAGAATAACCCGGCGTTTTGCTGATTCACAATTCCTCTTTTTTCCTTCATCATTTGTCATCTTTTATTTCGAATAATCAATATCTAGCCCTGCCTAAGCACGCTATTTTTTGACTCAAGGCCGTGATGAACTATAAGAAAGT    GCTTTCGCTTAGTTTAATTTGCTCAAGCGGCTGCGGGCTGGCGTGTACCGCCAGCGGCAGGGTGGCTAACAGGGAGATAATACACAGGCGAATATAACGCATAACCACAATACATCCTTGAGTGAGGGCCGATAAAGGCGAGTAAAGAAGCGACAAATAAGAATAACCCGGCGTTTTGCTGATTCACAATTCCTCTTTTTTCCTTCATCATTTGTCATCTTTTATTTCGAATAATCAATATCTAGCCCTGCCTAAGCACGCTATTTTTTGACTCAAGGCCGTGATGAACTATAAGAAAGT

sh-5.2# cat /tmp/amrfinder.fiUvgO/log

sh-5.2# cat /tmp/amrfinder.fiUvgO/blastn-err
terminate called after throwing an instance of 'ncbi::CCoreException'
  what():  NCBI C++ Exception:
    T1 "/opt/conda/conda-bld/blast_1722950180657/work/c++/src/corelib/ncbiobj.cpp", line 1010: Critical: (CCoreException::eNullPtr) ncbi::CObject::ThrowNullPointerException() - Attempt to access NULL pointer.
     Stack trace:
      /usr/local/bin/../lib/ncbi-blast+/libxncbi.so ???:0 ncbi::CObject::ThrowNullPointerException() offset=0xAA addr=0x7e15f05ec24a
      /usr/local/bin/../lib/ncbi-blast+/libxblast.so ???:0 ncbi::blast::CBlastNode::~CBlastNode() offset=0x355 addr=0x7e15f22dafe5
      /usr/local/bin/blastn ???:0 ncbi::blast::CBlastnNode::~CBlastnNode() offset=0xA addr=0x5ec619d616aa
      /usr/local/bin/../lib/ncbi-blast+/libxncbi.so ???:0 ncbi::CThread::Wrapper(void*) offset=0x1B6 addr=0x7e15f0623486
      /lib/x86_64-linux-gnu/libc.so.6 ???:0  offset=0x89134 addr=0x7e15f0227134
      /lib/x86_64-linux-gnu/libc.so.6 ???:0 __clone offset=0x40 addr=0x7e15f02a6a40

I provided what I think is a very straightforward way to reproduce the issue if you want to inspect further.

What if you run this?

Same issue as above: I want to store the database for versioning in a custom directory. I suppose I could copy it out of the default location if that approach works.

sh-5.2# amrfinder -u -d uflag
Running: amrfinder -u -d uflag
Software directory: '/usr/local/bin/'
Software version: 4.0.3
*** ERROR ***
AMRFinder update option (-u/--update) only operates on the default database directory. The -d/--database option is not permitted

For what it's worth, I'm 95% sure the above is reproducible when installing AMRFinderPlus via conda in a micromamba Docker container, but I haven't gone through the exact steps above.

Thanks again!

vbrover commented 2 weeks ago

And what is the BLAST version?

/usr/local/bin/blastn -version
schorlton-bugseq commented 2 weeks ago

Hi @vbrover, I appreciate your help - I suspect it will be a lot faster/less frustrating for you to debug with the example provided!

docker run --rm -it quay.io/biocontainers/ncbi-amrfinderplus:4.0.3--h283d18e_0 /usr/local/bin/blastn -version
blastn: 2.16.0+
 Package: blast 2.16.0, build Aug  6 2024 13:23:48
vbrover commented 2 weeks ago

And is this error reproducible if you run this?

'/usr/local/bin/blastn'  -query 'GCF_000240185.1_ASM24018v2_genomic.fna' -db /tmp/amrfinder.fiUvgO/db/AMR_DNA-Klebsiella_pneumoniae.fa  -evalue 1e-20  -dust no  -max_target_seqs 10000  -num_threads 2  -mt_mode 1  -outfmt '6 qseqid sseqid qstart qend qlen sstart send slen qseq sseq' 
schorlton-bugseq commented 2 weeks ago

And is this error reproducible if you run this?

'/usr/local/bin/blastn'  -query 'GCF_000240185.1_ASM24018v2_genomic.fna' -db /tmp/amrfinder.fiUvgO/db/AMR_DNA-Klebsiella_pneumoniae.fa  -evalue 1e-20  -dust no  -max_target_seqs 10000  -num_threads 2  -mt_mode 1  -outfmt '6 qseqid sseqid qstart qend qlen sstart send slen qseq sseq' 

Sorry, I can't help further at present unless you're unable to reproduce the issue with the containerized, detailed steps provided. Lastly, I'll mention that when I ran the same command 4 times, it passed once, so try again if it doesn't fail the first time.
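The failure is intermittent, so a single run can mislead. Below is a minimal POSIX sh sketch that repeats a command and counts how many runs fail. `count_failures` is a hypothetical helper, and the commented usage reuses the blastn command from this thread (adjust the paths to your setup). The stack trace passes through `ncbi::CThread::Wrapper`, so comparing the failure count with and without `-num_threads 2 -mt_mode 1` might help narrow things down. That is only a hypothesis, not a confirmed cause.

```shell
#!/bin/sh
# Hypothetical helper: run a command N times and print how many runs failed
# (exited non-zero). The output of each run is discarded.
count_failures() {
  runs=$1
  shift
  fails=0
  i=1
  while [ "$i" -le "$runs" ]; do
    if ! "$@" >/dev/null 2>&1; then
      fails=$(( fails + 1 ))
    fi
    i=$(( i + 1 ))
  done
  echo "$fails"
}

# Usage (adjust paths to your setup):
# count_failures 4 /usr/local/bin/blastn -query GCF_000240185.1_ASM24018v2_genomic.fna \
#   -db /tmp/amrfinder.fiUvgO/db/AMR_DNA-Klebsiella_pneumoniae.fa -evalue 1e-20 -dust no \
#   -max_target_seqs 10000 -num_threads 2 -mt_mode 1 \
#   -outfmt '6 qseqid sseqid qstart qend qlen sstart send slen qseq sseq'
```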

vbrover commented 2 weeks ago

How many threads are allowed on your computer?

cat /proc/cpuinfo | grep -cw processor

Can you run the test provided with installation?

test_amrfinder_update.sh
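About the processor-count question: inside a container, `/proc/cpuinfo` generally lists all of the host's processors, so it can overstate what the container is actually allowed to use. Below is a minimal sketch that compares it with GNU coreutils `nproc`. `nproc` honours the process's CPU affinity mask (for example, one set by `docker run --cpuset-cpus`) and the `OMP_NUM_THREADS` variable if it is set. The variable names are illustrative.

```shell
#!/bin/sh
# Count processors two ways.
# /proc/cpuinfo: every processor the kernel reports (host-wide, even in a container).
cpuinfo_count=$(grep -cw processor /proc/cpuinfo)
# GNU nproc: processors available to this process (affinity mask, OMP_NUM_THREADS).
usable_count=$(nproc)
echo "processors listed in /proc/cpuinfo: $cpuinfo_count"
echo "processors usable by this process:  $usable_count"
```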
schorlton-bugseq commented 2 weeks ago

How many threads are allowed on your computer?

cat /proc/cpuinfo | grep -cw processor

Can you run the test provided with installation?

test_amrfinder.sh
sh-5.2# cat /proc/cpuinfo | grep -cw processor
16
sh-5.2# test_amrfinder.sh
sh: test_amrfinder.sh: command not found
sh-5.2# find / -name test_amrfinder.sh
vbrover commented 2 weeks ago

It is a bug in the Docker distribution. We will fix it.

evolarjun commented 2 weeks ago

Hi ,

I'm sorry for the delayed response on my part. I did try running your commands, but unfortunately I ran into security issues when running exactly what you sent. Updating a database within the container seems to work fine.

In my tests, accessing the database in a directory outside the container led to the null pointer error you saw. I don't yet understand how or why that would differ from using the same database inside the container.

I understand that you want to be able to control versions of the database, and I agree that you should do so. I again suggest that the best way to do that would be to use docker images that include the database. The images we provide are tagged with both software and database versions. E.g., 4.0.3-2024-10-22.1. You can see a list of them at https://hub.docker.com/r/ncbi/amr/tags. We build and release a new image with every release of software or database.
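To pin both the software and the database at once, reference the full image tag instead of the default `latest`. Below is a minimal sketch of the naming scheme. It assumes the `<software>-<database>` tag pattern described above always holds and that the software version never contains a `-`. The docker command mirrors the earlier example and is left commented out because it requires Docker.

```shell
#!/bin/sh
# Split an ncbi/amr image tag of the form <software>-<database> into its parts,
# following the naming shown in this thread (e.g. 4.0.3-2024-10-22.1).
tag=4.0.3-2024-10-22.1
software_version=${tag%%-*}   # drop everything from the first "-": 4.0.3
database_version=${tag#*-}    # drop everything up to the first "-": 2024-10-22.1
echo "image:    ncbi/amr:$tag"
echo "software: $software_version"
echo "database: $database_version"

# Pinned run (requires Docker; not executed here):
# docker run --rm -v "${PWD}:/data" "ncbi/amr:$tag" \
#   amrfinder -n GCF_000240185.1_ASM24018v2_genomic.fna.gz -O Klebsiella_pneumoniae --plus
```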

Personally, I think that a Docker image that does not also contain the database is the wrong way to control versions. The image you're trying to use is automatically created by the company quay.io from the conda packages we help maintain. I will put some effort into figuring out what goes wrong when you use their images the way you're trying to, but fundamentally those images are generated by a commercial automated system from a package that wasn't designed to be used that way, and that we have no control or influence over.

If for some reason you don't like the images we provide, you could build your own by using or modifying our Dockerfile, or use the images built by StaPH-B, which now also contain both the database and the software in the same image, tagged with the versions of both. See https://github.com/StaPH-B/docker-builds/tree/master/ncbi-amrfinderplus/4.0.3-2024-10-22.1 for an example StaPH-B Dockerfile.

Arjun

schorlton-bugseq commented 2 weeks ago

Thank you again for investigating and I am glad you were able to reproduce.

Personally, I think that a Docker image that does not also contain the database is the wrong way to control versions. The image you're trying to use is automatically created by the company quay.io from the conda packages we help maintain. I will put some effort into figuring out what goes wrong when you use their images the way you're trying to, but fundamentally those images are generated by a commercial automated system from a package that wasn't designed to be used that way, and that we have no control or influence over.

This is not specific to Quay or BioContainers. I used those images to help you reproduce the issue because they're easy to use, reproducible and popular within the bioinformatics community.

From my message above:

For what it's worth, I'm 95% sure the above is reproducible when installing AMRFinderPlus via conda in a micromamba Docker container, but I haven't gone through the exact steps above.

I have now confirmed this, and conda is listed first, as the recommended installation method, on your wiki! If you don't believe users should be installing conda packages within Docker containers, that is going to face a lot of resistance from the users of continuumio/miniconda3 (>10M Docker Hub pulls), mambaorg/micromamba (>1M pulls) and other images...

evolarjun commented 1 week ago

I apologize for the tone of my last message. I do believe that a big part of the power of docker containers is the reproducibility ensured by freezing all dependencies inside an image. I'm all for installing AMRFinderPlus via conda in images; that seems to work ok when the database is inside the image.

Here I'm trying to help with a bug in an image I didn't create, in software I don't control, which makes it more complicated. I wish it worked the way you expected. It should. AMRFinderPlus relies on HMMER, BLAST+, libcurl, gzip, etc. If one of those packages has a bug, there's not a lot I can do except report it somewhere and hope someone has the goodwill to fix it. I don't have the time to diagnose and fix everyone else's code myself.

Now, since the bug appears to be in a conda installation of BLAST inside a Docker container, likely a heavily used combination, there should be someone out there who can fix it. As soon as I get time, I will try to put together a simple reproducible example and report it to the BLAST team as a first step. I don't know where the bug comes from because I haven't yet reproduced it except in the specific case of accessing a database outside of a container from the quay.io Docker container for the ncbi-amrfinderplus conda package. I haven't yet had time to try building images in different ways to see whether the bug persists. As I can tell you appreciate, making a good bug report takes some effort.

As your pipeline relies on other people's software, so does ours. I'm sorry it's not all working the way you wanted it to, and there's nothing I can do directly to make the bug you found go away. We will work on it, but I can't promise a timeline. There are many things you could try, such as installing an older version of BLAST or creating an image that doesn't use the conda installs. For someone with a little technical knowledge, AMRFinderPlus is fairly easy to install from binaries or source. In addition to the documentation, you can check out our GitHub Actions workflows that test installation for very simple examples. We recommend conda primarily because most users are not that technically savvy about installing the dependencies.