Open schorlton-bugseq opened 2 weeks ago
This command broke:
'/usr/local/bin/blastn' -query 'GCF_000240185.1_ASM24018v2_genomic.fna' -db /tmp/amrfinder.fiUvgO/db/AMR_DNA-Klebsiella_pneumoniae.fa -evalue 1e-20 -dust no -max_target_seqs 10000 -num_threads 2 -mt_mode 1 -outfmt '6 qseqid sseqid qstart qend qlen sstart send slen qseq sseq' -out /tmp/amrfinder.fiUvgO/blastn > /tmp/amrfinder.fiUvgO/log 2> /tmp/amrfinder.fiUvgO/blastn-err
What is the result of these commands?
/usr/local/bin/blastn
ls -laF GCF_000240185.1_ASM24018v2_genomic.fna
ls -laF /tmp/amrfinder.fiUvgO/db/AMR_DNA-Klebsiella_pneumoniae.fa*
cat /tmp/amrfinder.fiUvgO/blastn
cat /tmp/amrfinder.fiUvgO/log
cat /tmp/amrfinder.fiUvgO/blastn-err
Is there enough space in /tmp/?
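A quick way to answer the free-space question is sketched below. It assumes a POSIX `df` and `awk`; the helper name `free_kb` is hypothetical and not part of AMRFinderPlus or BLAST+.

```shell
#!/bin/sh
# Hypothetical helper: print the free space, in KiB, of the filesystem
# that holds the given path.
# df -P forces the portable single-line-per-filesystem layout and -k
# reports 1024-byte blocks; column 4 of the data row is "Available".
# Assumes the filesystem name contains no spaces.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

free_kb /tmp
```

If the number printed is small compared with the size of the input genome plus the BLAST output, a full temporary directory is worth ruling out before looking elsewhere.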
Two additional comments on what you're running, though I'm not yet sure what the issue with the biocontainer is.
This can be used like:
docker run --rm -v ${PWD}:/data ncbi/amr \
amrfinder -n GCF_000240185.1_ASM24018v2_genomic.fna.gz -O Klebsiella_pneumoniae --plus --threads 8 > amrfinder.out
Thanks @evolarjun and @vbrover. To address your comments and questions:
We always include the latest database in our containers, so you wouldn't have to download the database separately. See https://hub.docker.com/r/ncbi/amr and for the image build scripts, https://github.com/ncbi/docker/tree/master/amr
I want to maintain version control of the database, so I download it and store it.
What is the result of these commands?
sh-5.2# /usr/local/bin/blastn
BLAST query/options error: Either a BLAST database or subject sequence(s) must be specified
Please refer to the BLAST+ user manual.
ls -laF GCF_000240185.1_ASM24018v2_genomic.fna
-rw-r--r-- 1 root root 5754012 Nov 7 16:09 GCF_000240185.1_ASM24018v2_genomic.fna
sh-5.2# ls -laF /tmp/amrfinder.fiUvgO/db/AMR_DNA-Klebsiella_pneumoniae.fa*
-rw-r--r-- 1 root root 365 Nov 7 16:08 /tmp/amrfinder.fiUvgO/db/AMR_DNA-Klebsiella_pneumoniae.fa
-rw-r--r-- 1 root root 20480 Nov 7 16:08 /tmp/amrfinder.fiUvgO/db/AMR_DNA-Klebsiella_pneumoniae.fa.ndb
-rw-r--r-- 1 root root 122 Nov 7 16:08 /tmp/amrfinder.fiUvgO/db/AMR_DNA-Klebsiella_pneumoniae.fa.nhr
-rw-r--r-- 1 root root 184 Nov 7 16:08 /tmp/amrfinder.fiUvgO/db/AMR_DNA-Klebsiella_pneumoniae.fa.nin
-rw-r--r-- 1 root root 697 Nov 7 16:08 /tmp/amrfinder.fiUvgO/db/AMR_DNA-Klebsiella_pneumoniae.fa.njs
-rw-r--r-- 1 root root 20 Nov 7 16:08 /tmp/amrfinder.fiUvgO/db/AMR_DNA-Klebsiella_pneumoniae.fa.not
-rw-r--r-- 1 root root 77 Nov 7 16:08 /tmp/amrfinder.fiUvgO/db/AMR_DNA-Klebsiella_pneumoniae.fa.nsq
-rw-r--r-- 1 root root 16384 Nov 7 16:08 /tmp/amrfinder.fiUvgO/db/AMR_DNA-Klebsiella_pneumoniae.fa.ntf
-rw-r--r-- 1 root root 8 Nov 7 16:08 /tmp/amrfinder.fiUvgO/db/AMR_DNA-Klebsiella_pneumoniae.fa.nto
sh-5.2# cat /tmp/amrfinder.fiUvgO/blastn
NC_016845.1 NZ_CP054063.1@blaSHV_promoter_region@blaSHV:2645215-2645514 2550162 2550461 5333942 300 1 300 GCTTTCGCTTTGTTTAATTTGCTCAAGCGGCTGCGGGCTGGCGTGTACCGCCAGCGGCAGGGTGGCTAACAGGGAGATAATACACAGGCGAATATAACGCATAACCACAATACATCCTTGAGTGAGGGCCGATAAAGGCGAGTAAAGAAGCGACAAATAAGAATAACCCGGCGTTTTGCTGATTCACAATTCCTCTTTTTTCCTTCATCATTTGTCATCTTTTATTTCGAATAATCAATATCTAGCCCTGCCTAAGCACGCTATTTTTTGACTCAAGGCCGTGATGAACTATAAGAAAGT GCTTTCGCTTAGTTTAATTTGCTCAAGCGGCTGCGGGCTGGCGTGTACCGCCAGCGGCAGGGTGGCTAACAGGGAGATAATACACAGGCGAATATAACGCATAACCACAATACATCCTTGAGTGAGGGCCGATAAAGGCGAGTAAAGAAGCGACAAATAAGAATAACCCGGCGTTTTGCTGATTCACAATTCCTCTTTTTTCCTTCATCATTTGTCATCTTTTATTTCGAATAATCAATATCTAGCCCTGCCTAAGCACGCTATTTTTTGACTCAAGGCCGTGATGAACTATAAGAAAGT
sh-5.2# cat /tmp/amrfinder.fiUvgO/log
sh-5.2# cat /tmp/amrfinder.fiUvgO/blastn-err
terminate called after throwing an instance of 'ncbi::CCoreException'
what(): NCBI C++ Exception:
T1 "/opt/conda/conda-bld/blast_1722950180657/work/c++/src/corelib/ncbiobj.cpp", line 1010: Critical: (CCoreException::eNullPtr) ncbi::CObject::ThrowNullPointerException() - Attempt to access NULL pointer.
Stack trace:
/usr/local/bin/../lib/ncbi-blast+/libxncbi.so ???:0 ncbi::CObject::ThrowNullPointerException() offset=0xAA addr=0x7e15f05ec24a
/usr/local/bin/../lib/ncbi-blast+/libxblast.so ???:0 ncbi::blast::CBlastNode::~CBlastNode() offset=0x355 addr=0x7e15f22dafe5
/usr/local/bin/blastn ???:0 ncbi::blast::CBlastnNode::~CBlastnNode() offset=0xA addr=0x5ec619d616aa
/usr/local/bin/../lib/ncbi-blast+/libxncbi.so ???:0 ncbi::CThread::Wrapper(void*) offset=0x1B6 addr=0x7e15f0623486
/lib/x86_64-linux-gnu/libc.so.6 ???:0 offset=0x89134 addr=0x7e15f0227134
/lib/x86_64-linux-gnu/libc.so.6 ???:0 __clone offset=0x40 addr=0x7e15f02a6a40
I provided what I think is a very straightforward way to reproduce this if you want to inspect further.
What if you run this?
Same issue as above, I want to store the database for versioning in a custom directory. I suppose I could copy it out of there if that approach works.
sh-5.2# amrfinder -u -d uflag
Running: amrfinder -u -d uflag
Software directory: '/usr/local/bin/'
Software version: 4.0.3
*** ERROR ***
AMRFinder update option (-u/--update) only operates on the default database directory. The -d/--database option is not permitted
For what it's worth, I'm 95% sure the above is reproducible installing AMRFinderPlus via conda in a micromamba docker container but haven't gone through the exact steps above.
Thanks again!
And what is the BLAST version?
/usr/local/bin/blastn -version
Hi @vbrover, I appreciate your help - I suspect it will be a lot faster/less frustrating for you to debug with the example provided!
docker run --rm -it quay.io/biocontainers/ncbi-amrfinderplus:4.0.3--h283d18e_0 /usr/local/bin/blastn -version
blastn: 2.16.0+
Package: blast 2.16.0, build Aug 6 2024 13:23:48
And is this error reproducible if you run this?
'/usr/local/bin/blastn' -query 'GCF_000240185.1_ASM24018v2_genomic.fna' -db /tmp/amrfinder.fiUvgO/db/AMR_DNA-Klebsiella_pneumoniae.fa -evalue 1e-20 -dust no -max_target_seqs 10000 -num_threads 2 -mt_mode 1 -outfmt '6 qseqid sseqid qstart qend qlen sstart send slen qseq sseq'
Sorry I can't help further at present unless you can't reproduce as provided in the containerized, detailed description. Lastly, when I ran the same command 4 times it passed once, so try again if you don't hit the error the first time.
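Since the failure is intermittent, it may help to put a number on it. The following is a minimal sketch, not something that was run in this thread; `count_failures` is a hypothetical helper that treats any non-zero exit status (including an abort from an uncaught exception) as a failure.

```shell
#!/bin/sh
# Hypothetical helper: run a command N times and print how many runs
# exited with a non-zero status.
# Usage: count_failures N command [args...]
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Output is discarded; only the exit status is inspected.
        if ! "$@" > /dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}

# Example (paths as in the failing command above):
# count_failures 10 /usr/local/bin/blastn \
#     -query GCF_000240185.1_ASM24018v2_genomic.fna \
#     -db /tmp/amrfinder.fiUvgO/db/AMR_DNA-Klebsiella_pneumoniae.fa \
#     -evalue 1e-20 -dust no -max_target_seqs 10000 \
#     -num_threads 2 -mt_mode 1 \
#     -outfmt '6 qseqid sseqid qstart qend qlen sstart send slen qseq sseq'
```

Repeating the count with `-num_threads 1` would be one way to check whether the crash depends on threading, which the `CThread::Wrapper` frame in the stack trace hints at but does not prove.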
How many threads are allowed on your computer?
cat /proc/cpuinfo | grep -cw processor
Can you run the test provided with installation?
test_amrfinder_update.sh
How many threads are allowed on your computer?
cat /proc/cpuinfo | grep -cw processor
Can you run the test provided with installation?
test_amrfinder.sh
sh-5.2# cat /proc/cpuinfo | grep -cw processor
16
sh-5.2# test_amrfinder.sh
sh: test_amrfinder.sh: command not found
sh-5.2# find / -name test_amrfinder.sh
It is a bug in the Docker distribution. We will fix it.
Hi,
I'm sorry for the delayed response on my part. I did try running your commands. Unfortunately I had trouble around security issues running exactly what you sent. Updating a database within the container seems to work fine.
In my tests accessing the database in a directory outside of the container led to the null pointer error that you saw. I don't yet understand how/why that would be any different from using the same database within the container.
I understand that you want to be able to control versions of the database, and I agree that you should do so. I again suggest that the best way to do that would be to use docker images that include the database. The images we provide are tagged with both software and database versions, e.g., 4.0.3-2024-10-22.1. You can see a list of them at https://hub.docker.com/r/ncbi/amr/tags. We build and release a new image with every release of software or database.
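As a rough sketch of what pinning both versions through the image tag could look like: the helper name `amr_image` is hypothetical, while the tag format and the `amrfinder` options are the ones already shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper: build an image reference of the form
# ncbi/amr:<software version>-<database version>.
amr_image() {
    echo "ncbi/amr:$1-$2"
}

IMAGE=$(amr_image 4.0.3 2024-10-22.1)
echo "$IMAGE"

# Example run against the pinned image (requires Docker and the input file
# in the current directory):
# docker run --rm -v "${PWD}:/data" "$IMAGE" \
#     amrfinder -n GCF_000240185.1_ASM24018v2_genomic.fna.gz \
#     -O Klebsiella_pneumoniae --plus --threads 8 > amrfinder.out
```

Recording the full tag alongside the results keeps the software and database versions traceable without storing the database separately.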
Personally I think that a docker image that does not also contain the database is the wrong way to control versions. The image you're trying to use is automatically created by the company quay.io from the conda packages we help maintain. I will put some effort into figuring out what's wrong when you try to use their images in the way you are trying to, but fundamentally those images are generated by a commercial automated system from a package that wasn't designed to be used that way and that we don't have any control or influence over.
If for some reason you don't like the images we provide, you could generate your own using or modifying our Dockerfile or use the images generated by StaPH-B which also now contain both database and software in the same image tagged with versions of both. See https://github.com/StaPH-B/docker-builds/tree/master/ncbi-amrfinderplus/4.0.3-2024-10-22.1 for an example StaPH-B Dockerfile.
Arjun
Thank you again for investigating and I am glad you were able to reproduce.
Personally I think that a docker image that does not also contain the database is the wrong way to control versions. The image you're trying to use is automatically created by the company quay.io from the conda packages we help maintain. I will put some effort into figuring out what's wrong when you try to use their images in the way you are trying to, but fundamentally those images are generated by a commercial automated system from a package that wasn't designed to be used that way and that we don't have any control or influence over.
This is not specific to Quay or Biocontainer. I provided those because they're easy, reproducible and popular within the bioinformatics community, to help you reproduce.
From my message above:
For what it's worth, I'm 95% sure the above is reproducible installing AMRFinderPlus via conda in a micromamba docker container but haven't gone through the exact steps above.
I have now confirmed this, and conda is the first and recommended way to install on your Wiki! If you don't believe users should be installing conda packages within docker containers, then that is going to face a lot of resistance from the users of continuumio/miniconda3 (>10M Docker Hub pulls), mambaorg/micromamba (>1M pulls) and other images...
I apologize for the tone of my last message. I do believe that a big part of the power of docker containers is the reproducibility ensured by freezing all dependencies inside an image. I'm all for installing AMRFinderPlus via conda in images; that seems to work ok when the database is inside the image.
Here I'm trying to help with a bug in an image I didn't create in software I don't control which makes it more complicated. I wish it worked the way you expected. It should. AMRFinderPlus relies on HMMER, BLAST+, libcurl, gzip, etc. If one of those packages has a bug there's not a lot I can do except report the bug somewhere and hope someone has the good will to fix it. I don't have the time to diagnose and fix everyone else's code myself.
Now since the bug appears to be in a conda installation of blast inside a docker container, likely a heavily used combination, there should be someone out there who can fix your bug. As soon as I get time I will try to put together a simple reproducible example and report it to the blast people as a first step. I don't know where the bug comes from because I haven't yet reproduced it except in the specific case of accessing a database outside of a container from the quay.io docker container for the ncbi-amrfinderplus conda package. I haven't yet had time to try making images in different ways and seeing if I still see the bug. As I can tell you appreciate, making a good bug report takes some effort.
As your pipeline relies on other people's software, so does ours. I'm sorry it's not all working the way you wanted it to, and there's nothing I can do directly to make the bug you found go away. We will work on it, but I can't make any promises for a timeline. There are a lot of things you could try, such as installing an older version of blast or creating an image that doesn't use the conda installs. For someone with a little technical knowledge, AMRFinderPlus is pretty easy to install from binaries or source. In addition to the documentation, you can check out our GitHub Actions that test installation for very simple examples. We recommend conda primarily because most users are not that technically savvy about installing the dependencies.
Thanks for your ongoing work on this tool! Hitting an odd issue:
Yields: