Closed kapsakcj closed 1 year ago
Quick answer is that all databases are stored at https://ftp.ncbi.nlm.nih.gov/pathogen/Antimicrobial_resistance/AMRFinderPlus/database/, so you can choose any historic database. But you need to run makeblastdb and hmmpress and choose the compatible historic software, because the latest one may not work with an old database. The latest amrfinder (version 3.11.4) has the program amrfinder_index, which runs makeblastdb and hmmpress on a given database.
We haven't actually released AMRFinderPlus 3.11.4, which includes amrfinder_index, though I should be able to get that done in the next day or two. One simple way to run old versions, at least since 3.10.14-2021-08-11.1, is to use the docker containers we're now producing. They freeze a given version of the software and database together. See https://hub.docker.com/r/ncbi/amr/tags?page=1
Hello again Curtis,
I (mostly) finished the release of AMRFinderPlus version 3.11.4, including amrfinder_index. To download an older version of the database you can locate the directory at https://ftp.ncbi.nlm.nih.gov/pathogen/Antimicrobial_resistance/AMRFinderPlus/database/, download all the files, and run amrfinder_index on the directory.
For example:
wget ftp://ftp.ncbi.nlm.nih.gov/pathogen/Antimicrobial_resistance/AMRFinderPlus/database/3.11/2022-12-19.1/*
amrfinder_index .
To find the major.minor versions of the software compatible with a given database version, you can use the directory name on the FTP site or the major.minor version numbers in the database_format_version.txt file contained within the database.
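The compatibility check described above could be sketched roughly as follows. This assumes versions are plain dotted strings and that matching major.minor is the whole rule; the helper names `major_minor` and `is_compatible` are hypothetical and not part of AMRFinderPlus.

```shell
#!/bin/bash
# Hypothetical helper: reduce a version string like "3.11.4" to "3.11".
major_minor() {
    echo "$1" | cut -d '.' -f 1,2
}

# Assumption: software and database are compatible when their
# major.minor parts match. Both arguments are plain version strings;
# nothing here calls amrfinder itself.
is_compatible() {
    [ "$(major_minor "$1")" = "$(major_minor "$2")" ]
}

# Example: software 3.11.4 against a database stored under the 3.11 directory
if is_compatible "3.11.4" "3.11"; then
    echo "compatible"
else
    echo "not compatible"
fi
```

In practice the second argument would come from the FTP directory name or from the contents of database_format_version.txt.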
Let us know if you can't get this to work, but to be honest, I recommend using the docker container method described above because it's simpler. (that's what I do when I want to see what results would have been with older versions)
Note that the bioconda package for this release is still not out because they seem to be having a build problem in their CI pipeline (see the pull request for status). Once that's cleared up this version should be included in bioconda as well.
Arjun
Thanks for the quick replies and addition of amrfinder_index! And thanks for the example commands, that helps too. Looking forward to testing it.
I asked this question originally because I'm helping to maintain the StaPH-B docker images for amrfinder, and our goal is to have a way to pin versions of dependencies, such as the amrfinder database, so this will help immensely going forward.
Instead of grabbing and indexing the latest database with amrfinder -u, we would prefer to grab the database files from the FTP site and then index them. This allows us to rebuild docker images with older DB versions, if ever necessary.
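A minimal sketch of how a pinned database location might be assembled from the two version parts. The base URL is the one given earlier in this thread; the function name `pinned_db_url` and its argument order are only illustrative.

```shell
#!/bin/bash
# Base URL taken from the thread; everything else here is illustrative.
BASE_URL="https://ftp.ncbi.nlm.nih.gov/pathogen/Antimicrobial_resistance/AMRFinderPlus/database"

# Build the directory URL for a pinned database,
# e.g. format directory 3.11 and database version 2023-02-23.1.
pinned_db_url() {
    local format_dir="$1"   # major.minor directory on the FTP site
    local db_version="$2"   # dated database version
    echo "${BASE_URL}/${format_dir}/${db_version}/"
}

pinned_db_url "3.11" "2023-02-23.1"
```

Keeping both values as explicit arguments (or Dockerfile build arguments) makes it easier to rebuild an image against an older database later.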
Lastly - THANK YOU for providing docker images and the dockerfiles, not many developers do that, so thank you for your efforts there.
I just looked at the DockerHub description and realized we don't have a link to the Dockerfile we use there, and the description of AMRFinderPlus is out of date. I don't have an authorized account to update those things, but I'll try to find someone who does. It sounds like you found it anyway, but just for future reference the Dockerfile and a script to create the image are in https://github.com/ncbi/docker/tree/master/amr
Thanks for the link and yes I found the dockerfile, but that's because I've seen the ncbi/docker github repo in the past.
DockerHub can inherit the main /README.md if linked to a GitHub repo (if you were using DockerHub infrastructure for building images), but in this case it's probably easier to just update the DockerHub repo description manually and link back to https://github.com/ncbi/docker/tree/master/amr
OK, getting into the nitty-gritty details here. I'm re-working our dockerfile for amrfinder v3.11.2, which does not contain amrfinder_index, and would like to manually index the database files. Am I doing this correctly? I'll provide my Dockerfile code and comment throughout.
I'm definitely not knowledgeable about C++, but I gathered this info from the amrfinder_index code here: https://github.com/ncbi/amr/blob/amrfinder_v3.11.4/amrfinder_index.cpp
Relevant part of Dockerfile:
RUN mkdir -p /amrfinder/data/${AMRFINDER_DB_VER} && \
# download database files from NCBI FTP
wget -q -P /amrfinder/data/${AMRFINDER_DB_VER} ftp://ftp.ncbi.nlm.nih.gov/pathogen/Antimicrobial_resistance/AMRFinderPlus/database/3.11/${AMRFINDER_DB_VER}/* && \
# change into dir with files just downloaded
cd /amrfinder/data/${AMRFINDER_DB_VER} && \
# run hmmpress and makeblastdb on downloaded files
hmmpress AMR.LIB && \
makeblastdb -in AMRProt -dbtype prot && \
makeblastdb -in AMR_CDS -dbtype nucl && \
# have to do this step to ensure docker build is using bash shell and not /bin/sh
/bin/bash -c '\
# loop through the organism specific files, example: AMR_DNA-Clostridioides_difficile.tab
for ORG in AMR_DNA*.tab; do \
# set a new bash variable for each FASTA file
INPUT_FASTA=$(echo $ORG | cut -d "." -f 1); \
# makeblastdb on each of those FASTA files
makeblastdb -in ${INPUT_FASTA} -dbtype nucl ; \
done' && \
# generate softlink
ln -s /amrfinder/data/${AMRFINDER_DB_VER} /amrfinder/data/latest
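As a side note on the loop above, the FASTA name can also be derived from the .tab name with shell parameter expansion rather than cut. This is only a sketch of that alternative; the function name `fasta_for_tab` is hypothetical, and the loop prints the names instead of calling makeblastdb.

```shell
#!/bin/bash
# Strip the ".tab" suffix to get the matching FASTA file name.
# Unlike cut -d "." -f 1, this only removes the trailing ".tab".
fasta_for_tab() {
    echo "${1%.tab}"
}

# Illustrative file names taken from the database directory listing.
for ORG in AMR_DNA-Campylobacter.tab AMR_DNA-Clostridioides_difficile.tab; do
    echo "would index: $(fasta_for_tab "$ORG")"
done
```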
The output looks correct to me, and I'm able to run a series of tests with and without the amrfinder --organism flag:
RUN mkdir -p /amrfinder/data/2023-02-23.1 && wget -q -P /amrfinder/data/2023-02-23.1 ftp://ftp.ncbi.nlm.nih.gov/pathogen/Antimicrobial_resistance/AMRFinderPlus/database/3.11/2023-02-23.1/* && cd /amrfinder/data/2023-02-23.1 && hmmpress AMR.LIB && makeblastdb -in AMRProt -dbtype prot && makeblastdb -in AMR_CDS -dbtype nucl && /bin/bash -c 'for ORG in AMR_DNA*.tab; do INPUT_FASTA=$(echo $ORG | cut -d "." -f 1); echo "makeblastdb -in ${INPUT_FASTA} -dbtype nucl" ; makeblastdb -in ${INPUT_FASTA} -dbtype nucl ; done' && ln -s /amrfinder/data/2023-02-23.1 /amrfinder/data/latest
#7 4.745 Working... done.
#7 7.136 Pressed and indexed 688 HMMs (688 names and 688 accessions).
#7 7.136 Models pressed into binary file: AMR.LIB.h3m
#7 7.136 SSI index for binary model file: AMR.LIB.h3i
#7 7.136 Profiles (MSV part) pressed into: AMR.LIB.h3f
#7 7.136 Profiles (remainder) pressed into: AMR.LIB.h3p
#7 7.165
#7 7.165
#7 7.165 Building a new DB, current time: 03/09/2023 23:30:59
#7 7.165 New DB name: /amrfinder/data/2023-02-23.1/AMRProt
#7 7.165 New DB title: AMRProt
#7 7.165 Sequence type: Protein
#7 7.166 Keep MBits: T
#7 7.166 Maximum file size: 1000000000B
#7 7.375 Adding sequences from FASTA; added 7809 sequences in 0.209066 seconds.
#7 7.382
#7 7.382
#7 7.414
#7 7.414
#7 7.414 Building a new DB, current time: 03/09/2023 23:30:59
#7 7.414 New DB name: /amrfinder/data/2023-02-23.1/AMR_CDS
#7 7.414 New DB title: AMR_CDS
#7 7.414 Sequence type: Nucleotide
#7 7.415 Keep MBits: T
#7 7.415 Maximum file size: 1000000000B
#7 7.669 Adding sequences from FASTA; added 7614 sequences in 0.253481 seconds.
#7 7.674
#7 7.674
#7 7.687 makeblastdb -in AMR_DNA-Campylobacter -dbtype nucl
#7 7.715
#7 7.715
#7 7.715 Building a new DB, current time: 03/09/2023 23:30:59
#7 7.715 New DB name: /amrfinder/data/2023-02-23.1/AMR_DNA-Campylobacter
#7 7.715 New DB title: AMR_DNA-Campylobacter
#7 7.715 Sequence type: Nucleotide
#7 7.715 Keep MBits: T
#7 7.715 Maximum file size: 1000000000B
#7 7.716 Adding sequences from FASTA; added 2 sequences in 0.00100303 seconds.
#7 7.720
#7 7.720
#7 7.729 makeblastdb -in AMR_DNA-Clostridioides_difficile -dbtype nucl
#7 7.757
#7 7.757
#7 7.757 Building a new DB, current time: 03/09/2023 23:30:59
#7 7.757 New DB name: /amrfinder/data/2023-02-23.1/AMR_DNA-Clostridioides_difficile
#7 7.757 New DB title: AMR_DNA-Clostridioides_difficile
#7 7.757 Sequence type: Nucleotide
#7 7.757 Keep MBits: T
#7 7.757 Maximum file size: 1000000000B
#7 7.758 Adding sequences from FASTA; added 1 sequences in 0.000326872 seconds.
#7 7.763
#7 7.763
#7 7.772 makeblastdb -in AMR_DNA-Enterococcus_faecalis -dbtype nucl
#7 7.798
#7 7.798
#7 7.798 Building a new DB, current time: 03/09/2023 23:30:59
#7 7.798 New DB name: /amrfinder/data/2023-02-23.1/AMR_DNA-Enterococcus_faecalis
#7 7.798 New DB title: AMR_DNA-Enterococcus_faecalis
#7 7.798 Sequence type: Nucleotide
#7 7.799 Keep MBits: T
#7 7.799 Maximum file size: 1000000000B
#7 7.800 Adding sequences from FASTA; added 1 sequences in 0.000380039 seconds.
#7 7.805
#7 7.805
#7 7.815 makeblastdb -in AMR_DNA-Enterococcus_faecium -dbtype nucl
#7 7.841
#7 7.841
#7 7.841 Building a new DB, current time: 03/09/2023 23:30:59
#7 7.841 New DB name: /amrfinder/data/2023-02-23.1/AMR_DNA-Enterococcus_faecium
#7 7.841 New DB title: AMR_DNA-Enterococcus_faecium
#7 7.841 Sequence type: Nucleotide
#7 7.842 Keep MBits: T
#7 7.842 Maximum file size: 1000000000B
#7 7.842 Adding sequences from FASTA; added 1 sequences in 0.000319004 seconds.
#7 7.848
#7 7.848
#7 7.857 makeblastdb -in AMR_DNA-Escherichia -dbtype nucl
#7 7.884
#7 7.884
#7 7.884 Building a new DB, current time: 03/09/2023 23:30:59
#7 7.884 New DB name: /amrfinder/data/2023-02-23.1/AMR_DNA-Escherichia
#7 7.884 New DB title: AMR_DNA-Escherichia
#7 7.884 Sequence type: Nucleotide
#7 7.885 Keep MBits: T
#7 7.885 Maximum file size: 1000000000B
#7 7.887 Adding sequences from FASTA; added 4 sequences in 0.001405 seconds.
#7 7.891
#7 7.891
#7 7.900 makeblastdb -in AMR_DNA-Klebsiella_oxytoca -dbtype nucl
#7 7.928
#7 7.928
#7 7.928 Building a new DB, current time: 03/09/2023 23:30:59
#7 7.928 New DB name: /amrfinder/data/2023-02-23.1/AMR_DNA-Klebsiella_oxytoca
#7 7.928 New DB title: AMR_DNA-Klebsiella_oxytoca
#7 7.928 Sequence type: Nucleotide
#7 7.929 Keep MBits: T
#7 7.929 Maximum file size: 1000000000B
#7 7.929 Adding sequences from FASTA; added 1 sequences in 0.000295877 seconds.
#7 7.934
#7 7.934
#7 7.944 makeblastdb -in AMR_DNA-Neisseria_gonorrhoeae -dbtype nucl
#7 7.971
#7 7.971
#7 7.971 Building a new DB, current time: 03/09/2023 23:30:59
#7 7.971 New DB name: /amrfinder/data/2023-02-23.1/AMR_DNA-Neisseria_gonorrhoeae
#7 7.971 New DB title: AMR_DNA-Neisseria_gonorrhoeae
#7 7.971 Sequence type: Nucleotide
#7 7.972 Keep MBits: T
#7 7.972 Maximum file size: 1000000000B
#7 7.973 Adding sequences from FASTA; added 6 sequences in 0.00129104 seconds.
#7 7.978
#7 7.978
#7 7.988 makeblastdb -in AMR_DNA-Salmonella -dbtype nucl
#7 8.015
#7 8.015
#7 8.015 Building a new DB, current time: 03/09/2023 23:30:59
#7 8.015 New DB name: /amrfinder/data/2023-02-23.1/AMR_DNA-Salmonella
#7 8.015 New DB title: AMR_DNA-Salmonella
#7 8.015 Sequence type: Nucleotide
#7 8.016 Keep MBits: T
#7 8.016 Maximum file size: 1000000000B
#7 8.017 Adding sequences from FASTA; added 1 sequences in 0.000291109 seconds.
#7 8.023
#7 8.023
#7 8.032 makeblastdb -in AMR_DNA-Staphylococcus_aureus -dbtype nucl
#7 8.059
#7 8.059
#7 8.059 Building a new DB, current time: 03/09/2023 23:30:59
#7 8.059 New DB name: /amrfinder/data/2023-02-23.1/AMR_DNA-Staphylococcus_aureus
#7 8.059 New DB title: AMR_DNA-Staphylococcus_aureus
#7 8.059 Sequence type: Nucleotide
#7 8.060 Keep MBits: T
#7 8.060 Maximum file size: 1000000000B
#7 8.061 Adding sequences from FASTA; added 2 sequences in 0.00118399 seconds.
#7 8.066
#7 8.066
#7 8.076 makeblastdb -in AMR_DNA-Streptococcus_pneumoniae -dbtype nucl
#7 8.103
#7 8.103
#7 8.103 Building a new DB, current time: 03/09/2023 23:31:00
#7 8.103 New DB name: /amrfinder/data/2023-02-23.1/AMR_DNA-Streptococcus_pneumoniae
#7 8.103 New DB title: AMR_DNA-Streptococcus_pneumoniae
#7 8.103 Sequence type: Nucleotide
#7 8.104 Keep MBits: T
#7 8.104 Maximum file size: 1000000000B
#7 8.105 Adding sequences from FASTA; added 1 sequences in 0.000328064 seconds.
#7 8.110
#7 8.110
#7 DONE 8.5s
#8 [app 5/5] WORKDIR /data
#8 DONE 0.0s
#9 [test 1/4] RUN amrfinder -l
#9 0.444 Running: amrfinder -l
#9 0.444 Software directory: '/amrfinder/'
#9 0.444 Software version: 3.11.2
#9 0.445 Database directory: '/amrfinder/data/2023-02-23.1'
#9 0.445 Database version: 2023-02-23.1
#9 0.448
#9 0.448 Available --organism options: Acinetobacter_baumannii, Burkholderia_cepacia, Burkholderia_pseudomallei, Campylobacter, Clostridioides_difficile, Enterococcus_faecalis, Enterococcus_faecium, Escherichia, Klebsiella_oxytoca, Klebsiella_pneumoniae, Neisseria_gonorrhoeae, Neisseria_meningitidis, Pseudomonas_aeruginosa, Salmonella, Staphylococcus_aureus, Staphylococcus_pseudintermedius, Streptococcus_agalactiae, Streptococcus_pneumoniae, Streptococcus_pyogenes, Vibrio_cholerae
#9 DONE 0.5s
#10 [test 2/4] RUN amrfinder --plus -p /amrfinder/test_prot.fa -g /amrfinder/test_prot.gff -O Escherichia > test_prot.got && diff /amrfinder/test_prot.expected test_prot.got && amrfinder --plus -n /amrfinder/test_dna.fa -O Escherichia > test_dna.got && diff /amrfinder/test_dna.expected test_dna.got && amrfinder --plus -n /amrfinder/test_dna.fa -p /amrfinder/test_prot.fa -g /amrfinder/test_prot.gff -O Escherichia > test_both.got && diff /amrfinder/test_both.expected test_both.got
#10 0.435 Running: amrfinder --plus -p /amrfinder/test_prot.fa -g /amrfinder/test_prot.gff -O Escherichia
#10 0.435 Software directory: '/amrfinder/'
#10 0.435 Software version: 3.11.2
#10 0.435 Database directory: '/amrfinder/data/2023-02-23.1'
#10 0.435 Database version: 2023-02-23.1
#10 0.435 AMRFinder protein-only and mutation search
#10 0.435 - include -n NUC_FASTA, --nucleotide NUC_FASTA and -g GFF_FILE, --gff GFF_FILE options to add translated searches
#10 0.469 Running blastp...
#10 2.589 Running hmmsearch...
#10 3.667 Making report...
#10 3.762 AMRFinder took 3 seconds to complete
#10 3.772 Running: amrfinder --plus -n /amrfinder/test_dna.fa -O Escherichia
#10 3.772 Software directory: '/amrfinder/'
#10 3.772 Software version: 3.11.2
#10 3.772 Database directory: '/amrfinder/data/2023-02-23.1'
#10 3.773 Database version: 2023-02-23.1
#10 3.773 AMRFinder translated nucleotide and mutation search
#10 3.782 Running blastx...
#10 6.673 Running blastn...
#10 6.784 Making report...
#10 6.934 AMRFinder took 3 seconds to complete
#10 6.942 Running: amrfinder --plus -n /amrfinder/test_dna.fa -p /amrfinder/test_prot.fa -g /amrfinder/test_prot.gff -O Escherichia
#10 6.943 Software directory: '/amrfinder/'
#10 6.943 Software version: 3.11.2
#10 6.943 Database directory: '/amrfinder/data/2023-02-23.1'
#10 6.944 Database version: 2023-02-23.1
#10 6.944 AMRFinder combined translated and protein and mutation search
#10 6.954 Running blastp...
#10 9.003 Running hmmsearch...
#10 10.05 Running blastx...
#10 12.94 Running blastn...
#10 13.04 Making report...
#10 13.25 AMRFinder took 7 seconds to complete
#10 DONE 13.3s
#11 [test 3/4] RUN wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/010/941/835/GCA_010941835.1_PDT000052640.3/GCA_010941835.1_PDT000052640.3_genomic.fna.gz && gzip -d GCA_010941835.1_PDT000052640.3_genomic.fna.gz && amrfinder --plus --nucleotide GCA_010941835.1_PDT000052640.3_genomic.fna --output test1.txt && amrfinder --plus --nucleotide GCA_010941835.1_PDT000052640.3_genomic.fna --organism Salmonella --output test2.txt && cat test1.txt test2.txt
#11 0.389 --2023-03-09 23:31:14-- https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/010/941/835/GCA_010941835.1_PDT000052640.3/GCA_010941835.1_PDT000052640.3_genomic.fna.gz
#11 0.397 Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 165.112.9.230, 130.14.250.7, 2607:f220:41e:250::13, ...
#11 0.399 Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|165.112.9.230|:443... connected.
#11 0.452 HTTP request sent, awaiting response... 200 OK
#11 0.587 Length: 1431272 (1.4M) [application/x-gzip]
#11 0.588 Saving to: 'GCA_010941835.1_PDT000052640.3_genomic.fna.gz'
#11 0.601
#11 0.601 0K .......... .......... .......... .......... .......... 3% 1.77M 1s
#11 0.615 50K .......... .......... .......... .......... .......... 7% 3.54M 1s
#11 0.629 100K .......... .......... .......... .......... .......... 10% 78.5M 0s
#11 0.630 150K .......... .......... .......... .......... .......... 14% 3.74M 0s
#11 0.643 200K .......... .......... .......... .......... .......... 17% 60.8M 0s
#11 0.643 250K .......... .......... .......... .......... .......... 21% 112M 0s
#11 0.644 300K .......... .......... .......... .......... .......... 25% 121M 0s
#11 0.644 350K .......... .......... .......... .......... .......... 28% 3.86M 0s
#11 0.657 400K .......... .......... .......... .......... .......... 32% 41.2M 0s
#11 0.658 450K .......... .......... .......... .......... .......... 35% 139M 0s
#11 0.658 500K .......... .......... .......... .......... .......... 39% 191M 0s
#11 0.660 550K .......... .......... .......... .......... .......... 42% 180M 0s
#11 0.660 600K .......... .......... .......... .......... .......... 46% 185M 0s
#11 0.660 650K .......... .......... .......... .......... .......... 50% 200M 0s
#11 0.660 700K .......... .......... .......... .......... .......... 53% 211M 0s
#11 0.660 750K .......... .......... .......... .......... .......... 57% 4.16M 0s
#11 0.672 800K .......... .......... .......... .......... .......... 60% 101M 0s
#11 0.672 850K .......... .......... .......... .......... .......... 64% 157M 0s
#11 0.672 900K .......... .......... .......... .......... .......... 67% 114M 0s
#11 0.675 950K .......... .......... .......... .......... .......... 71% 196M 0s
#11 0.675 1000K .......... .......... .......... .......... .......... 75% 156M 0s
#11 0.675 1050K .......... .......... .......... .......... .......... 78% 146M 0s
#11 0.675 1100K .......... .......... .......... .......... .......... 82% 169M 0s
#11 0.675 1150K .......... .......... .......... .......... .......... 85% 181M 0s
#11 0.675 1200K .......... .......... .......... .......... .......... 89% 169M 0s
#11 0.675 1250K .......... .......... .......... .......... .......... 93% 6.30M 0s
#11 0.683 1300K .......... .......... .......... .......... .......... 96% 131M 0s
#11 0.683 1350K .......... .......... .......... .......... ....... 100% 203M=0.1s
#11 0.683
#11 0.683 2023-03-09 23:31:14 (14.3 MB/s) - 'GCA_010941835.1_PDT000052640.3_genomic.fna.gz' saved [1431272/1431272]
#11 0.683
#11 0.741 Running: amrfinder --plus --nucleotide GCA_010941835.1_PDT000052640.3_genomic.fna --output test1.txt
#11 0.741 Software directory: '/amrfinder/'
#11 0.741 Software version: 3.11.2
#11 0.741 Database directory: '/amrfinder/data/2023-02-23.1'
#11 0.741 Database version: 2023-02-23.1
#11 0.741 AMRFinder translated nucleotide search
#11 0.741 - include -O ORGANISM, --organism ORGANISM option to add mutation searches and suppress common proteins
#11 0.943 Running tblastn...
#11 64.56 Making report...
#11 64.74 AMRFinder took 64 seconds to complete
#11 64.75 Running: amrfinder --plus --nucleotide GCA_010941835.1_PDT000052640.3_genomic.fna --organism Salmonella --output test2.txt
#11 64.75 Software directory: '/amrfinder/'
#11 64.75 Software version: 3.11.2
#11 64.75 Database directory: '/amrfinder/data/2023-02-23.1'
#11 64.75 Database version: 2023-02-23.1
#11 64.75 AMRFinder translated nucleotide and mutation search
#11 64.95 Running tblastn...
#11 128.2 Running blastn...
#11 129.4 Making report...
#11 129.6 AMRFinder took 64 seconds to complete
#11 129.6 Protein identifier Contig id Start Stop Strand Gene symbol Sequence name Scope Element type Element subtype Class Subclass Method Target length Reference sequence length % Coverage of reference sequence % Identity to reference sequence Alignment length Accession of closest sequence Name of closest sequence HMM id HMM description
#11 129.6 NA AAPBRJ010000001.1 594233 597859 - iroC salmochelin/enterobactin export ABC transporter IroC plus VIRULENCE VIRULENCE NA NA BLASTX 1209 1219 98.85 79.85 1211 AUH19662.1 salmochelin/enterobactin export ABC transporter IroC NA NA
#11 129.6 NA AAPBRJ010000001.1 597943 599055 - iroB salmochelin biosynthesis C-glycosyltransferase IroB plus VIRULENCE VIRULENCE NA NA BLASTX 371 371 100.00 86.52 371 EOW04219.1 salmochelin biosynthesis C-glycosyltransferase IroB NA NA
#11 129.6 NA AAPBRJ010000002.1 393809 394270 - golS Au(I) sensor transcriptional regulator GolS plus STRESS METAL GOLD GOLD EXACTX 154 154 100.00 100.00 154 AAL19308.1 Au(I) sensor transcriptional regulator GolS NA NA
#11 129.6 NA AAPBRJ010000002.1 394285 396570 - golT gold/copper-translocating P-type ATPase GolT plus STRESS METAL COPPER/GOLD COPPER/GOLD BLASTX 762 762 100.00 99.61 762 AAL19307.1 gold/copper-translocating P-type ATPase GolT NA NA
#11 129.6 NA AAPBRJ010000002.1 396847 398070 + mdsA multidrug efflux RND transporter periplasmic adaptor subunit MdsA plus AMR AMR EFFLUX EFFLUX BLASTX 408 408 100.00 98.28 408 AAL19306.1 multidrug efflux RND transporter periplasmic adaptor subunit MdsA NA NA
#11 129.6 NA AAPBRJ010000002.1 398070 401234 + mdsB multidrug efflux RND transporter permease subunit MdsB plus AMR AMR EFFLUX EFFLUX BLASTX 1055 1055 100.00 99.81 1055 AAL19305.1 multidrug efflux RND transporter permease subunit MdsB NA NA
#11 129.6 NA AAPBRJ010000006.1 124903 125433 + sodC1 superoxide dismutase [Cu-Zn] SodC1 plus VIRULENCE VIRULENCE NA NA EXACTX 177 177 100.00 100.00 177 AAL19978.1 superoxide dismutase [Cu-Zn] SodC1 NA NA
#11 129.6 NA AAPBRJ010000011.1 69434 70321 + fieF CDF family cation-efflux transporter FieF plus STRESS METAL NA NA BLASTX 296 300 98.67 92.57 296 BAE77395.1 CDF family cation-efflux transporter FieF NA NA
#11 129.6 NA AAPBRJ010000014.1 97144 99075 + sinH intimin-like inverse autotransporter SinH plus VIRULENCE VIRULENCE NA NA PARTIALX 644 730 88.22 99.84 644 AAL21411.1 intimin-like inverse autotransporter SinH NA NA
#11 129.6 NA AAPBRJ010000032.1 581 1438 + blaTEM-57 broad-spectrum class A beta-lactamase TEM-57 core AMR AMR BETA-LACTAM BETA-LACTAM ALLELEX 286 286 100.00 100.00 286 WP_032492330.1 broad-spectrum class A beta-lactamase TEM-57 NA NA
#11 129.6 NA AAPBRJ010000032.1 4081 5277 - tet(A) tetracycline efflux MFS transporter Tet(A) core AMR AMR TETRACYCLINE TETRACYCLINE BLASTX 399 399 100.00 99.75 399 WP_000804064.1 tetracycline efflux MFS transporter Tet(A) NA NA
#11 129.6 NA AAPBRJ010000045.1 1480 2676 + tet(A) tetracycline efflux MFS transporter Tet(A) core AMR AMR TETRACYCLINE TETRACYCLINE BLASTX 399 399 100.00 99.75 399 WP_000804064.1 tetracycline efflux MFS transporter Tet(A) NA NA
#11 129.6 NA AAPBRJ010000045.1 5319 6176 - blaTEM-57 broad-spectrum class A beta-lactamase TEM-57 core AMR AMR BETA-LACTAM BETA-LACTAM ALLELEX 286 286 100.00 100.00 286 WP_032492330.1 broad-spectrum class A beta-lactamase TEM-57 NA NA
#11 129.6 Protein identifier Contig id Start Stop Strand Gene symbol Sequence name Scope Element type Element subtype Class Subclass Method Target length Reference sequence length % Coverage of reference sequence % Identity to reference sequence Alignment length Accession of closest sequence Name of closest sequence HMM id HMM description
#11 129.6 NA AAPBRJ010000001.1 594233 597859 - iroC salmochelin/enterobactin export ABC transporter IroC plus VIRULENCE VIRULENCE NA NA BLASTX 1209 1219 98.85 79.85 1211 AUH19662.1 salmochelin/enterobactin export ABC transporter IroC NA NA
#11 129.6 NA AAPBRJ010000001.1 597943 599055 - iroB salmochelin biosynthesis C-glycosyltransferase IroB plus VIRULENCE VIRULENCE NA NA BLASTX 371 371 100.00 86.52 371 EOW04219.1 salmochelin biosynthesis C-glycosyltransferase IroB NA NA
#11 129.6 NA AAPBRJ010000002.1 393809 394270 - golS Au(I) sensor transcriptional regulator GolS plus STRESS METAL GOLD GOLD EXACTX 154 154 100.00 100.00 154 AAL19308.1 Au(I) sensor transcriptional regulator GolS NA NA
#11 129.6 NA AAPBRJ010000002.1 394285 396570 - golT gold/copper-translocating P-type ATPase GolT plus STRESS METAL COPPER/GOLD COPPER/GOLD BLASTX 762 762 100.00 99.61 762 AAL19307.1 gold/copper-translocating P-type ATPase GolT NA NA
#11 129.6 NA AAPBRJ010000002.1 396847 398070 + mdsA multidrug efflux RND transporter periplasmic adaptor subunit MdsA plus AMR AMR EFFLUX EFFLUX BLASTX 408 408 100.00 98.28 408 AAL19306.1 multidrug efflux RND transporter periplasmic adaptor subunit MdsA NA NA
#11 129.6 NA AAPBRJ010000002.1 398070 401234 + mdsB multidrug efflux RND transporter permease subunit MdsB plus AMR AMR EFFLUX EFFLUX BLASTX 1055 1055 100.00 99.81 1055 AAL19305.1 multidrug efflux RND transporter permease subunit MdsB NA NA
#11 129.6 NA AAPBRJ010000006.1 124903 125433 + sodC1 superoxide dismutase [Cu-Zn] SodC1 plus VIRULENCE VIRULENCE NA NA EXACTX 177 177 100.00 100.00 177 AAL19978.1 superoxide dismutase [Cu-Zn] SodC1 NA NA
#11 129.6 NA AAPBRJ010000008.1 160594 163227 + gyrA_S83Y Salmonella quinolone resistant GyrA core AMR POINT QUINOLONE QUINOLONE POINTX 878 878 100.00 99.89 878 WP_001281271.1 DNA gyrase subunit A GyrA NA NA
#11 129.6 NA AAPBRJ010000014.1 97144 99075 + sinH intimin-like inverse autotransporter SinH plus VIRULENCE VIRULENCE NA NA PARTIALX 644 730 88.22 99.84 644 AAL21411.1 intimin-like inverse autotransporter SinH NA NA
#11 129.6 NA AAPBRJ010000032.1 581 1438 + blaTEM-57 broad-spectrum class A beta-lactamase TEM-57 core AMR AMR BETA-LACTAM BETA-LACTAM ALLELEX 286 286 100.00 100.00 286 WP_032492330.1 broad-spectrum class A beta-lactamase TEM-57 NA NA
#11 129.6 NA AAPBRJ010000032.1 4081 5277 - tet(A) tetracycline efflux MFS transporter Tet(A) core AMR AMR TETRACYCLINE TETRACYCLINE BLASTX 399 399 100.00 99.75 399 WP_000804064.1 tetracycline efflux MFS transporter Tet(A) NA NA
#11 129.6 NA AAPBRJ010000045.1 1480 2676 + tet(A) tetracycline efflux MFS transporter Tet(A) core AMR AMR TETRACYCLINE TETRACYCLINE BLASTX 399 399 100.00 99.75 399 WP_000804064.1 tetracycline efflux MFS transporter Tet(A) NA NA
#11 129.6 NA AAPBRJ010000045.1 5319 6176 - blaTEM-57 broad-spectrum class A beta-lactamase TEM-57 core AMR AMR BETA-LACTAM BETA-LACTAM ALLELEX 286 286 100.00 100.00 286 WP_032492330.1 broad-spectrum class A beta-lactamase TEM-57 NA NA
#11 DONE 129.6s
#12 [test 4/4] RUN wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/003/812/925/GCA_003812925.1_ASM381292v1/GCA_003812925.1_ASM381292v1_genomic.fna.gz && gzip -d GCA_003812925.1_ASM381292v1_genomic.fna.gz && amrfinder --plus --name GCA_003812925.1 -n GCA_003812925.1_ASM381292v1_genomic.fna -O Klebsiella_oxytoca -o GCA_003812925.1-amrfinder.tsv
#12 0.430 --2023-03-09 23:33:24-- https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/003/812/925/GCA_003812925.1_ASM381292v1/GCA_003812925.1_ASM381292v1_genomic.fna.gz
#12 0.435 Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 130.14.250.7, 130.14.250.10, 2607:f220:41f:250::229, ...
#12 0.495 Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.7|:443... connected.
#12 0.550 HTTP request sent, awaiting response... 200 OK
#12 0.570 Length: 1741196 (1.7M) [application/x-gzip]
#12 0.571 Saving to: 'GCA_003812925.1_ASM381292v1_genomic.fna.gz'
#12 0.586
#12 0.586 0K .......... .......... .......... .......... .......... 2% 1.55M 1s
#12 0.602 50K .......... .......... .......... .......... .......... 5% 3.08M 1s
#12 0.618 100K .......... .......... .......... .......... .......... 8% 3.12M 1s
#12 0.634 150K .......... .......... .......... .......... .......... 11% 130M 0s
#12 0.634 200K .......... .......... .......... .......... .......... 14% 200M 0s
#12 0.634 250K .......... .......... .......... .......... .......... 17% 242M 0s
#12 0.635 300K .......... .......... .......... .......... .......... 20% 373K 1s
#12 0.770 350K .......... .......... .......... .......... .......... 23% 154M 1s
#12 0.770 400K .......... .......... .......... .......... .......... 26% 128M 1s
#12 0.770 450K .......... .......... .......... .......... .......... 29% 174M 0s
#12 0.770 500K .......... .......... .......... .......... .......... 32% 137M 0s
#12 0.770 550K .......... .......... .......... .......... .......... 35% 126M 0s
#12 0.770 600K .......... .......... .......... .......... .......... 38% 1.87M 0s
#12 0.796 650K .......... .......... .......... .......... .......... 41% 109M 0s
#12 0.797 700K .......... .......... .......... .......... .......... 44% 135M 0s
#12 0.797 750K .......... .......... .......... .......... .......... 47% 146M 0s
#12 0.797 800K .......... .......... .......... .......... .......... 49% 92.3M 0s
#12 0.798 850K .......... .......... .......... .......... .......... 52% 149M 0s
#12 0.798 900K .......... .......... .......... .......... .......... 55% 143M 0s
#12 0.799 950K .......... .......... .......... .......... .......... 58% 30.9M 0s
#12 0.800 1000K .......... .......... .......... .......... .......... 61% 3.44M 0s
#12 0.815 1050K .......... .......... .......... .......... .......... 64% 105M 0s
#12 0.815 1100K .......... .......... .......... .......... .......... 67% 125M 0s
#12 0.815 1150K .......... .......... .......... .......... .......... 70% 124M 0s
#12 0.816 1200K .......... .......... .......... .......... .......... 73% 118M 0s
#12 0.816 1250K .......... .......... .......... .......... .......... 76% 2.11M 0s
#12 0.839 1300K .......... .......... .......... .......... .......... 79% 34.0M 0s
#12 0.841 1350K .......... .......... .......... .......... .......... 82% 75.3M 0s
#12 0.841 1400K .......... .......... .......... .......... .......... 85% 170M 0s
#12 0.841 1450K .......... .......... .......... .......... .......... 88% 202M 0s
#12 0.842 1500K .......... .......... .......... .......... .......... 91% 225M 0s
#12 0.842 1550K .......... .......... .......... .......... .......... 94% 117M 0s
#12 0.843 1600K .......... .......... .......... .......... .......... 97% 62.8M 0s
#12 0.843 1650K .......... .......... .......... .......... .......... 99% 138M 0s
#12 0.843 1700K 100% 738G=0.3s
#12 0.843
#12 0.844 2023-03-09 23:33:24 (6.09 MB/s) - 'GCA_003812925.1_ASM381292v1_genomic.fna.gz' saved [1741196/1741196]
#12 0.844
#12 0.906 Running: amrfinder --plus --name GCA_003812925.1 -n GCA_003812925.1_ASM381292v1_genomic.fna -O Klebsiella_oxytoca -o GCA_003812925.1-amrfinder.tsv
#12 0.906 Software directory: '/amrfinder/'
#12 0.906 Software version: 3.11.2
#12 0.907 Database directory: '/amrfinder/data/2023-02-23.1'
#12 0.907 Database version: 2023-02-23.1
#12 0.907 AMRFinder translated nucleotide and mutation search
#12 1.137 Running tblastn...
#12 246.2 Running blastn...
#12 247.6 Making report...
#12 247.8 AMRFinder took 247 seconds to complete
If you want to see the full dockerfile, I have it on an open PR over here: https://github.com/StaPH-B/docker-builds/pull/631/files
The dockerfile I'm describing above ^ is ncbi-amrfinderplus/3.11.2-2023-02-23.1/Dockerfile
Hi Curtis,
Your approach looks good to me. The AMR_DNA-* files are only used if there are point mutations identified only by DNA (e.g., 16S or promoter mutations) for that organism. Here's a shell script I used to rebuild the database prior to having amrfinder_index, which uses a slightly different approach from yours but with, I think, the same effect:
#!/bin/bash
# hmmer
echo "hmmpress -f AMR.LIB"
hmmpress -f AMR.LIB
# the DNA files
for tg in `cut -f 1 taxgroup.tab | grep -v '^#'`
do
fasta_dna="AMR_DNA-$tg"
if [ -e "$fasta_dna" ]
then
echo "makeblastdb -in $fasta_dna -dbtype nucl"
makeblastdb -in $fasta_dna -dbtype nucl
fi
done
# AMRProt
echo "makeblastdb -in AMRProt -dbtype prot"
makeblastdb -in AMRProt -dbtype prot
Also, I quickly skimmed through your Dockerfile (I really only dabble in Docker, so I'm no expert) and it otherwise looks good to me. I noticed you're running your own tests. We also distribute a test script (test_amrfinder.sh), though I'm not sure if you want to use it because there can, occasionally, be interactions between the version of the database and the expected test output. That said, your tests look fine, and since you're pinning a database and software version for each directory you don't have to worry about changes in database versions changing expected test output.
Great, thanks for taking a look and sharing the code! That test script is useful, would be nice to incorporate that in the future.
The first tests run in our dockerfile should capture the same behavior: run amrfinder on the test files, then diff the results against the expected output, and fail the docker image build if the exit code is >0.
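The diff-and-fail pattern described above might look roughly like this outside a Dockerfile. The helper name `check_output` is hypothetical, and temporary files stand in for real amrfinder output here.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the two files are identical.
check_output() {
    local expected="$1" got="$2"
    if diff "$expected" "$got" > /dev/null; then
        echo "PASS: $got"
    else
        echo "FAIL: $got differs from $expected"
        return 1
    fi
}

# Self-contained demo: temporary files stand in for amrfinder results.
tmpdir=$(mktemp -d)
printf 'blaTEM-57\n' > "$tmpdir/expected"
printf 'blaTEM-57\n' > "$tmpdir/got"
check_output "$tmpdir/expected" "$tmpdir/got"
```

Inside a RUN instruction chained with &&, a non-zero return from diff stops the build in the same way.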
Thanks so much!
Hi Arjun and team,
I have a quick question - is there a way with the executable amrfinder (or otherwise) to install an older version of the amrfinderplus database? For example, if I wanted to reproduce results from last year using an older database version 2022-08-09.1. Is there a way to pin the install to that specific database version?
I know I can provide the database location with amrfinder --database /path/to/db but I wasn't sure if other things need to be done such as database indexing, setup, etc.
I realize there may be incompatibilities between the amrfinderplus version & database versions, but let's ignore those for now.
I've read through the (amazing) documentation and could not find the answer so that led me here.
Curtis