sanger-pathogens / ariba

Antimicrobial Resistance Identification By Assembly
http://sanger-pathogens.github.io/ariba/

get_ref card is attempting to download RGI #301

Closed rpetit3 closed 3 years ago

rpetit3 commented 3 years ago

Hello!

It looks like there have been some changes to the CARD download page. These changes are causing ARIBA to try to download RGI v5.1.0 instead of CARD.

 ariba getref card card
Getting available CARD versions
Downloading "https://card.mcmaster.ca/download" and saving as "download.html" ... done
Found versions:
1.0.0   https://card.mcmaster.ca/download/0/broadstreet-v1.0.0.tar.bz2
1.0.1   https://card.mcmaster.ca/download/0/broadstreet-v1.0.1.tar.bz2
1.0.2   https://card.mcmaster.ca/download/0/broadstreet-v1.0.2.tar.bz2
1.0.3   https://card.mcmaster.ca/download/0/broadstreet-v1.0.3.tar.bz2
1.0.4   https://card.mcmaster.ca/download/0/broadstreet-v1.0.4.tar.bz2
1.0.5   https://card.mcmaster.ca/download/0/broadstreet-v1.0.5.tar.bz2
1.0.6   https://card.mcmaster.ca/download/0/broadstreet-v1.0.6.tar.bz2
1.0.7   https://card.mcmaster.ca/download/0/broadstreet-v1.0.7.tar.bz2
1.0.8   https://card.mcmaster.ca/download/0/broadstreet-v1.0.8.tar.bz2
1.0.9   https://card.mcmaster.ca/download/0/broadstreet-v1.0.9.tar.bz2
1.1.0   https://card.mcmaster.ca/download/0/broadstreet-v1.1.0.tar.bz2
1.1.1   https://card.mcmaster.ca/download/0/broadstreet-v1.1.1.tar.bz2
1.1.2   https://card.mcmaster.ca/download/0/broadstreet-v1.1.2.tar.bz2
1.1.3   https://card.mcmaster.ca/download/0/broadstreet-v1.1.3.tar.bz2
1.1.4   https://card.mcmaster.ca/download/0/broadstreet-v1.1.4.tar.bz2
1.1.5   https://card.mcmaster.ca/download/0/broadstreet-v1.1.5.tar.bz2
1.1.6   https://card.mcmaster.ca/download/0/broadstreet-v1.1.6.tar.bz2
1.1.7   https://card.mcmaster.ca/download/0/broadstreet-v1.1.7.tar.bz2
1.1.8   https://card.mcmaster.ca/download/0/broadstreet-v1.1.8.tar.bz2
1.1.9   https://card.mcmaster.ca/download/0/broadstreet-v1.1.9.tar.bz2
1.2.0   https://card.mcmaster.ca/download/0/broadstreet-v1.2.0.tar.bz2
1.2.1   https://card.mcmaster.ca/download/0/broadstreet-v1.2.1.tar.bz2
2.0.0   https://card.mcmaster.ca/download/0/broadstreet-v2.0.0.tar.gz
2.0.1   https://card.mcmaster.ca/download/0/broadstreet-v2.0.1.tar.gz
2.0.2   https://card.mcmaster.ca/download/0/broadstreet-v2.0.2.tar.gz
2.0.3   https://card.mcmaster.ca/download/0/broadstreet-v2.0.3.tar.gz
3.0.0   https://card.mcmaster.ca/download/0/broadstreet-v3.0.0.tar.gz
3.0.1   https://card.mcmaster.ca/download/0/broadstreet-v3.0.1.tar.gz
3.0.2   https://card.mcmaster.ca/download/0/broadstreet-v3.0.2.tar.gz
3.0.3   https://card.mcmaster.ca/download/0/broadstreet-v3.0.3.tar.gz
3.0.4   https://card.mcmaster.ca/download/0/broadstreet-v3.0.4.tar.gz
3.0.5   https://card.mcmaster.ca/download/0/broadstreet-v3.0.5.tar.gz
3.0.6   https://card.mcmaster.ca/download/0/broadstreet-v3.0.6.tar.gz
3.0.7   https://card.mcmaster.ca/download/0/broadstreet-v3.0.7.tar.gz
3.0.8   https://card.mcmaster.ca/download/0/broadstreet-v3.0.8.tar.bz2
3.0.9   https://card.mcmaster.ca/download/0/broadstreet-v3.0.9.tar.bz2
3.1.0   https://card.mcmaster.ca/download/0/broadstreet-v3.1.0.tar.bz2
5.1.0   https://card.mcmaster.ca/download/1/software-v5.1.1.tar.bz2">DOWNLOAD</a></td></tr><div id="other-software" class="more collapse"><tr class="more-software collapse"><td>RGI Software</td><td class="hidden-xs">RGI version 5.1.0 - added beta version for K-mer taxonomic classifiers (rgi kmer_build and rgi kmer_query), updated the code to accept a broader range of nucleotide redundancy codes, and set biopython to version 1.72 due to bug on biopython version 1.74</td><td class="hidden-xs">5.1.0</td><td class="hidden-xs">TAR</td><td class="hidden-xs">2019-08-22 09:43:12.5152</td><td><a href="/download/1/software-v5.1.0.tar.gz
Getting version 5.1.0
Working in temporary directory /home/rpetit/test-grounds/training_set/card.download
Downloading data from card: https://card.mcmaster.ca/download/1/software-v5.1.1.tar.bz2">DOWNLOAD</a></td></tr><div id="other-software" class="more collapse"><tr class="more-software collapse"><td>RGI Software</td><td class="hidden-xs">RGI version 5.1.0 - added beta version for K-mer taxonomic classifiers (rgi kmer_build and rgi kmer_query), updated the code to accept a broader range of nucleotide redundancy codes, and set biopython to version 1.72 due to bug on biopython version 1.74</td><td class="hidden-xs">5.1.0</td><td class="hidden-xs">TAR</td><td class="hidden-xs">2019-08-22 09:43:12.5152</td><td><a href="/download/1/software-v5.1.0.tar.gz
syscall: wget -O card.tar.bz2 https://card.mcmaster.ca/download/1/software-v5.1.1.tar.bz2">DOWNLOAD</a></td></tr><div id="other-software" class="more collapse"><tr class="more-software collapse"><td>RGI Software</td><td class="hidden-xs">RGI version 5.1.0 - added beta version for K-mer taxonomic classifiers (rgi kmer_build and rgi kmer_query), updated the code to accept a broader range of nucleotide redundancy codes, and set biopython to version 1.72 due to bug on biopython version 1.74</td><td class="hidden-xs">5.1.0</td><td class="hidden-xs">TAR</td><td class="hidden-xs">2019-08-22 09:43:12.5152</td><td><a href="/download/1/software-v5.1.0.tar.gz
The following command failed with exit code 4
wget -O card.tar.bz2 https://card.mcmaster.ca/download/1/software-v5.1.1.tar.bz2">DOWNLOAD</a></td></tr><div id="other-software" class="more collapse"><tr class="more-software collapse"><td>RGI Software</td><td class="hidden-xs">RGI version 5.1.0 - added beta version for K-mer taxonomic classifiers (rgi kmer_build and rgi kmer_query), updated the code to accept a broader range of nucleotide redundancy codes, and set biopython to version 1.72 due to bug on biopython version 1.74</td><td class="hidden-xs">5.1.0</td><td class="hidden-xs">TAR</td><td class="hidden-xs">2019-08-22 09:43:12.5152</td><td><a href="/download/1/software-v5.1.0.tar.gz

The output was:

--2020-08-12 13:28:20--  https://card.mcmaster.ca/download/1/software-v5.1.1.tar.bz2%3EDOWNLOAD%3C/a%3E%3C/td%3E%3C/tr%3E%3Cdiv%20id=other-software%20class=more
Resolving card.mcmaster.ca (card.mcmaster.ca)... 130.113.77.126
Connecting to card.mcmaster.ca (card.mcmaster.ca)|130.113.77.126|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2020-08-12 13:28:20 ERROR 404: Not Found.

--2020-08-12 13:28:20--  http://collapse%3E%3Ctr%20class=more-software/
Resolving collapse><tr class=more-software (collapse><tr class=more-software)... failed: Name or service not known.
wget: unable to resolve host address ‘collapse><tr class=more-software’
--2020-08-12 13:28:20--  http://collapse%3E%3Ctd%3Ergi%20software%3C/td%3E%3Ctd%20class=hidden-xs%3ERGI%20version%205.1.0%20-%20added%20beta%20version%20for%20K-mer%20taxonomic%20classifiers%20(rgi%20kmer_build%C2%A0and%C2%A0rgi%20kmer_query),%20updated%20the%20code%20to%20accept%20a%20broader%20range%20of%20nucleotide%20redundancy%20codes,%20and%20set%20biopython%20to%20version%201.72%20due%20to%20bug%20on%20biopython%20version%201.74%3C/td%3E%3Ctd%20class=hidden-xs%3E5.1.0%3C/td%3E%3Ctd%20class=hidden-xs%3ETAR%3C/td%3E%3Ctd%20class=hidden-xs%3E2019-08-22%2009:43:12.5152%3C/td%3E%3Ctd%3E%3Ca%20href=/download/1/software-v5.1.0.tar.gz
Resolving collapse><td>rgi software< (collapse><td>rgi software<)... failed: Name or service not known.
wget: unable to resolve host address ‘collapse><td>rgi software<’

I'll work on a PR to get this fixed.

Thanks! Robert

rpetit3 commented 3 years ago

Dug into this further.

The regex r'''href="(/download/.*?broad.*?v([0-9]+\.[0-9]+\.[0-9]+)\.tar\.(gz|bz2))"''' is matching this:

<a href="/download/1/software-v5.1.1.tar.bz2">DOWNLOAD</a></td></tr><div id="other-software" class="more collapse"><tr class="more-software collapse"><td>RGI Software</td><td class="hidden-xs">RGI version 5.1.0 - added beta version for K-mer taxonomic classifiers (rgi kmer_build and rgi kmer_query), updated the code to accept a broader range of nucleotide redundancy codes, and set biopython to version 1.72 due to bug on biopython version 1.74</td><td class="hidden-xs">5.1.0</td><td class="hidden-xs">TAR</td><td class="hidden-xs">2019-08-22 09:43:12.5152</td><td><a href="/download/1/software-v5.1.0.tar.gz">

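because the lazy .*?broad.*? can run past the closing quote of the first href and match "broad" in "broader": "updated the code to accept a broader range of nucleotide redundancy codes"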

One solution is to change the regex to r'''href="(/download/0/.*?broad.*?v([0-9]+\.[0-9]+\.[0-9]+)\.tar\.(gz|bz2))"'''. Another is to extend it to r'''href="(/download/.*?broadstreet.*?v([0-9]+\.[0-9]+\.[0-9]+)\.tar\.(gz|bz2))"'''.
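
For anyone who wants to reproduce the mismatch locally, here is a minimal sketch that runs the three patterns from this comment against a small HTML string. `SAMPLE_HTML` and `find_versions` are made up for illustration: the string is a heavily shortened stand-in for the CARD download page, not the real page, and this is not ARIBA's actual code.

```python
import re

# The three patterns quoted in this thread.
OLD = r'''href="(/download/.*?broad.*?v([0-9]+\.[0-9]+\.[0-9]+)\.tar\.(gz|bz2))"'''
FIX_PREFIX = r'''href="(/download/0/.*?broad.*?v([0-9]+\.[0-9]+\.[0-9]+)\.tar\.(gz|bz2))"'''
FIX_NAME = r'''href="(/download/.*?broadstreet.*?v([0-9]+\.[0-9]+\.[0-9]+)\.tar\.(gz|bz2))"'''

# Hypothetical, shortened stand-in for the download page (assumption):
# one data link, then two software links separated by release notes
# that contain the word "broader".
SAMPLE_HTML = (
    '<a href="/download/0/broadstreet-v3.1.0.tar.bz2">DOWNLOAD</a>'
    '<a href="/download/1/software-v5.1.1.tar.bz2">DOWNLOAD</a>'
    '<td>updated the code to accept a broader range of nucleotide redundancy codes</td>'
    '<td>5.1.0</td>'
    '<a href="/download/1/software-v5.1.0.tar.gz">'
)


def find_versions(pattern, html):
    """Return (version, path) pairs for every match of pattern in html."""
    return [(m.group(2), m.group(1)) for m in re.finditer(pattern, html)]


for name, pattern in [("old", OLD), ("fix_prefix", FIX_PREFIX), ("fix_name", FIX_NAME)]:
    print(name)
    for version, path in find_versions(pattern, SAMPLE_HTML):
        # Truncate the path so the runaway match stays readable.
        print(f"  {version}\t{path[:45]}")
```

With this sample, the old pattern reports a second "version" 5.1.0 whose captured path starts at the software-v5.1.1 link and runs on through the release notes, which is the same shape as the bad entry in the log above. Both proposed patterns report only 3.1.0.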

rpetit3 commented 3 years ago

Related to https://github.com/sanger-pathogens/ariba/pull/302 and https://github.com/sanger-pathogens/ariba/pull/303

raphenya commented 3 years ago

@rpetit3 CARD changed the filename extension of both the software and data downloads from .gz to .bz2 around March 2020, which is why the regex failed. I think both of your solutions should work.

rpetit3 commented 3 years ago

This is fixed on the Bioconda side. You will need to update ARIBA in your Conda environment.

conda update -c conda-forge -c bioconda ariba

You should see something like:

The following packages will be UPDATED:

  ariba                               2.14.5-py36hf0b53f7_1 --> 2.14.5-py36hf0b53f7_2

After update:

ariba getref card card
Getting available CARD versions
Downloading "https://card.mcmaster.ca/download" and saving as "download.html" ... done
Found versions:
1.0.0   https://card.mcmaster.ca/download/0/broadstreet-v1.0.0.tar.bz2
1.0.1   https://card.mcmaster.ca/download/0/broadstreet-v1.0.1.tar.bz2
1.0.2   https://card.mcmaster.ca/download/0/broadstreet-v1.0.2.tar.bz2
1.0.3   https://card.mcmaster.ca/download/0/broadstreet-v1.0.3.tar.bz2
1.0.4   https://card.mcmaster.ca/download/0/broadstreet-v1.0.4.tar.bz2
1.0.5   https://card.mcmaster.ca/download/0/broadstreet-v1.0.5.tar.bz2
1.0.6   https://card.mcmaster.ca/download/0/broadstreet-v1.0.6.tar.bz2
1.0.7   https://card.mcmaster.ca/download/0/broadstreet-v1.0.7.tar.bz2
1.0.8   https://card.mcmaster.ca/download/0/broadstreet-v1.0.8.tar.bz2
1.0.9   https://card.mcmaster.ca/download/0/broadstreet-v1.0.9.tar.bz2
1.1.0   https://card.mcmaster.ca/download/0/broadstreet-v1.1.0.tar.bz2
1.1.1   https://card.mcmaster.ca/download/0/broadstreet-v1.1.1.tar.bz2
1.1.2   https://card.mcmaster.ca/download/0/broadstreet-v1.1.2.tar.bz2
1.1.3   https://card.mcmaster.ca/download/0/broadstreet-v1.1.3.tar.bz2
1.1.4   https://card.mcmaster.ca/download/0/broadstreet-v1.1.4.tar.bz2
1.1.5   https://card.mcmaster.ca/download/0/broadstreet-v1.1.5.tar.bz2
1.1.6   https://card.mcmaster.ca/download/0/broadstreet-v1.1.6.tar.bz2
1.1.7   https://card.mcmaster.ca/download/0/broadstreet-v1.1.7.tar.bz2
1.1.8   https://card.mcmaster.ca/download/0/broadstreet-v1.1.8.tar.bz2
1.1.9   https://card.mcmaster.ca/download/0/broadstreet-v1.1.9.tar.bz2
1.2.0   https://card.mcmaster.ca/download/0/broadstreet-v1.2.0.tar.bz2
1.2.1   https://card.mcmaster.ca/download/0/broadstreet-v1.2.1.tar.bz2
2.0.0   https://card.mcmaster.ca/download/0/broadstreet-v2.0.0.tar.gz
2.0.1   https://card.mcmaster.ca/download/0/broadstreet-v2.0.1.tar.gz
2.0.2   https://card.mcmaster.ca/download/0/broadstreet-v2.0.2.tar.gz
2.0.3   https://card.mcmaster.ca/download/0/broadstreet-v2.0.3.tar.gz
3.0.0   https://card.mcmaster.ca/download/0/broadstreet-v3.0.0.tar.gz
3.0.1   https://card.mcmaster.ca/download/0/broadstreet-v3.0.1.tar.gz
3.0.2   https://card.mcmaster.ca/download/0/broadstreet-v3.0.2.tar.gz
3.0.3   https://card.mcmaster.ca/download/0/broadstreet-v3.0.3.tar.gz
3.0.4   https://card.mcmaster.ca/download/0/broadstreet-v3.0.4.tar.gz
3.0.5   https://card.mcmaster.ca/download/0/broadstreet-v3.0.5.tar.gz
3.0.6   https://card.mcmaster.ca/download/0/broadstreet-v3.0.6.tar.gz
3.0.7   https://card.mcmaster.ca/download/0/broadstreet-v3.0.7.tar.gz
3.0.8   https://card.mcmaster.ca/download/0/broadstreet-v3.0.8.tar.bz2
3.0.9   https://card.mcmaster.ca/download/0/broadstreet-v3.0.9.tar.bz2
3.1.0   https://card.mcmaster.ca/download/0/broadstreet-v3.1.0.tar.bz2
Getting version 3.1.0
Working in temporary directory /local/home/rpetit/temp_blast/card.download
Downloading data from card: https://card.mcmaster.ca/download/0/broadstreet-v3.1.0.tar.bz2
syscall: wget -O card.tar.bz2 https://card.mcmaster.ca/download/0/broadstreet-v3.1.0.tar.bz2
...finished downloading
Extracted json data file ./card.json. Reading its contents...
Found 3022 records in the json file. Analysing...
Extracted data and written ARIBA input files

Finished. Final files are:
        /local/home/rpetit/temp_blast/card.fa
        /local/home/rpetit/temp_blast/card.tsv

You can use them with ARIBA like this:
ariba prepareref -f /local/home/rpetit/temp_blast/card.fa -m /local/home/rpetit/temp_blast/card.tsv output_directory

If you use this downloaded data, please cite:
"The Comprehensive Antibiotic Resistance Database", McArthur et al 2013, PMID: 23650175
and in your methods say that version 3.1.0 of the database was used

puethe commented 3 years ago

Fixed with https://github.com/sanger-pathogens/ariba/pull/302