kblin / ncbi-genome-download

Scripts to download genomes from the NCBI FTP servers
Apache License 2.0

Host Link Could be Incorrect #223

Open Mohammed-Quraishi opened 4 months ago

Mohammed-Quraishi commented 4 months ago

I keep getting the following when attempting to download all the gff and cds-fasta files for Pantoea from NCBI:

"ERROR: Download from NCBI failed: ConnectionError(MaxRetryError('HTTPSConnectionPool(host=\'ftp.ncbi.nih.gov\', port=443): Max retries exceeded with url: /genomes/genbank/bacteria/assembly_summary.txt (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7fd4c69fcfe0>: Failed to resolve \'ftp.ncbi.nih.gov\' ([Errno -3] Temporary failure in name resolution)"))'))"

Note the host in the error message: ftp.ncbi.nih.gov

Command to reproduce the error:

ncbi-genome-download bacteria --output-folder inputs/genbank/Pantoea --flat-output --genera Pantoea -F gff,cds-fasta -s genbank --retries 3

To narrow down the issue, I tried pinging the host:

ping ftp.ncbi.nih.gov

This outputs "ping: ftp.ncbi.nih.gov: Temporary failure in name resolution"

When I try to paste ftp.ncbi.nih.gov into my web browser, it doesn't work. It works only when I add .nlm after .ncbi (ftp.ncbi.nlm.nih.gov)
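
The comparison between the two hostnames can be sketched as a small DNS check. This is only an illustration, not part of ncbi-genome-download: the helper name `resolves` is hypothetical, and the sketch assumes `getent` is available (typical on Linux with glibc). Results depend on the local resolver, so the output will differ between networks.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the system resolver can look up the host.
# Assumes `getent` is available (typical on Linux with glibc).
resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

# The two hostnames mentioned in this report
for host in ftp.ncbi.nih.gov ftp.ncbi.nlm.nih.gov; do
    if resolves "$host"; then
        echo "$host: resolves"
    else
        echo "$host: does NOT resolve"
    fi
done
```

If only the second hostname resolves, the failure is in name resolution for the default host rather than in the download itself.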

Could it be that the program has the wrong link set as host?

Mohammed-Quraishi commented 4 months ago

I just found the -u or --uri option and I tried it with 'https://ftp.ncbi.nlm.nih.gov/genomes' and it worked perfectly!

Could you update the default URI so that other users don't run into this issue?
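
For anyone hitting the same error before a fix is released, the workaround described above might look like the following. It is a sketch that combines the command from the original report with the `--uri` value that worked here. The wrapper name `download_pantoea` and the variable `NCBI_URI` are made up for illustration.

```shell
#!/bin/sh
# Base URI passed to --uri in this thread; it uses the host that resolved for the reporter.
NCBI_URI='https://ftp.ncbi.nlm.nih.gov/genomes'

# Hypothetical wrapper: the command from the original report with --uri added.
# Defining the function does not start a download; call it to run.
download_pantoea() {
    ncbi-genome-download bacteria \
        --output-folder inputs/genbank/Pantoea \
        --flat-output \
        --genera Pantoea \
        -F gff,cds-fasta \
        -s genbank \
        --retries 3 \
        --uri "$NCBI_URI"
}
```

Calling `download_pantoea` then runs the download against the overridden host.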

kblin commented 3 weeks ago

While the other DNS domain works fine for me, this obviously makes sense and will be shipped with the next release. Thanks for your report!