merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

[BUG] anvi-setup-ncbi-cogs urlopen error #2247

Closed Lcornet closed 3 months ago

Lcornet commented 3 months ago

Short description of the problem

Unable to set up anvi-setup-ncbi-cogs

anvi'o version

Anvi'o .......................................: marie (v8)
Python .......................................: 3.10.14

Profile database .............................: 38
Contigs database .............................: 21
Pan database .................................: 16
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 4
tRNA-seq database ............................: 2

System info

No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.4 LTS
Release: 22.04
Codename: jammy

install of anvio

$ conda create -y --name anvio-8 python=3.10
$ conda activate anvio-8
$ conda install -y -c conda-forge mamba
$ mamba install -y -c conda-forge -c bioconda python=3.10 sqlite prodigal idba mcl muscle=3.8.1551 famsa hmmer diamond blast megahit spades bowtie2 bwa graphviz "samtools>=1.9" trimal iqtree trnascan-se fasttree vmatch r-base r-tidyverse r-optparse r-stringi r-magrittr bioconductor-qvalue meme ghostscript
$ mamba install -y -c bioconda fastani
$ curl -L https://github.com/merenlab/anvio/releases/download/v8/anvio-8.tar.gz --output anvio-8.tar.gz
$ pip install anvio-8.tar.gz

Detailed description of the issue

Config Error: Something went wrong with your download attempt. Here is the problem for the url ftp://ftp.ncbi.nlm.nih.gov/pub/COG/COG2020/data/cog-20.cog.csv: '<urlopen error [Errno 110] Connection timed out>'

Files / commands to reproduce the issue

anvi-setup-ncbi-cogs --just-do-it

I can access https://ftp.ncbi.nlm.nih.gov/pub/COG/COG2020/data/ from Chrome, but it seems that I can't from the terminal, where the connection times out. Is there something more to configure in conda?

Ge0rges commented 3 months ago

Hi @Lcornet, NCBI recently changed the URL they use, so v8 of anvi'o is out of date. If you install the development version (instructions here), this bug is fixed.

Edit: This is wrong.

ivagljiva commented 3 months ago

Hmm, this actually looks like the new FTP link (the old one used to start with ftp://ftp.ncbi.nih.gov, but the link in the error message has ftp://ftp.ncbi.nlm.nih.gov). Which makes sense because @meren patched the v8 release recently. So perhaps there is still an issue here, but it shouldn't be something that you need to configure yourself. I am looking into it.

ivagljiva commented 3 months ago

I started with a fresh install of anvi'o v8 and ran anvi-setup-ncbi-cogs --debug -T 4. It worked for me (and it used the new NCBI FTP link), so no issues with the codebase or installation instructions here.

@Lcornet, I wonder if there is some sort of issue blocking downloads from the terminal. Are you able to run curl ftp://ftp.ncbi.nlm.nih.gov/pub/COG/COG2020/data/cog-20.cog.csv -I? The output should be the header of the NCBI data page, which looked like this when I ran it:

Last-Modified: Fri, 11 Sep 2020 01:37:40 GMT
Content-Length: 350149451
Accept-ranges: bytes

Lcornet commented 3 months ago

It seems that it comes from my side:

curl ftp://ftp.ncbi.nlm.nih.gov/pub/COG/COG2020/data/cog-20.cog.csv -I
curl: (28) Failed to connect to ftp.ncbi.nlm.nih.gov port 21 after 269607 ms: Couldn't connect to server

Maybe the new version of Ubuntu doesn't allow FTP from the terminal for some reason.
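A quick way to tell a blocked port from a broken installation is to test the TCP connection directly. The sketch below is only an illustration and not part of anvi'o: the helper names `can_connect` and `describe_ports` are hypothetical, and it assumes that port 21 (FTP control) and port 443 (HTTPS) are the two ports worth comparing, since the browser reaches the server over HTTPS while curl fails on port 21.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def describe_ports(host: str, ports: dict[int, str], timeout: float = 5.0) -> dict[str, bool]:
    """Map each port label (hypothetical names chosen by the caller) to its reachability."""
    return {label: can_connect(host, port, timeout) for port, label in ports.items()}
```

Calling `describe_ports("ftp.ncbi.nlm.nih.gov", {21: "ftp", 443: "https"})` on the affected machine would show whether only the FTP port is unreachable. A result where `https` is reachable and `ftp` is not points to a network filter rather than to conda or anvi'o.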

meren commented 3 months ago

Hey @Lcornet, it seems to be a problem on your side, as the same command works for me as well.

Lcornet commented 3 months ago

Thanks for the quick help. It was a firewall problem.
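For readers behind a similar firewall who cannot get port 21 opened: the thread notes that the same NCBI directory is reachable over HTTPS from a browser. The sketch below rewrites an FTP URL to its HTTPS form and checks the file size with a HEAD request. It is an assumption-laden illustration, not an anvi'o feature: `ftp_to_https` and `remote_size` are hypothetical helpers, and it assumes the HTTPS path mirrors the FTP path exactly, as the directory listing above suggests.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request, urlopen


def ftp_to_https(url: str) -> str:
    """Rewrite an ftp:// URL to https:// on the same host and path; leave other URLs alone."""
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        return url
    return urlunsplit(("https", parts.netloc, parts.path, parts.query, parts.fragment))


def remote_size(url: str, timeout: float = 30.0) -> Optional[int]:
    """Send a HEAD request and return Content-Length in bytes, or None if it is absent."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        length = response.headers.get("Content-Length")
    return int(length) if length is not None else None
```

For example, `remote_size(ftp_to_https("ftp://ftp.ncbi.nlm.nih.gov/pub/COG/COG2020/data/cog-20.cog.csv"))` would confirm from the terminal that the file is reachable over HTTPS. This only verifies access; it does not change which URL `anvi-setup-ncbi-cogs` uses.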