download_ncbi_associations() fails while decompressing file #230

msbentsen commented


Thank you for this great package! It has worked for me in the past, but lately I get an error when trying to download the NCBI associations as seen here:

from goatools.base import download_ncbi_associations
file_gene2go = download_ncbi_associations()

This produces the error:

FTP RETR ftp.ncbi.nlm.nih.gov gene/DATA gene2go.gz -> gene2go.gz
  gunzip gene2go.gz

The .gz file appears to download correctly, but decompressing it fails, so the resulting gene2go file is empty.

If I use an old gene2go file, it works perfectly (I have one from 10.11.2020 which works), but it seems that any new download fails.

I am running python==3.7.6 and goatools==1.1.6 on a Debian system.

Thank you for any help you might be able to provide for solving this!

dvklopfenstein commented

Thank you for using GOATOOLS in your day-to-day work and for taking the time to write to us.

I have augmented the test tests/test_i147_all_taxids.py so that it always downloads NCBI's gene2go annotation file for better testing, but I am not able to reproduce what you are seeing, so we need more information.

In the meantime, here are a couple of things to try:

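1. Pass the full path of the gene2go file you are downloading; here is an example:

from os import getcwd
from os.path import join
from goatools.base import download_ncbi_associations

# Full path of the gene2go file to create in the current directory
fin_anno = join(getcwd(), 'gene2go')
# Pass the path explicitly so the file lands in a known location
file_gene2go = download_ncbi_associations(fin_anno)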

2. Download and decompress the gene2go file by hand:

$ wget ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2go.gz
$ gunzip gene2go.gz
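
If the by-hand download also misbehaves, it can help to check whether the .gz archive itself is damaged, independently of goatools. The following is a minimal sketch using only the Python standard library; the helper name gzip_is_intact is made up for illustration and is not part of goatools. It is similar in spirit to running gzip -t on the file.

```python
import gzip
import zlib


def gzip_is_intact(path, chunk_size=1 << 20):
    """Return True if the whole gzip file at `path` decompresses cleanly.

    Hypothetical helper for illustration; not part of goatools.
    """
    try:
        with gzip.open(path, "rb") as stream:
            # Read to the end so the trailing checksum is verified too
            while stream.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # Covers a missing file, a bad gzip header, a truncated stream,
        # and corrupt compressed data
        return False
    return True
```

Calling gzip_is_intact('gene2go.gz') right after the download should return False if the archive is corrupt or truncated, which would point at the transfer rather than at goatools.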

msbentsen commented

Hi, thank you for getting back to me. I tried the second option, and I think it might be a system-specific issue on my end: gunzip fails with an "invalid compressed data--format violated" error on the file fetched over FTP. However, I was able to download the file from https://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2go.gz and decompress it without issue. So it probably has something to do with restrictions on downloading from FTP on my system, though I'm not quite sure. But my problem is solved, thank you!
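
For anyone hitting the same FTP problem, the HTTPS workaround described above can be scripted. This is a rough sketch using only the standard library; the function name fetch_and_gunzip and its parameters are invented for illustration and are not part of goatools. The URL is the one reported to work in this thread.

```python
import gzip
import shutil
import urllib.request

# HTTPS location reported to work in this thread
GENE2GO_URL = "https://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2go.gz"


def fetch_and_gunzip(url=GENE2GO_URL, out_path="gene2go"):
    """Download a gzip file from `url` and decompress it to `out_path`.

    Hypothetical helper for illustration; not part of goatools.
    """
    gz_path = out_path + ".gz"
    # Step 1: save the compressed file exactly as served
    with urllib.request.urlopen(url) as response, open(gz_path, "wb") as gz_file:
        shutil.copyfileobj(response, gz_file)
    # Step 2: decompress; a corrupt archive raises an exception here
    with gzip.open(gz_path, "rb") as gz_stream, open(out_path, "wb") as out_file:
        shutil.copyfileobj(gz_stream, out_file)
    return out_path
```

After fetch_and_gunzip() has written gene2go into the working directory, the file can be used in the same way as the older gene2go file mentioned earlier in the thread.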