Closed WinterFor closed 3 years ago
I had similar issue. Try deleting the GO and annotation files and run the script again. It might be the case that they were not correctly downloaded due to server/internet problems.
I had similar issue. Try deleting the GO and annotation files and run the script again. It might be the case that they were not correctly downloaded due to server/internet problems.
@enushi
Thanks for your advice, But It does not work for me. I deleted go-basic.obo
and gene2go
, and run the script again. It shows Error -3 while decompressing data: invalid stored block lengths
.
requests.get(http://purl.obolibrary.org/obo/go/go-basic.obo, stream=True)
WROTE: go-basic.obo
go-basic.obo: fmt(1.2) rel(2021-06-16) 47,230 GO Terms
FTP RETR ftp.ncbi.nlm.nih.gov gene/DATA gene2go.gz -> gene2go.gz
gunzip gene2go.gz
---------------------------------------------------------------------------
error Traceback (most recent call last)
<ipython-input-11-1ae2b6ad2b5a> in <module>
----> 1 goeaobj = get_goeaobj('fdr_bh')
2 geneids_study = geneid2symbol.keys()
3 goea_results_all = goeaobj.run_study(geneids_study)
4 goea_results_sig = [r for r in goea_results_all if r.p_fdr_bh < 0.05]
<ipython-input-8-347669f81ffd> in get_goeaobj(method)
10 obodag = GODag("go-basic.obo")
11 # Load Associations
---> 12 download_ncbi_associations() # Get ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2go.gz
13 # Read NCBI's gene2go. Store annotations in a list of namedtuples
14 objanno = Gene2GoReader("gene2go", taxids=[9541])
/home/user/miniconda3/lib/python3.7/site-packages/goatools/base.py in download_ncbi_associations(gene2go, prt, loading_bar)
131 file_remote = "ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/{GZ}".format(
132 GZ=os.path.basename(gzip_file))
--> 133 dnld_file(file_remote, gene2go, prt, loading_bar)
134 else:
135 if prt is not None:
/home/user/miniconda3/lib/python3.7/site-packages/goatools/base.py in dnld_file(src_ftp, dst_file, prt, loading_bar)
221 if prt is not None:
222 prt.write(" gunzip {FILE}\n".format(FILE=dst_wget))
--> 223 gzip_open_to(dst_wget, dst_file)
224 except IOError as errmsg:
225 import traceback
/home/user/miniconda3/lib/python3.7/site-packages/goatools/base.py in gzip_open_to(fin_gz, fout)
233 with gzip.open(fin_gz, 'rb') as zstrm:
234 with open(fout, 'wb') as ostrm:
--> 235 ostrm.write(zstrm.read())
236 assert os.path.isfile(fout), "COULD NOT GUNZIP({G}) TO FILE({F})".format(G=fin_gz, F=fout)
237 os.remove(fin_gz)
/home/user/miniconda3/lib/python3.7/gzip.py in read(self, size)
285 import errno
286 raise OSError(errno.EBADF, "read() on write-only GzipFile object")
--> 287 return self._buffer.read(size)
288
289 def read1(self, size=-1):
/home/user/miniconda3/lib/python3.7/gzip.py in read(self, size)
480 buf = self._fp.read(io.DEFAULT_BUFFER_SIZE)
481
--> 482 uncompress = self._decompressor.decompress(buf, size)
483 if self._decompressor.unconsumed_tail != b"":
484 self._fp.prepend(self._decompressor.unconsumed_tail)
error: Error -3 while decompressing data: invalid stored block lengths
Then I try to manual download gene2go from NCBI download page. I run the script again, It shows AssertionError: **FATAL: NO TAXIDS: gene2go
again.
Hello @WinterFor,
Thank you for interest in GOA TOOLs and for taking the time to write to us. Thank you @enushi for contributing to helping in this issue, which is very much appreciated.
CONCLUSION: crab-eating macaque monkey (9541) is not in NCBI's gene2go annotation file.
ACTION ITEM: @WinterFor, can you find another potential source for the macaque monkey annotations? NCBI does not support taxid 9541 in the their gene2go annotations. When you find the annotations, they will likely not be in gene2go format, but may be in the gaf or gpad format. If the monkey annotations are not in any of those annotations, you can write a short script to convert into either our simple format or the gaf or gpad format. Please open a new issue if you need guidance about annotation conversions.
I download the latest gene2go file:
$ rm -f gene2go*
$ wget ftp://ftp.ncbi.nih.gov/gene/DATA/gene2go.gz
$ gunzip gene2go.gz
The first column contains the taxid. I see that there are 42 taxids (the 43 returned by the awk includes the taxid header):
$ awk '{print $1}' gene2go | sort | uniq -c | sort -r | wc -l
43
The top three taxid with the most annotations is: mouse (10090) with 414,017 annotations; rat(10116) with 389,923 annotations, and human(9606) with 341,883 annotations :
$ awk '{print $1}' gene2go | sort | uniq -c | sort -r
414017 10090
389023 10116
341882 9606
...
Crab-eating macaque monkey (9541) is not in NCBI's gene2go annotation file as shown by the 1st awk command returning nothing. To test that I am running the command correctly, in the 2nd awk, I look for human (9606) and see the expected 341,883 annotations.
$ awk '{print $1}' gene2go | sort | uniq -c | grep 9541
$ awk '{print $1}' gene2go | sort | uniq -c | grep 9606
341882 9606
@dvklopfenstein awesome analysis. might be worth performing a check in code and display the appropriate error when taxid is not found
I will try to find the macaque monkey annotations. Thanks for your help! @dvklopfenstein @enushi
@tanghaibao, I was thinking the same thing regarding improving the error message...
Hi, I am using GOATOOLS for GOEA, and follow the pipline from goea_nbt3102_group_results.ipynb.
But I had some wrong with using
taxids = [9541]
inobjanno = Gene2GoReader("gene2go", taxids=[9541])
. I had tried9606
,10090
, and they can work.9541
is the taxid of crab-eating monkey(Macaca Fascicularis)It shows below
Thanks for your help!