JanZrimec closed this issue 3 years ago
Thank you so much for your interest in GOATOOLS and for taking the time to open this issue.
The file that you downloaded is a GAF 2.0 file, which is expected to have 17 fields. Instead, it has 16: it is missing the last field, Gene Product Form ID.
I can change the code to read this incorrect format; however, it might be better to do either:
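To check a downloaded file for this problem before loading it, the fields per annotation line can be counted directly. This is a minimal sketch and not part of GOATOOLS: `count_gaf_fields` is a hypothetical helper name, and it assumes the usual GAF conventions of tab-separated columns and `!`-prefixed header lines.

```python
from collections import Counter


def count_gaf_fields(lines):
    """Map 'number of tab-separated fields' -> 'number of annotation lines'.

    Hypothetical helper (not a GOATOOLS function). Header/comment lines
    starting with '!' and blank lines are skipped.
    """
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue
        # Strip only the line ending so trailing empty fields are still counted.
        counts[len(line.rstrip("\r\n").split("\t"))] += 1
    return counts


# Usage with the file from this issue (path taken from the report):
#   with open("gene_association.sgd.gaf") as handle:
#       print(count_gaf_fields(handle))
```

A well-formed GAF 2.0 file should report only 17-field lines; any other key in the result points to lines the reader is likely to reject.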
Please let me know what might work for your situation.
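If editing a local copy of the file is acceptable, one possible stopgap is to pad short annotation lines with empty trailing columns so that every line has 17 fields. This is only a sketch under that assumption: `pad_gaf_line` is a hypothetical helper, not a GOATOOLS function, and it does not recover the missing Gene Product Form ID values.

```python
def pad_gaf_line(line, n_fields=17):
    """Pad a tab-separated GAF annotation line with empty trailing fields.

    Hypothetical helper (not part of GOATOOLS). Header lines starting
    with '!' and blank lines are returned unchanged.
    """
    if line.startswith("!") or not line.strip():
        return line
    body = line.rstrip("\r\n")
    eol = line[len(body):]  # preserve the original line ending
    missing = n_fields - len(body.split("\t"))
    if missing > 0:
        body += "\t" * missing
    return body + eol


# Usage: write a padded copy next to the original file.
#   with open("gene_association.sgd.gaf") as src, \
#           open("gene_association.sgd.padded.gaf", "w") as dst:
#       dst.writelines(pad_gaf_line(line) for line in src)
```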
FYI: I also opened an issue with the Gene Ontology Consortium letting them know that we saw this and I created a test for it:
@JanZrimec, we have heard back from a researcher at the Gene Ontology Consortium regarding the incorrectly formatted GAF file.
They advise using the official GO product at http://current.geneontology.org/annotations/sgd.gaf.gz, which has been processed by GO and not only has the expected number of fields, but also has additional yeast annotations including those from the PAINT pipeline.
Also: For best results, please use the files found at http://current.geneontology.org.
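Fetching and unpacking the official file can be sketched with the Python standard library alone. The URL is the one given above; `fetch_gaf` and the local file names are assumptions made for illustration, and the `GafReader` call in the trailing comment mirrors the one used in the original report.

```python
import gzip
import shutil
import urllib.request

GAF_URL = "http://current.geneontology.org/annotations/sgd.gaf.gz"


def fetch_gaf(url=GAF_URL, gz_path="sgd.gaf.gz", gaf_path="sgd.gaf"):
    """Download a gzipped GAF file and decompress it; return the GAF path.

    Hypothetical helper; the default file names are arbitrary choices.
    """
    urllib.request.urlretrieve(url, gz_path)
    with gzip.open(gz_path, "rb") as src, open(gaf_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gaf_path


# Usage (needs network access and goatools installed):
#   from goatools.anno.gaf_reader import GafReader
#   objanno_sc = GafReader(fetch_gaf())
```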
Thanks! This resolved the issue and was really helpful!
Hey, I cannot load a GAF file from the Saccharomyces Genome Database (ver 2.0, created 2018) with GafReader in a Jupyter notebook. It seems the file is in a different format than the one the reader expects, with at least one missing column. Are there some settings or workarounds that would enable loading this file? Thanks!
Code:

    import wget
    wget.download('http://downloads.yeastgenome.org/curation/literature/gene_association.sgd.gaf.gz')
    !gunzip gene_association.sgd.gaf.gz
    from goatools.anno.gaf_reader import GafReader
    objanno_sc = GafReader('gene_association.sgd.gaf')
Error message:

    BAD Extension( )
     0) REQ DB            SGD
     1) REQ DB_ID         S000001503
     2) REQ DB_Symbol     SPT23
     3)     Qualifier
     4) REQ GO_ID         GO:0003674
     5) REQ DB_Reference  GO_REF:0000015
     6) REQ Evidence_Code ND
     7)     With_From
     8) REQ NS            F
     9)     DB_Name       ER membrane protein involved in regulation of OLE1 transcription
    10)     DB_Synonym    YKL020C
    11) REQ DB_Type       protein
    12) REQ Taxon         taxon:559292
    13) REQ Date          20181102
    14) REQ Assigned_By   SGD
    15)     Extension
    Traceback (most recent call last):
      File "/home/zrimec/miniconda3/envs/py36/lib/python3.6/site-packages/goatools/anno/init/reader_gaf.py", line 88, in _read_gaf_nts
        self._add_data0(nts, lnum, line, get_all_nss, namespaces, datobj)
      File "/home/zrimec/miniconda3/envs/py36/lib/python3.6/site-packages/goatools/anno/init/reader_gaf.py", line 108, in _add_data0
        gafvals = datobj.get_gafvals(flds, nspc)
      File "/home/zrimec/miniconda3/envs/py36/lib/python3.6/site-packages/goatools/anno/init/reader_gaf.py", line 231, in get_gafvals
        flds[16] = self._get_set(flds[16].rstrip())
    IndexError: list index out of range

    **FATAL-gaf: list index out of range
    **FATAL-gaf: gene_association.sgd.gaf[8]: SGD S000001503 SPT23 GO:0003674 GO_REF:0000015 ND F ER membrane protein involved in regulation of OLE1 transcription YKL020C protein taxon:559292 20181102 SGD
    An exception has occurred, use %tb to see the full traceback.

    SystemExit: 1