tanghaibao / goatools

Python library to handle Gene Ontology (GO) terms
BSD 2-Clause "Simplified" License

Allow arbitrary letters in Aspect column #228

Closed by serenalotreck 2 years ago

serenalotreck commented 2 years ago

I'm trying to use GafReader to read in GAF files for the Planteome database. These files are GAF 2.0 compliant with one exception -- in the "Aspect" column (NS in the GafReader.associations named tuple), there are letters besides the P, F, and C that are specified in the GAF docs.

As a result, when I run the tool, I get the following error:

Traceback (most recent call last):
  File "/mnt/home/lotrecks/anaconda3/envs/dygiepp/lib/python3.7/site-packages/goatools/anno/init/reader_gaf.py", line 92, in _read_gaf_nts
    self._add_data0(nts, lnum, line, get_all_nss, namespaces, datobj)
  File "/mnt/home/lotrecks/anaconda3/envs/dygiepp/lib/python3.7/site-packages/goatools/anno/init/reader_gaf.py", line 114, in _add_data0
    nspc = GafData.aspect2ns[flds[8]]  # 8 GAF Aspect -> BP, MF, or CC
KeyError: 'T'

  **FATAL-gaf: 'T'

**FATAL-gaf: /mnt/scratch/lotrecks/planteome_attempt1/to_gene_Oryza_Gramene.assoc[3]:
GR_gene GR:0060141  CL      TO:0000089  GR_REF:1793 IMP     T   CLUSTERED SPIKELETS Cl|Clustered spikelets|Cl|Clustered spikelets   gene    taxon:4530  20121108    Gramene     

 0) REQ DB                   GR_gene
 1) REQ DB_ID                GR:0060141
 2) REQ DB_Symbol            CL
 3)     Qualifier            
 4) REQ GO_ID                TO:0000089
 5) REQ DB_Reference         GR_REF:1793
 6) REQ Evidence_Code        IMP
 7)     With_From            
 8) REQ NS                   T
 9)     DB_Name              CLUSTERED SPIKELETS
10)     DB_Synonym           Cl|Clustered spikelets|Cl|Clustered spikelets
11) REQ DB_Type              gene
12) REQ Taxon                taxon:4530
13) REQ Date                 20121108
14) REQ Assigned_By          Gramene
15)     Extension            
16)     Gene_Product_Form_ID 

Would it be possible to allow the NS field to have any letter? I'm trying to avoid writing my own GAF parser since you already have what seems to be a fairly robust one, but I'm not sure how else to get around this issue.
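To illustrate the behavior being requested, here is a minimal sketch of a tolerant aspect lookup. It does not use or reproduce the goatools code: `ASPECT2NS`, `aspect_to_namespace`, and `read_namespace` are hypothetical names, and the only parts taken from the traceback are the field index (8) and the P/F/C to BP/MF/CC mapping. The sample line is rebuilt from the field dump above, assuming tab-separated columns.

```python
# Hypothetical stand-in for the standard GAF mapping (Aspect letter -> namespace).
ASPECT2NS = {"P": "BP", "F": "MF", "C": "CC"}


def aspect_to_namespace(aspect, strict=False):
    """Map a GAF Aspect letter to a namespace abbreviation.

    With strict=True, an unknown letter raises KeyError, as in the traceback.
    With strict=False, an unknown letter (e.g. 'T') is passed through unchanged.
    """
    if strict:
        return ASPECT2NS[aspect]
    return ASPECT2NS.get(aspect, aspect)


def read_namespace(line):
    """Return the namespace for one tab-separated GAF line (Aspect is field 8)."""
    flds = line.rstrip("\n").split("\t")
    return aspect_to_namespace(flds[8])


# Fields copied from the dump above; empty strings are the blank optional columns.
fields = [
    "GR_gene", "GR:0060141", "CL", "", "TO:0000089", "GR_REF:1793", "IMP", "",
    "T", "CLUSTERED SPIKELETS", "Cl|Clustered spikelets|Cl|Clustered spikelets",
    "gene", "taxon:4530", "20121108", "Gramene", "", "",
]
line = "\t".join(fields)

print(read_namespace(line))  # unknown aspect letter is kept as-is
```

Passing the unknown letter through keeps P, F, and C behaving as before, while non-GO ontologies retain their own aspect code for downstream grouping.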

Thanks!

dvklopfenstein commented 2 years ago

Thanks for the terrific contribution.

For issue 202, which is similar to yours, I added functionality to the OBO reader to accept ontologies such as the Human Phenotype Ontology (HPO).

Running the regression tests prior to pushing exposed some annotation file issues (https://github.com/geneontology/helpdesk/issues/358 and https://github.com/geneontology/helpdesk/issues/359).

After I resolve the test issues, I will push the new functionality for issue 202.

Thank you again for your interest in GOATOOLS, for taking the time to write us, and for the terrific code contribution.

serenalotreck commented 2 years ago

Thanks so much for the quick response & merge!