Closed Maryam-Haghani closed 5 months ago
Thank you for your interest in GOA Tools and for taking the time to write this issue.
Download the annotations before reading using wget:
$ wget http://geneontology.org/ontology/go-basic.obo
$ wget http://current.geneontology.org/annotations/goa_human.gpad.gz
$ gunzip goa_human.gpad.gz
Then read the annotations like this:
from goatools.base import get_godag
from goatools.anno.gpad_reader import GpadReader
godag = get_godag('go-basic.obo')
anno = GpadReader('goa_human.gpad', godag=godag)
I added a new notebook demonstrating how to read annotations from a GPAD file here
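For readers who want to see roughly what such a reader works with, here is a minimal standard-library sketch. It is not how GpadReader is implemented. It assumes the GPAD 1.x column layout (DB, DB_Object_ID, Qualifier, GO ID, reference, ECO evidence code, ...) and uses made-up placeholder rows (PROT_A, PROT_B, GO:9999991, GO:9999992) rather than real annotations. The helper name read_gpad_id2gos is hypothetical.

```python
from collections import defaultdict


def read_gpad_id2gos(lines):
    """Map each DB_Object_ID to the set of GO IDs annotated to it.

    Assumes GPAD 1.x columns: DB, DB_Object_ID, Qualifier, GO ID, ...
    """
    id2gos = defaultdict(set)
    for line in lines:
        line = line.rstrip("\n")
        # Header and comment lines start with '!'
        if not line or line.startswith("!"):
            continue
        cols = line.split("\t")
        if len(cols) < 6:
            continue  # skip rows too short to hold the evidence column
        db_object_id, qualifier, go_id = cols[1], cols[2], cols[3]
        # Skip negated annotations such as "NOT|enables"
        if "NOT" in qualifier.split("|"):
            continue
        id2gos[db_object_id].add(go_id)
    return dict(id2gos)


# Placeholder rows for illustration only; these are not real annotations
sample = [
    "!gpa-version: 1.2",
    "UniProtKB\tPROT_A\tenables\tGO:9999991\tPMID:0\tECO:0000269",
    "UniProtKB\tPROT_A\tinvolved_in\tGO:9999992\tPMID:0\tECO:0000501",
    "UniProtKB\tPROT_B\tenables\tGO:9999991\tPMID:0\tECO:0000269",
    "UniProtKB\tPROT_B\tNOT|enables\tGO:9999992\tPMID:0\tECO:0000269",
]
id2gos = read_gpad_id2gos(sample)
for protein, gos in sorted(id2gos.items()):
    print(protein, sorted(gos))
```

To try it on the downloaded file, pass an open file handle instead of the sample list, for example open('goa_human.gpad').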
Thank you for your helpful answer, @dvklopfenstein. As far as I can see, your notebook downloads the GPAD file from the GO website, which is gpa-version: 1.2, and using this version resolves my issue. The problem with my first code, which uses goatools.anno.dnld_ebi_goa.py, is that it downloads the GPA file from ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/HUMAN/goa_human.gpa.gz, which is gpa-version: 1.1. As the error details say, GpadReader cannot find some of this file's evidence codes in the eco2group dictionary, and hence it raises an error. I think that to get rid of the error, we have to change goatools.anno.dnld_ebi_goa.py to use version 1.2 of the GPAD file.
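A quick way to check which version a given file declares is to look at its header lines. This is only a sketch, assuming the header carries a line of the form "!gpa-version: 1.1" as described above. The function name get_gpa_version is hypothetical and is not part of goatools.

```python
def get_gpa_version(lines):
    """Return the value of the '!gpa-version:' header line, or None if absent."""
    for line in lines:
        # Header lines start with '!'; stop at the first annotation row
        if not line.startswith("!"):
            break
        key, _, value = line[1:].partition(":")
        if key.strip() == "gpa-version":
            return value.strip()
    return None


# Stand-in header lines for illustration
header_v11 = ["!gpa-version: 1.1\n", "!\n", "UniProtKB\tPROT_A\tenables\n"]
header_v12 = ["!gpa-version: 1.2\n", "UniProtKB\tPROT_A\tenables\n"]
print(get_gpa_version(header_v11))
print(get_gpa_version(header_v12))
```

For a real file, pass an open text handle, for example open('goa_human.gpad'), or gzip.open(path, 'rt') for a file that is still compressed.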
Thank you very much for the error detail and the name and location of the annotation file. This is extremely helpful.
Likely, we need to update the ECO codes. This should go quickly. I'll have an update ASAP.
Thank you for taking the time to report the error with the ECO codes, and thank you for your interest in GOA Tools.
You are welcome! BTW, I used the evidenceontology repository to get the updated ECO codes and wrote some code that converts its mapping file to a dictionary automatically. This code could be used in eco2group.py to build the ECO2GRP dictionary automatically rather than hard-coding it.
Here is the code:
import pandas as pd
# Use the ECO mapping file from the evidenceontology repository
URL = "https://raw.githubusercontent.com/evidenceontology/evidenceontology/master/gaf-eco-mapping-derived.txt"
# Read the tab-separated file into a DataFrame, skipping '#' comments
df_eco = pd.read_csv(URL, comment='#', sep="\t", header=None)
# Map the first column to the second column
ECO2GRP = dict(df_eco[[0, 1]].values)
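If pandas is not wanted as a dependency, the same mapping could be built with the standard library alone. The sketch below assumes the layout the snippet above relies on: tab-separated rows with the ECO ID in the first column and the evidence group code in the second. The function name parse_eco2grp is hypothetical, and the sample text is a small stand-in rather than the real file contents.

```python
def parse_eco2grp(text):
    """Build {first column: second column} from tab-separated text.

    Lines starting with '#' and blank lines are skipped.
    """
    eco2grp = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 2:
            continue  # ignore rows without both columns
        eco2grp[cols[0]] = cols[1]
    return eco2grp


# Stand-in text in the assumed layout, for illustration only
sample_text = (
    "# comment line\n"
    "ECO:0000269\tEXP\tDefault\n"
    "ECO:0000501\tIEA\tDefault\n"
)
ECO2GRP = parse_eco2grp(sample_text)
print(ECO2GRP)
```

The text itself could be fetched with urllib.request.urlopen(URL).read().decode(). Unlike pandas with comment='#', this version only skips whole-line comments and does not strip a '#' that appears partway through a row.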
Very nice.
Thank you so much for the additional information regarding downloading the ECO codes to be used by GOA Tools.
Your idea works well for GOA Tools, as our philosophy is to decouple from storing other groups' databases. This benefits researchers by giving them the power to use the latest data or to cite a specific data version, which is needed when publishing.
Thank you so much. I will be sure to credit you as a contributor on the next pull request which implements the change you suggest.
Thank you for your time and for your interest in GOA Tools.
And thanks for your valuable work.
Hi, I noticed in the paper that we can use the GPAD format from the European Bioinformatics Institute’s FTP site for Gene Ontology enrichment analysis instead of NCBI's GAF format:
Based on this, since my data is in UniProt format, I am trying to use the GPAD format to calculate over-represented GO terms based on UniProt IDs. Unfortunately, I could not find any examples of that like the ones that exist for NCBI gene data. After looking at the Python files in the repository, I came up with the code below, which gives an AssertionError while reading the associations at GpadReader(fin_gpad). Could you please help me resolve this issue? Here is my code:
P.S.: For the last line, I also tried objgpad = GpadReader(fin_gpad, godag=godag), but nothing changed.