[Question] How to parse .json, .obo, or .owl to get dictionary of enzymes {id_go:{ec_1, ec_2, ..., ec_n}}

tanghaibao / goatools

Python library to handle Gene Ontology (GO) terms

BSD 2-Clause "Simplified" License

783 stars 210 forks

I'm trying to understand how I can use GOATOOLS to parse any of the GO files into a dictionary with the following structure:

{id_go: {ec_1, ec_2, ..., ec_n}}

I was able to load the obo file but I couldn't figure out how to get the enzymes:


from goatools.base import get_godag

godag = get_godag('Databases/GO/go-basic.obo', optional_attrs='relationship')
go = godag['GO:0000015']

for id_go, go in godag.items():
    print(id_go, go.get_all_children())
# GO:0000001 set()
# GO:0000002 set()
# GO:0000006 set()
# GO:0000007 set()
# GO:0000009 {'GO:0033164', 'GO:0052917'}

They are definitely in there, I just don't know how to access them:

%%bash
grep -c "EC:" /Users/jolespin/Databases/GO/go-basic.obo

# Databases/GO/go-basic.obo:26098

from goatools.base import get_godag

godag = get_godag("go-basic.obo", optional_attrs="xref")
for id_go, go in godag.items():
    ecs = [x for x in go.xref if x.startswith("EC:")]
    if ecs:
        print(id_go, ecs)
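To end up with the {id_go: {ec_1, ec_2, ..., ec_n}} shape from the question rather than printed lines, the same filter can be collected into a dictionary of sets. This is a minimal sketch, not an official GOATOOLS recipe: the helper name `ec_by_go` is made up here, and it only assumes that each term exposes an `xref` collection of strings, which is how the snippet above uses it. The `fake_godag` stand-in and its values are hypothetical and are only there so the example runs without GOATOOLS or an .obo file.

```python
from types import SimpleNamespace


def ec_by_go(godag):
    """Return {id_go: {ec_1, ..., ec_n}} for terms with at least one EC xref.

    `godag` is any mapping of GO ID -> term object. `ec_by_go` is a
    hypothetical helper name, not part of GOATOOLS.
    """
    go2ecs = {}
    for id_go, term in godag.items():
        # Fall back to an empty tuple if the term has no `xref` attribute
        # (assumed to happen when xrefs were not requested at load time).
        xrefs = getattr(term, "xref", ())
        ecs = {x for x in xrefs if x.startswith("EC:")}
        if ecs:
            go2ecs[id_go] = ecs
    return go2ecs


# Stand-in terms with made-up xrefs, used instead of a real GO DAG.
fake_godag = {
    "GO:0000001": SimpleNamespace(xref=set()),
    "GO:0000002": SimpleNamespace(xref={"EC:1.1.1.1", "RHEA:10000"}),
}
print(ec_by_go(fake_godag))  # {'GO:0000002': {'EC:1.1.1.1'}}
```

With a real DAG, the call would be `ec_by_go(get_godag("go-basic.obo", optional_attrs="xref"))`. Terms without any EC cross-reference are left out of the result, so a lookup with `.get(id_go, set())` avoids a KeyError for them.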

[Question] How to parse .json, .obo, or .owl to get dictionary of enzymes {id_go:{ec_1, ec_2, ..., ec_n}} #292