Update Source: GO associations

monarch-initiative / dipper

Data Ingestion Pipeline for Monarch

https://dipper.readthedocs.io/en/latest/

BSD 3-Clause "New" or "Revised" License


Update Source: GO associations #410

Open cmungall opened 7 years ago

cmungall commented 7 years ago

[ ] better document GeneOntology.py
- e.g. IMP annotations generate an additional phenotype annotation
- other annotations go in with one of 3 RO types
[ ] fix GOA sources to use new download URLs
[ ] add cypher queries / yamls (see #409) - https://github.com/monarch-initiative/monarch-cypher-queries/pull/5
[ ] add monarch-app configs
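As a starting point for the documentation item, the behaviour described in the first checkbox can be sketched roughly as follows. This is not the actual GeneOntology.py logic: the function name `triples_from_gaf_line`, the relation labels in `ASPECT_TO_RELATION`, and the use of the GO ID itself as the phenotype target are all assumptions for illustration. Only the GAF column positions follow the GAF 2.x layout.

```python
# Hypothetical sketch of the annotation dispatch described above.
# 0-based column positions in a GAF 2.x line.
GAF_DB, GAF_OBJECT_ID, GAF_GO_ID, GAF_EVIDENCE, GAF_ASPECT = 0, 1, 4, 6, 8

# ASSUMPTION: placeholder labels for the "3 RO types", keyed by GO aspect.
ASPECT_TO_RELATION = {"P": "involved_in", "F": "enables", "C": "part_of"}


def triples_from_gaf_line(line):
    """Return (subject, relation, object) tuples for one GAF line."""
    if line.startswith("!"):  # GAF header/comment line
        return []
    cols = line.rstrip("\n").split("\t")
    if len(cols) <= GAF_ASPECT:
        return []

    db, object_id = cols[GAF_DB], cols[GAF_OBJECT_ID]
    # Some sources (e.g. MGI) already embed the prefix in the object ID.
    gene = object_id if object_id.startswith(db + ":") else f"{db}:{object_id}"
    go_id = cols[GAF_GO_ID]

    relation = ASPECT_TO_RELATION.get(cols[GAF_ASPECT])
    if relation is None:
        return []

    triples = [(gene, relation, go_id)]
    # IMP annotations generate an additional phenotype annotation.
    # ASSUMPTION: the GO ID stands in for the phenotype target here.
    if cols[GAF_EVIDENCE] == "IMP":
        triples.append((gene, "has_phenotype", go_id))
    return triples
```

With this sketch, an IMP line yields two tuples (the aspect-based relation plus `has_phenotype`), while any other evidence code yields only the aspect-based relation.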

cmungall commented 7 years ago

The GeneOntology.py source downloads a 3.7G idmapping file to map UniProt IDs to NCBI/MOD IDs.

This almost certainly contains far more IDs than we need. There are other sources for per-species mappings, e.g. ftp://ftp.ebi.ac.uk/pub/databases/reference_proteomes/QfO/Eukaryota/

But in general our strategy should be to keep dipper more lightweight, and delay ID resolution?
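One lightweight interim option, short of switching sources, would be to stream the large file and keep only the species of interest. The sketch below assumes a tab-separated layout like UniProt's `idmapping_selected.tab` (accession in column 1, Entrez GeneID in column 3, NCBI taxon in column 13). Those column positions, the `NCBIGene:` prefix, and the function names are assumptions to verify against the file actually downloaded.

```python
import gzip

# ASSUMPTION: 0-based column positions, modelled on idmapping_selected.tab.
COL_UNIPROT_AC = 0
COL_GENE_ID = 2
COL_TAXON = 12


def filter_idmapping(lines, taxon_ids):
    """Yield (uniprot_accession, gene_curie) pairs for the wanted taxa only."""
    wanted = {str(t) for t in taxon_ids}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= COL_TAXON or cols[COL_TAXON] not in wanted:
            continue
        # A single accession may list several gene IDs separated by ';'.
        for gene_id in cols[COL_GENE_ID].split(";"):
            gene_id = gene_id.strip()
            if gene_id:
                yield cols[COL_UNIPROT_AC], "NCBIGene:" + gene_id


def filter_idmapping_file(path, taxon_ids):
    """Stream a gzipped mapping file without loading it into memory."""
    with gzip.open(path, "rt") as handle:
        yield from filter_idmapping(handle, taxon_ids)
```

Because both functions are generators, memory use stays flat regardless of file size; the download itself is still large, so this does not replace the per-species sources suggested above.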

jmcmurry commented 7 years ago

Needed for R24 Aim 1 here "Convert GO annotations into phenotype annotations and incorporate into Monarch data corpus"

jmcmurry commented 6 years ago

For the R24 we can mention the biolink API, which federates a call to GO Solr, as well as the GO dipper ingest.

jmcmurry commented 6 years ago

@kshefchek to investigate moving to the ontobio parser. The protein and gene relationships are a little tricky, but for human it should be straightforward. cc: @deepakunni3

kshefchek commented 6 years ago

https://github.com/monarch-initiative/dipper/commits/master/dipper/sources/GeneOntology.py

kshefchek commented 6 years ago

Looking at our app, it looks like we have more than just IMP annotations (see https://monarchinitiative.org/gene/MGI%3A98297#functions), or am I misunderstanding?

@cmungall in your opinion, what is the priority on this given the new workflow of biolink driving widgets?

kshefchek commented 6 years ago

Parsers aside, "Convert GO annotations into phenotype annotations and incorporate into Monarch data corpus" is already done: we have >13k has_phenotype relations coming from our GO ingest.