monarch-initiative / dipper

Data Ingestion Pipeline for Monarch
https://dipper.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Update Source: GO associations #410

Open cmungall opened 7 years ago

cmungall commented 7 years ago

The GeneOntology.py source downloads a 3.7G idmapping file to map UniProt IDs to NCBI/MOD IDs.

This file almost certainly contains far more IDs than we need. Other sources provide per-species mappings, e.g. ftp://ftp.ebi.ac.uk/pub/databases/reference_proteomes/QfO/Eukaryota/

But in general, should our strategy be to keep dipper more lightweight and delay ID resolution?
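
To illustrate the lighter-weight option, here is a minimal sketch of streaming a mapping file and keeping only the ID types of interest, rather than loading everything. It is not what GeneOntology.py does. It assumes a three-column, tab-separated layout (accession, ID type, mapped ID); the function names, the `GeneID` type label, and the sample rows are hypothetical and only for illustration.

```python
import gzip
from typing import Dict, Iterable, Iterator, Set, Tuple


def open_text(path: str):
    """Open a plain or gzip-compressed text file for line-by-line reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def iter_id_rows(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (accession, id_type, mapped_id) tuples, skipping malformed lines.

    Assumes three tab-separated columns per line (an assumption about the
    file layout, not something confirmed in this thread).
    """
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 3:
            yield parts[0], parts[1], parts[2]


def build_mapping(lines: Iterable[str], wanted_types: Set[str]) -> Dict[str, str]:
    """Map each accession to the first mapped ID whose type is wanted.

    Only the retained rows are held in memory; the input is streamed.
    """
    mapping: Dict[str, str] = {}
    for accession, id_type, mapped_id in iter_id_rows(lines):
        if id_type in wanted_types:
            mapping.setdefault(accession, mapped_id)
    return mapping
```

With a real file, this could be called as `build_mapping(open_text(path), {"GeneID"})`, where `path` points at whichever per-species mapping file is chosen.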

jmcmurry commented 7 years ago

Needed for R24 Aim 1: "Convert GO annotations into phenotype annotations and incorporate into Monarch data corpus"

jmcmurry commented 6 years ago

For the R24, we can mention the biolink API, which federates a call to GO Solr, as well as the GO dipper ingest.

jmcmurry commented 6 years ago

@kshefchek to investigate move to ontobio parser. The protein and gene relationships are a little tricky, but for human should be straightforward. cc: @deepakunni3

kshefchek commented 6 years ago

https://github.com/monarch-initiative/dipper/commits/master/dipper/sources/GeneOntology.py

kshefchek commented 6 years ago

Looking at our app, it looks like we have more than just IMP annotations (https://monarchinitiative.org/gene/MGI%3A98297#functions), or am I misunderstanding?

@cmungall in your opinion, what is the priority on this, given the new workflow of biolink driving widgets?

kshefchek commented 6 years ago

Parsers aside, "Convert GO annotations into phenotype annotations and incorporate into Monarch data corpus" is already done; we have >13k has_phenotype relations coming from our GO ingest.