cmungall opened this issue 7 years ago
The GeneOntology.py source downloads a 3.7 GB idmapping file to map UniProt IDs to NCBI/MOD IDs.
This file almost certainly contains far more IDs than we need. There are other sources for per-species mappings, e.g. ftp://ftp.ebi.ac.uk/pub/databases/reference_proteomes/QfO/Eukaryota/
More generally, should our strategy be to keep dipper lightweight and delay ID resolution?
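As a rough illustration of "delay ID resolution", the sketch below first collects the UniProt accessions the GO annotations actually reference. It then streams the mapping file once and keeps only the rows for those accessions and for the chosen target ID types, so the full file never has to be held in memory. It assumes the file follows the three-column, tab-separated `idmapping.dat` layout (accession, ID type, ID). The helper names (`open_mapping`, `resolve_needed`) and the default target type set are hypothetical, not part of dipper. Which MOD ID types are present in the file is also an assumption worth checking before relying on it.

```python
import gzip
from typing import AbstractSet, Dict, Iterable, Set, TextIO

# Hypothetical default: "GeneID" is the label used for NCBI Gene IDs in
# idmapping.dat-style files. Add MOD ID types here if the file carries them.
DEFAULT_TARGET_TYPES: AbstractSet[str] = frozenset({"GeneID"})


def open_mapping(path: str) -> TextIO:
    """Open a plain or gzipped idmapping-style file as text (hypothetical helper)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def resolve_needed(
    lines: Iterable[str],
    needed: AbstractSet[str],
    target_types: AbstractSet[str] = DEFAULT_TARGET_TYPES,
) -> Dict[str, Set[str]]:
    """Stream (accession, id_type, id) rows and keep only mappings for the
    accessions we actually need, restricted to the chosen ID types."""
    mapping: Dict[str, Set[str]] = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed rows
        accession, id_type, target = parts
        if accession in needed and id_type in target_types:
            mapping.setdefault(accession, set()).add(target)
    return mapping
```

In practice, `needed` would be built from the UniProt IDs in the GO annotation file before `resolve_needed(open_mapping(path), needed)` is called. That keeps the resolved table proportional to the annotations rather than to all of UniProt, although the download itself stays large unless a smaller per-species source is used.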
This is needed for R24 Aim 1: "Convert GO annotations into phenotype annotations and incorporate into Monarch data corpus".
For the R24, we can mention the BioLink API, which federates a call to the GO Solr index, as well as the GO dipper ingest.
@kshefchek to investigate moving to the ontobio parser. The protein and gene relationships are a little tricky, but human should be straightforward. cc: @deepakunni3
From our app, it looks like we have more than just IMP annotations (https://monarchinitiative.org/gene/MGI%3A98297#functions), or am I misunderstanding?
@cmungall, in your opinion, what is the priority of this given the new workflow of BioLink driving the widgets?
Parsers aside, "Convert GO annotations into phenotype annotations and incorporate into Monarch data corpus" is already done: we have >13k has_phenotype relations coming from our GO ingest.