monarch-initiative / monarch-ingest

Data ingest application for Monarch Initiative knowledge graph using Koza
https://monarchinitiative.org

GO ingest #64

Closed putmantime closed 2 years ago

putmantime commented 3 years ago

The GO defines concepts/classes used to describe gene function, and relationships between these concepts.

How do we use it? Monarch processes gene-to-process, gene-to-function, and gene-to-subcellular-location associations.

Ingested files:

ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/idmapping_selected.tab.gz retrieved on 2021-03-09
https://archive.monarchinitiative.org/DipperCache/go/gaf-eco-mapping.yaml
http://current.geneontology.org/annotations/sgd.gaf.gz
http://current.geneontology.org/annotations/goa_dog.gaf.gz
http://current.geneontology.org/annotations/goa_cow.gaf.gz
http://current.geneontology.org/annotations/goa_pig.gaf.gz
http://current.geneontology.org/annotations/wb.gaf.gz
http://current.geneontology.org/annotations/pombase.gaf.gz
http://current.geneontology.org/annotations/rgd.gaf.gz
http://current.geneontology.org/annotations/zfin.gaf.gz
http://current.geneontology.org/annotations/goa_chicken.gaf.gz
http://current.geneontology.org/annotations/aspgd.gaf.gz
http://current.geneontology.org/annotations/goa_human.gaf.gz
http://current.geneontology.org/annotations/fb.gaf.gz
http://current.geneontology.org/annotations/dictybase.gaf.gz
http://current.geneontology.org/annotations/mgi.gaf.gz

Monarch Ingest Date: 2021-03-09

License information from the (Re)usable Data Project
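
For orientation, here is a minimal sketch of reading rows from one of the GAF files listed above. It is not the Koza-based ingest itself: the function name `parse_gaf` and the dictionary keys in `GAF_COLUMNS` are names chosen for this illustration, following the 17 tab-separated columns of the GAF 2.x format, in which lines starting with `!` are header or comment lines.

```python
# Column order follows the GAF 2.x format; the key names are illustrative.
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_or_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type", "taxon",
    "date", "assigned_by", "annotation_extension", "gene_product_form_id",
]


def parse_gaf(lines):
    """Yield one dict per annotation row, skipping '!' comment lines."""
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Trailing columns may be missing in a row, so pad to 17 fields.
        fields += [""] * (len(GAF_COLUMNS) - len(fields))
        yield dict(zip(GAF_COLUMNS, fields))
```

A downloaded file could be streamed with `gzip.open(path, "rt")` from the standard library and passed directly to `parse_gaf`, since the function only needs an iterable of text lines.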

RichardBruskiewich commented 2 years ago

Questions to ponder for this ingest:

  1. Species coverage: are all the above GAF files to be ingested (the more the merrier)?
  2. See draft GOA doc for educated guesses on Biolink Model mappings. In particular, note that the applicable biolink:Association classes all have parent subject Biolink category biolink:MacromolecularMachine, not biolink:Gene.
  3. Initial guesses on applicable predicates are given. The choices of predicate for biolink:MacromolecularMachineToBiologicalProcessAssociation and biolink:MacromolecularMachineToCellularComponentAssociation seem OK. However, I'm a bit unsure about what predicate to use for biolink:MacromolecularMachineToMolecularActivityAssociation. Any suggestions?
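
To make point 2 concrete, here is a rough sketch of how the GAF aspect column (`P`, `F`, `C`) could select among the three association classes named above. The dictionary and function names are made up for this example, and predicates are deliberately left out because the choice for the molecular activity case is still an open question in this thread.

```python
# GAF aspect codes: P = biological process, F = molecular function,
# C = cellular component. The Biolink class names are those quoted above.
ASPECT_TO_ASSOCIATION = {
    "P": "biolink:MacromolecularMachineToBiologicalProcessAssociation",
    "F": "biolink:MacromolecularMachineToMolecularActivityAssociation",
    "C": "biolink:MacromolecularMachineToCellularComponentAssociation",
}


def association_category(aspect: str) -> str:
    """Return the Biolink association class for a GAF aspect code."""
    try:
        return ASPECT_TO_ASSOCIATION[aspect]
    except KeyError:
        # Fail loudly rather than silently emitting an uncategorized edge.
        raise ValueError(f"Unexpected GAF aspect: {aspect!r}") from None
```

Raising on an unknown aspect is only one possible design; an ingest might prefer to log and skip such rows instead.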

cmungall commented 2 years ago

I think we need a canonical Monarch reference species list, and use the same set of species everywhere (orthology, phenotypes)

I think the criterion for inclusion should be: do we have any phenotypes for any of the genes in the species?

If we agree on that criterion then we should drop aspergillus, or do we have g2p for aspgd?
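
The proposed inclusion criterion could be sketched roughly as follows. This assumes two hypothetical inputs that are not defined anywhere in this thread: a gene-to-taxon lookup and the set of genes that have at least one phenotype association. All identifiers in the usage lines are placeholders, not real gene or taxon IDs.

```python
def reference_species(gene_to_taxon, genes_with_phenotypes):
    """Return the taxa that have at least one gene with a phenotype."""
    return {
        gene_to_taxon[gene]
        for gene in genes_with_phenotypes
        if gene in gene_to_taxon
    }


def keep_annotation(row_taxon, reference_taxa):
    """Keep a GO annotation only if its taxon is in the reference list."""
    return row_taxon in reference_taxa


# Placeholder data: species B has a gene but no phenotype for it.
gene_to_taxon = {"GENE:1": "TAXON:A", "GENE:2": "TAXON:A", "GENE:3": "TAXON:B"}
taxa = reference_species(gene_to_taxon, {"GENE:1"})
print(keep_annotation("TAXON:A", taxa), keep_annotation("TAXON:B", taxa))
```

Computing the species set once and reusing it for orthology, phenotypes, and GO annotations would give the single canonical list suggested above.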

RichardBruskiewich commented 2 years ago

Resolved by Monarch Ingest PR #152: https://github.com/monarch-initiative/monarch-ingest/pull/152