monarch-initiative / monarch-ingest

Data ingest application for Monarch Initiative knowledge graph using Koza
https://monarchinitiative.org

Anatomy edges should be mapped to Uberon #399

Open kevinschaper opened 1 year ago

kevinschaper commented 1 year ago

Nico noted in phenio #33 that we should be mapping anatomy terms to Uberon rather than using the species-specific ontologies.

It looks like we're already getting only Uberon terms from Bgee. We'll still want the rows that come in from the Alliance expression load to be mapped, but we'll do that downstream by providing an SSSOM file to cat-merge.
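Here is a rough sketch of what the downstream SSSOM step could do. It assumes the mapping file follows the SSSOM TSV layout: `subject_id`, `predicate_id` and `object_id` columns, plus optional `#`-prefixed metadata lines. It also assumes edges are dicts with an `object` key. `load_uberon_mappings` and `map_edge_objects` are hypothetical helpers for illustration, not cat-merge's actual API.

```python
import csv
from typing import Dict, Iterable, Iterator


def load_uberon_mappings(lines: Iterable[str], predicate: str = "skos:exactMatch") -> Dict[str, str]:
    """Hypothetical helper: build a source CURIE -> Uberon CURIE lookup from SSSOM TSV lines."""
    # SSSOM TSV files may embed YAML metadata as lines starting with "#"; skip them.
    data_lines = (line for line in lines if not line.startswith("#"))
    mapping: Dict[str, str] = {}
    for row in csv.DictReader(data_lines, delimiter="\t"):
        # Keep only rows with the chosen predicate whose target is an Uberon term.
        # If a subject has several Uberon targets, the last one read wins;
        # a real pipeline would need an explicit policy for one-to-many mappings.
        if row["predicate_id"] == predicate and row["object_id"].startswith("UBERON:"):
            mapping[row["subject_id"]] = row["object_id"]
    return mapping


def map_edge_objects(edges: Iterable[dict], mapping: Dict[str, str]) -> Iterator[dict]:
    """Yield copies of the edges with `object` replaced by its Uberon mapping, if any."""
    for edge in edges:
        mapped = dict(edge)  # copy so the input edges are left untouched
        mapped["object"] = mapping.get(edge["object"], edge["object"])
        yield mapped
```

By default, only `skos:exactMatch` rows are used. Passing `skos:broadMatch` instead (where the Uberon term is broader than the source term) would map terms up the hierarchy. When reading from disk, open the file with `newline=""`, as the `csv` module documentation recommends.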

@matentzn Is there an SSSOM file for mapping from FBbt, EMAPA, WBbt and ZFA to Uberon?

matentzn commented 1 year ago

I don't want to make that decision (Uberon vs. SSAOs). Hasn't there been a formal discussion about this? It is important that @cmungall signs off on a huge decision like this. From my perspective, we only support species-specific phenotypes; everything else should be mapped to a canonical identifier scheme/ontology. But I am not involved enough to know the state of this discussion.

There is no SSSOM file for Uberon mappings yet, but you can request one on the Uberon issue tracker.

kevinschaper commented 1 year ago

Oops, I think I might have misunderstood your comment!

matentzn commented 1 year ago

No, I just noticed @caufieldjh removing SSAOs (species-specific anatomy ontologies) from PHENIO. Either we:

  1. Include all SSAOs in PHENIO (no changes to anatomy ingests), or
  2. Include no SSAOs in PHENIO, and map all annotations to SSAO terms up to Uberon using SSSOM files.

I think option 2 is probably saner, but I personally do not want to make that decision.
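If option 2 were chosen, one way to gauge its cost is to count how many anatomy annotations actually have an Uberon target. This is a minimal sketch. It assumes the mapping is already loaded as a dict from source CURIE to Uberon CURIE, and it treats UBERON and CL (which Bgee already emits) as canonical. `mapping_coverage` and `CANONICAL_PREFIXES` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Assumption: prefixes treated as already canonical and left unmapped.
CANONICAL_PREFIXES = {"UBERON", "CL"}


def mapping_coverage(objects: Iterable[str], mapping: Dict[str, str]) -> Dict[str, Tuple[int, int]]:
    """Hypothetical helper: per CURIE prefix, return (mapped, unmapped) counts for non-canonical terms."""
    mapped: Counter = Counter()
    unmapped: Counter = Counter()
    for curie in objects:
        prefix = curie.split(":", 1)[0]
        if prefix in CANONICAL_PREFIXES:
            continue  # already Uberon/CL, nothing to map
        if curie in mapping:
            mapped[prefix] += 1
        else:
            unmapped[prefix] += 1
    # Counter returns 0 for missing keys, so every prefix gets both numbers.
    return {p: (mapped[p], unmapped[p]) for p in sorted(set(mapped) | set(unmapped))}
```

Non-anatomy prefixes such as GO would show up as unmapped here, so they may need to be filtered out before the numbers are read as Uberon coverage.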

caufieldjh commented 1 year ago

FYI, I haven't removed any of the SSAOs from PHENIO (yet). In the case of MA it just isn't adding much, but for something like ZFA we should reach a conclusion on the source of truth.

matentzn commented 1 year ago

Why is MA not adding much? All (or most) of the IMPC, MPD and MGI annotations are to MA.

kevinschaper commented 1 year ago

Right now the only two ingests we have that bring in anatomy terms are bGee and Alliance gene expression.

Bgee is already using only Uberon (and CL) terms:

cut -f 4 bgee_gene_to_expression_edges.tsv | grep -v object | cut -d: -f 1 | sort | uniq -c
51176 CL
451571 UBERON
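The same prefix count can be sketched in Python, which may be handier inside an ingest test. This assumes the edge file is tab-separated with a header row whose object column is named `object`. The `grep -v object` in the pipeline suggests the header is the line containing that word. `count_object_prefixes` is a hypothetical helper.

```python
import csv
from collections import Counter
from typing import Iterable


def count_object_prefixes(lines: Iterable[str], column: str = "object") -> Counter:
    """Hypothetical helper: count CURIE prefixes in the named column of an edge TSV."""
    # DictReader consumes the header row, so no equivalent of `grep -v object` is needed.
    # QUOTE_NONE splits on tabs only, matching how `cut` treats the file.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Split each CURIE at the first ":" and keep the prefix, like `cut -d: -f 1`.
    return Counter(row[column].split(":", 1)[0] for row in reader)
```

Unlike `cut -f 4`, it looks the column up by name, so it keeps working if the column order changes. Example usage: `with open("bgee_gene_to_expression_edges.tsv", newline="") as f: print(count_object_prefixes(f).most_common())`.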

From Alliance we're getting EMAPA, FBbt, WBbt and ZFA terms (plus some GO terms):

cut -f 4 alliance_gene_to_expression_edges.tsv | grep -v object | cut -d: -f 1 | sort | uniq -c
742971 EMAPA
394955 FBbt
40239 GO
102947 WBbt
515074 ZFA
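Summing the Alliance counts above shows how much of that load the SSSOM mapping would touch. This small sketch assumes the four species-specific anatomy prefixes named earlier in this issue (FBbt, EMAPA, WBbt, ZFA) are the ones to map, and that GO terms are left alone. `rows_needing_mapping` is a hypothetical helper.

```python
from typing import Dict, Set

# Prefix counts copied from the Alliance output above.
ALLIANCE_COUNTS: Dict[str, int] = {
    "EMAPA": 742971,
    "FBbt": 394955,
    "GO": 40239,
    "WBbt": 102947,
    "ZFA": 515074,
}

# Assumption: these are the species-specific anatomy ontologies to map to Uberon;
# GO terms are not anatomy and are excluded.
SSAO_PREFIXES: Set[str] = {"EMAPA", "FBbt", "WBbt", "ZFA"}


def rows_needing_mapping(counts: Dict[str, int], ssao: Set[str]) -> int:
    """Number of edges whose object comes from a species-specific anatomy ontology."""
    return sum(n for prefix, n in counts.items() if prefix in ssao)


total = sum(ALLIANCE_COUNTS.values())  # 1796186
to_map = rows_needing_mapping(ALLIANCE_COUNTS, SSAO_PREFIXES)  # 1755947
print(f"{to_map} of {total} Alliance expression edges ({to_map / total:.1%}) need an Uberon mapping")
# prints: 1755947 of 1796186 Alliance expression edges (97.8%) need an Uberon mapping
```

By the same count, the Bgee edges (UBERON and CL only) would need no mapping at all.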

matentzn commented 1 year ago

It's so weird that you get EMAPA IDs from Alliance and not MA. But OK, it is possible! In any case, this should be run by Chris to be totally sure.