monarch-initiative / monarch-ingest

Data ingest application for Monarch Initiative knowledge graph using Koza
https://monarchinitiative.org

Rescue 112k HPOA d2p edges with ORPHA -> MONDO mapping #434

Closed kevinschaper closed 1 year ago

kevinschaper commented 1 year ago

We have about 112k dangling edges from hpoa_disease_to_phenotype that aren't making it into monarch-kg right now, because we don't have ORPHA to MONDO mappings coming from mondo.sssom.tsv.

@matentzn Is this a mapping that you're already planning to add?

matentzn commented 1 year ago

It's already in there; the prefix is Orphanet. Which pipeline is currently responsible for doing the rewiring? Can you give a 3-step bullet list of what goes in, how the rewiring happens, and what comes out?

kevinschaper commented 1 year ago

Whoops! I should have tried a grep -i!

It looks like phenotype.hpoa uses ORPHA, the Mondo SSSOM uses Orphanet, and biolink (currently) wants orphanet.
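A minimal sketch of how the three spellings could be reconciled before joining against the SSSOM file. The `PREFIX_SYNONYMS` table and the `normalize_curie` helper are hypothetical names made up for illustration, and picking `Orphanet` as the canonical spelling is an assumption chosen to match the Mondo SSSOM; nothing here is taken from the actual pipeline.

```python
from __future__ import annotations

# Hypothetical synonym table: lowercased prefix -> canonical prefix.
# "Orphanet" is assumed canonical because that is what the Mondo SSSOM uses.
PREFIX_SYNONYMS = {"orpha": "Orphanet", "orphanet": "Orphanet"}


def normalize_curie(curie: str, synonyms: dict[str, str] = PREFIX_SYNONYMS) -> str:
    """Rewrite the prefix of a CURIE to its canonical spelling, if one is known."""
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        # Not a CURIE (no colon), so leave it untouched.
        return curie
    # Match case-insensitively; unknown prefixes pass through unchanged.
    canonical = synonyms.get(prefix.lower(), prefix)
    return f"{canonical}:{local_id}"
```

With this, `normalize_curie("ORPHA:123")` and `normalize_curie("orphanet:123")` both yield `Orphanet:123` (the ID `123` is a placeholder), while a CURIE such as `MONDO:0000001` is returned as-is.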

@matentzn the rewiring code isn't very sophisticated at all; it's just this bunch of pandas joins: https://github.com/monarch-initiative/cat-merge/blob/main/cat_merge/mapping_utils.py

The process is (for the subject in the KGX edge TSV, then repeated for the object):

Totally outside the scope of this issue, but it would be great to update this process so that it doesn't rely on the order of subject and object in the SSSOM files. With our mapping commons, for example, we could have our own node-normalizer-style API.
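To make the order-independence idea concrete, here is a rough sketch in plain Python rather than pandas. It is not the cat-merge implementation: `build_index` and `rewire` are hypothetical helpers, mappings are simplified to `(subject_id, object_id)` pairs, and the mapping predicate is ignored (a real version would presumably restrict itself to exact matches).

```python
from __future__ import annotations


def build_index(mappings, canonical_prefix: str = "MONDO") -> dict[str, str]:
    """Index mappings so lookups work whichever side the canonical ID is on.

    `mappings` is an iterable of (subject_id, object_id) pairs, a simplified
    stand-in for SSSOM rows.
    """
    marker = canonical_prefix + ":"
    index: dict[str, str] = {}
    for subject_id, object_id in mappings:
        # Try both orientations of each row.
        for canonical, other in ((subject_id, object_id), (object_id, subject_id)):
            if canonical.startswith(marker) and not other.startswith(marker):
                # Keep the first mapping seen for a given non-canonical ID.
                index.setdefault(other, canonical)
    return index


def rewire(edges, index: dict[str, str]) -> list[dict]:
    """Return copies of the edges with subject and object mapped where possible."""
    return [
        {
            **edge,
            "subject": index.get(edge["subject"], edge["subject"]),
            "object": index.get(edge["object"], edge["object"]),
        }
        for edge in edges
    ]
```

For example, given the made-up pairs `("MONDO:1", "Orphanet:10")` and `("DECIPHER:20", "MONDO:2")`, an edge whose subject is `Orphanet:10` or `DECIPHER:20` is rewired to the MONDO ID either way, and IDs without a mapping are left unchanged.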

kevinschaper commented 1 year ago

@matentzn do you have a DECIPHER mapping? It looks like there are 296 more d2p edges that I could gain with that.

kevinschaper commented 1 year ago

💪