Closed kevinschaper closed 1 year ago
It's already in there; the prefix is Orphanet. Which pipeline is currently responsible for doing the rewiring? Can you give a 3-step bullet list of what goes in, how the rewiring happens, and what comes out?
Whoops! I should have tried a grep -i!
It looks like phenotype.hpoa uses ORPHA, the Mondo SSSOM uses Orphanet, and biolink (currently) wants orphanet.
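To make the prefix mismatch concrete, here is a minimal sketch of normalizing the different spellings to a single one before any join. The alias table, the function name, and the choice of Orphanet as the target spelling (to match the Mondo SSSOM) are assumptions for illustration, not something any of the pipelines here is confirmed to do; the ID is a placeholder.

```python
# Hypothetical alias table: lower-cased spellings seen in this thread,
# all mapped to the spelling used by the Mondo SSSOM (an assumption).
PREFIX_ALIASES = {"orpha": "Orphanet", "orphanet": "Orphanet"}


def normalize_curie(curie: str, aliases: dict = PREFIX_ALIASES) -> str:
    """Rewrite the prefix of a CURIE if it is a known alias; otherwise return it unchanged."""
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        # No colon, so this is not a CURIE; leave it alone.
        return curie
    return f"{aliases.get(prefix.lower(), prefix)}:{local_id}"


# Placeholder IDs, for illustration only.
print(normalize_curie("ORPHA:123"))
print(normalize_curie("orphanet:123"))
print(normalize_curie("MONDO:0000001"))
```

Prefixes that are not in the alias table pass through untouched, so this is safe to apply to a whole ID column.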
@matentzn the rewiring code isn't very sophisticated at all, it's just this bunch of pandas joins: https://github.com/monarch-initiative/cat-merge/blob/main/cat_merge/mapping_utils.py
The process is (for subject in the kgx edge tsv, same repeated for object):
- ... where kgx.original_subject = sssom.object_id
- ... to populate subject in the kgx
- ... original_subject if it still matches subject (no mapping happened)

Totally outside the scope of this issue, but it would be great to update this process so that it doesn't rely on the order of subject & object in the SSSOM files. With our mapping commons, for example, we could have our own node-normalizer-style API.
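
For readers who want to see the shape of that join, here is a rough pandas sketch of the rewiring for one column. It is not the cat-merge code; it assumes the steps are: keep the incoming ID in original_subject, left join the SSSOM on object_id, use subject_id as the new subject where a mapping exists, and clear original_subject where nothing changed. The function name and the IDs in the example are made up for illustration.

```python
import pandas as pd


def rewire_column(edges: pd.DataFrame, sssom: pd.DataFrame, column: str) -> pd.DataFrame:
    """Illustrative only: remap `column` of a KGX edge frame via SSSOM object_id -> subject_id."""
    original = f"original_{column}"
    out = edges.copy()
    # Keep the incoming ID so the mapping stays traceable.
    out[original] = out[column]
    # One row per object_id, so the join cannot multiply edges (an assumption of this sketch).
    lookup = sssom[["object_id", "subject_id"]].drop_duplicates("object_id")
    # Left join: edges without a mapping are kept and get a missing subject_id.
    out = out.merge(lookup, how="left", left_on=original, right_on="object_id")
    # Use the mapped ID where one exists, otherwise keep the incoming ID.
    out[column] = out["subject_id"].fillna(out[original])
    # No mapping happened: blank out the original_* column.
    out[original] = out[original].where(out[original] != out[column])
    return out.drop(columns=["object_id", "subject_id"])


# Placeholder data, not real mappings.
edges = pd.DataFrame({
    "subject": ["Orphanet:1", "MONDO:0000002"],
    "predicate": ["biolink:has_phenotype", "biolink:has_phenotype"],
    "object": ["HP:0000001", "HP:0000002"],
})
sssom = pd.DataFrame({
    "subject_id": ["MONDO:0000001"],
    "predicate_id": ["skos:exactMatch"],
    "object_id": ["Orphanet:1"],
})

rewired = rewire_column(edges, sssom, "subject")
rewired = rewire_column(rewired, sssom, "object")
print(rewired)
```

Note that the sketch only ever looks up sssom.object_id, which is exactly the dependence on subject/object order mentioned above: a mapping row written in the opposite direction would be missed.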
@matentzn do you have a DECIPHER mapping? It looks like there are 296 more d2p edges that I could gain with that.
💪
We have about 112k dangling edges from hpoa_disease_to_phenotype that aren't making it into monarch-kg right now because we don't have ORPHA to MONDO mappings coming from mondo.sssom.tsv
@matentzn Is this a mapping that you're already planning to add?