monarch-initiative / monarch-ingest

Data ingest application for Monarch Initiative knowledge graph using Koza
https://monarchinitiative.org

Add CHEBI to MESH mapping for CTD #417

Closed kevinschaper closed 1 year ago

kevinschaper commented 1 year ago

@cmungall suggested using oak to get mappings from node normalizer:

✗ runoak -i translator: mappings CHEBI:30769 -O sssom
# curie_map: {}
# license: UNSPECIFIED
# mapping_set_id: temp
subject_id	subject_label	predicate_id	object_id	object_label	mapping_justification	subject_source	object_source
CHEBI:30769	citric acid	skos:exactMatch	PUBCHEM.COMPOUND:311	Citric Acid	semapv:ManualMappingCuration	CHEBI	PUBCHEM.COMPOUND
CHEBI:30769	citric acid	skos:exactMatch	CHEMBL.COMPOUND:CHEMBL1261	CITRIC ACID	semapv:ManualMappingCuration	CHEBI	CHEMBL.COMPOUND
CHEBI:30769	citric acid	skos:exactMatch	UNII:XF417D3PSL	ANHYDROUS CITRIC ACID	semapv:ManualMappingCuration	CHEBI	UNII
CHEBI:30769	citric acid	skos:exactMatch	CHEBI:30769	citric acid	semapv:ManualMappingCuration	CHEBI	CHEBI
CHEBI:30769	citric acid	skos:exactMatch	DRUGBANK:DB04272		semapv:ManualMappingCuration	CHEBI	DRUGBANK
CHEBI:30769	citric acid	skos:exactMatch	MESH:D019343	Citric Acid	semapv:ManualMappingCuration	CHEBI	MESH
CHEBI:30769	citric acid	skos:exactMatch	CAS:153301-06-5		semapv:ManualMappingCuration	CHEBI	CAS
CHEBI:30769	citric acid	skos:exactMatch	CAS:77-92-9		semapv:ManualMappingCuration	CHEBI	CAS
CHEBI:30769	citric acid	skos:exactMatch	DrugCentral:666	citric acid	semapv:ManualMappingCuration	CHEBI	DrugCentral
CHEBI:30769	citric acid	skos:exactMatch	GTOPDB:2478	citric acid	semapv:ManualMappingCuration	CHEBI	GTOPDB
CHEBI:30769	citric acid	skos:exactMatch	HMDB:HMDB0000094	Citric acid	semapv:ManualMappingCuration	CHEBI	HMDB
CHEBI:30769	citric acid	skos:exactMatch	KEGG.COMPOUND:C00158	Citrate	semapv:ManualMappingCuration	CHEBI	KEGG.COMPOUND
CHEBI:30769	citric acid	skos:exactMatch	INCHIKEY:KRKNYBCHXYNGOX-UHFFFAOYSA-N		semapv:ManualMappingCuration	CHEBI	INCHIKEY
CHEBI:30769	citric acid	skos:exactMatch	UMLS:C0055819	citric acid	semapv:ManualMappingCuration	CHEBI	UMLS
CHEBI:30769	citric acid	skos:exactMatch	UMLS:C0725616	citric acid, anhydrous	semapv:ManualMappingCuration	CHEBI	UMLS
CHEBI:30769	citric acid	skos:exactMatch	UMLS:C4718949	Citric acid monoglyceride	semapv:ManualMappingCuration	CHEBI	UMLS
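
Since only the MESH row matters for CTD, the runoak output above could be narrowed down with a few lines of Python. This is a rough sketch using only the standard library, assuming the output is tab-separated SSSOM; the `mesh_mappings` helper and the inlined `SSSOM_TEXT` sample (two rows copied from the output) are just for illustration, not something that exists in monarch-ingest.

```python
import csv

# Two rows copied from the runoak output, tab-separated as SSSOM TSV.
SSSOM_TEXT = (
    "# curie_map: {}\n"
    "# license: UNSPECIFIED\n"
    "# mapping_set_id: temp\n"
    "subject_id\tsubject_label\tpredicate_id\tobject_id\tobject_label\t"
    "mapping_justification\tsubject_source\tobject_source\n"
    "CHEBI:30769\tcitric acid\tskos:exactMatch\tPUBCHEM.COMPOUND:311\tCitric Acid\t"
    "semapv:ManualMappingCuration\tCHEBI\tPUBCHEM.COMPOUND\n"
    "CHEBI:30769\tcitric acid\tskos:exactMatch\tMESH:D019343\tCitric Acid\t"
    "semapv:ManualMappingCuration\tCHEBI\tMESH\n"
)


def mesh_mappings(sssom_text):
    """Return (subject_id, object_id) pairs whose object is a MESH term."""
    # SSSOM metadata lines start with '#'; drop them before parsing the table.
    table_lines = [ln for ln in sssom_text.splitlines() if not ln.startswith("#")]
    reader = csv.DictReader(table_lines, delimiter="\t")
    return [
        (row["subject_id"], row["object_id"])
        for row in reader
        if row["object_source"] == "MESH"
    ]


print(mesh_mappings(SSSOM_TEXT))
```

For the sample rows this prints `[('CHEBI:30769', 'MESH:D019343')]`.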
kevinschaper commented 1 year ago

@matentzn suggested using biomappings because of the additional care that goes into making those mappings (they're generated specifically for mapping, with extra caution to avoid mega-cliques). Rather than just downloading and filtering (and maybe flipping) the file, we're going to meet during the Mondo technical time slot next week to look at how we can formally gather the mappings into a Monarch mapping commons.

kevinschaper commented 1 year ago

Looking at the file, I see that the rows we want look like:

mesh:D065366    Cryptoxanthins  skos:exactMatch chebi:10362 beta-cryptoxanthin  semapv:LexicalMatching      0.95    https://github.com/biomappings/biomappings/blob/a80ed2/scripts/import_gilda_mappings.py

To make this cat-merge and monarch-kg friendly, we'll first want to subset to just the rows with mesh and chebi IDs and a skos:exactMatch predicate, then flip them so that chebi is the subject, and finally capitalize MESH and CHEBI to match our prefix preferences.

resulting in:

CHEBI:10362 beta-cryptoxanthin  skos:exactMatch MESH:D065366    Cryptoxanthins  semapv:LexicalMatching      0.95    https://github.com/biomappings/biomappings/blob/a80ed2/scripts/import_gilda_mappings.py

(though, I'm not sure how we tackle provenance in the last column now that we're modifying the original)
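
The three steps (subset, flip, capitalize) might look roughly like the sketch below. It works on one row at a time as a dict; the column names (`subject_id`, `subject_label`, and so on) are assumed SSSOM-style names rather than something confirmed from the biomappings file, and `to_chebi_mesh` and `upper_prefix` are hypothetical helpers. It does not touch the provenance column.

```python
# Preferred capitalization for the two prefixes we care about.
PREFIX_CASE = {"mesh": "MESH", "chebi": "CHEBI"}


def upper_prefix(curie):
    """Rewrite the CURIE prefix to our preferred case, leaving the local ID alone."""
    prefix, _, local_id = curie.partition(":")
    return f"{PREFIX_CASE.get(prefix.lower(), prefix)}:{local_id}"


def to_chebi_mesh(row):
    """Return a CHEBI-subject copy of a mesh/chebi exactMatch row, else None."""
    # Step 1: subset to skos:exactMatch rows between mesh and chebi.
    if row["predicate_id"] != "skos:exactMatch":
        return None
    subject_prefix = row["subject_id"].split(":")[0].lower()
    object_prefix = row["object_id"].split(":")[0].lower()
    if {subject_prefix, object_prefix} != {"mesh", "chebi"}:
        return None

    out = dict(row)
    # Step 2: flip so chebi is the subject; labels move with their IDs.
    if subject_prefix == "mesh":
        out["subject_id"], out["object_id"] = row["object_id"], row["subject_id"]
        out["subject_label"], out["object_label"] = (
            row["object_label"],
            row["subject_label"],
        )
    # Step 3: capitalize the prefixes.
    out["subject_id"] = upper_prefix(out["subject_id"])
    out["object_id"] = upper_prefix(out["object_id"])
    return out


# The example row from this comment, with assumed column names.
example = {
    "subject_id": "mesh:D065366",
    "subject_label": "Cryptoxanthins",
    "predicate_id": "skos:exactMatch",
    "object_id": "chebi:10362",
    "object_label": "beta-cryptoxanthin",
    "mapping_justification": "semapv:LexicalMatching",
    "confidence": "0.95",
}

flipped = to_chebi_mesh(example)
print(flipped["subject_id"], flipped["subject_label"],
      flipped["predicate_id"],
      flipped["object_id"], flipped["object_label"])
```

Flipping is only safe here because skos:exactMatch is symmetric; a directional predicate would need to be inverted rather than kept as-is.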

matentzn commented 1 year ago

We should talk about how to do this more systematically moving forward! I have some ideas; I'll send you a Slack message.

glass-ships commented 1 year ago

@kevinschaper did you want to leave this open while we figure out the new mapping process, or close it now that the mapping file is in m-ingest?

glass-ships commented 1 year ago

Closing, as it seems we're already using this in the merge step of monarch-ingest.