Closed kevinschaper closed 1 year ago
@cmungall suggested using OAK to get mappings from the Node Normalizer:
$ runoak -i translator: mappings CHEBI:30769 -O sssom
# curie_map: {}
# license: UNSPECIFIED
# mapping_set_id: temp
subject_id	subject_label	predicate_id	object_id	object_label	mapping_justification	subject_source	object_source
CHEBI:30769	citric acid	skos:exactMatch	PUBCHEM.COMPOUND:311	Citric Acid	semapv:ManualMappingCuration	CHEBI	PUBCHEM.COMPOUND
CHEBI:30769	citric acid	skos:exactMatch	CHEMBL.COMPOUND:CHEMBL1261	CITRIC ACID	semapv:ManualMappingCuration	CHEBI	CHEMBL.COMPOUND
CHEBI:30769	citric acid	skos:exactMatch	UNII:XF417D3PSL	ANHYDROUS CITRIC ACID	semapv:ManualMappingCuration	CHEBI	UNII
CHEBI:30769	citric acid	skos:exactMatch	CHEBI:30769	citric acid	semapv:ManualMappingCuration	CHEBI	CHEBI
CHEBI:30769	citric acid	skos:exactMatch	DRUGBANK:DB04272		semapv:ManualMappingCuration	CHEBI	DRUGBANK
CHEBI:30769	citric acid	skos:exactMatch	MESH:D019343	Citric Acid	semapv:ManualMappingCuration	CHEBI	MESH
CHEBI:30769	citric acid	skos:exactMatch	CAS:153301-06-5		semapv:ManualMappingCuration	CHEBI	CAS
CHEBI:30769	citric acid	skos:exactMatch	CAS:77-92-9		semapv:ManualMappingCuration	CHEBI	CAS
CHEBI:30769	citric acid	skos:exactMatch	DrugCentral:666	citric acid	semapv:ManualMappingCuration	CHEBI	DrugCentral
CHEBI:30769	citric acid	skos:exactMatch	GTOPDB:2478	citric acid	semapv:ManualMappingCuration	CHEBI	GTOPDB
CHEBI:30769	citric acid	skos:exactMatch	HMDB:HMDB0000094	Citric acid	semapv:ManualMappingCuration	CHEBI	HMDB
CHEBI:30769	citric acid	skos:exactMatch	KEGG.COMPOUND:C00158	Citrate	semapv:ManualMappingCuration	CHEBI	KEGG.COMPOUND
CHEBI:30769	citric acid	skos:exactMatch	INCHIKEY:KRKNYBCHXYNGOX-UHFFFAOYSA-N		semapv:ManualMappingCuration	CHEBI	INCHIKEY
CHEBI:30769	citric acid	skos:exactMatch	UMLS:C0055819	citric acid	semapv:ManualMappingCuration	CHEBI	UMLS
CHEBI:30769	citric acid	skos:exactMatch	UMLS:C0725616	citric acid, anhydrous	semapv:ManualMappingCuration	CHEBI	UMLS
CHEBI:30769	citric acid	skos:exactMatch	UMLS:C4718949	Citric acid monoglyceride	semapv:ManualMappingCuration	CHEBI	UMLS
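For anyone who wants to work with output like this programmatically, here is a minimal sketch of reading SSSOM-style TSV in Python. It assumes the `runoak` output has been captured as text, that `#`-prefixed lines are metadata, and that the remaining lines are tab-separated with a header row. `read_sssom_tsv` is a hypothetical helper name, not part of OAK.

```python
import csv

def read_sssom_tsv(text):
    """Split SSSOM-style TSV text into (metadata_lines, rows).

    Assumption: '#'-prefixed lines are metadata and the rest is a
    tab-separated table whose first line is the header.
    """
    metadata, body = [], []
    for line in text.splitlines():
        if line.startswith("#"):
            metadata.append(line[1:].strip())
        elif line.strip():
            body.append(line)
    # csv.DictReader accepts any iterable of lines; empty cells become "".
    rows = list(csv.DictReader(body, delimiter="\t"))
    return metadata, rows

# Small sample copied from the output above (two of the rows).
sample = (
    "# curie_map: {}\n"
    "# license: UNSPECIFIED\n"
    "# mapping_set_id: temp\n"
    "subject_id\tsubject_label\tpredicate_id\tobject_id\tobject_label\t"
    "mapping_justification\tsubject_source\tobject_source\n"
    "CHEBI:30769\tcitric acid\tskos:exactMatch\tMESH:D019343\tCitric Acid\t"
    "semapv:ManualMappingCuration\tCHEBI\tMESH\n"
    "CHEBI:30769\tcitric acid\tskos:exactMatch\tCAS:77-92-9\t\t"
    "semapv:ManualMappingCuration\tCHEBI\tCAS\n"
)

metadata, rows = read_sssom_tsv(sample)
mesh_ids = [r["object_id"] for r in rows if r["object_source"] == "MESH"]
print(mesh_ids)
```

Filtering on `object_source` like this is one way to pull out just the MESH mapping for a CHEBI term.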
@matentzn suggested using biomappings because of the additional care that goes into making its mappings (they're generated specifically for mapping, with extra caution, and avoid mega-cliques). Rather than just downloading and filtering (and maybe flipping) the file, we're going to meet during the Mondo technical time slot next week to look at how we can formally gather the mappings into a Monarch mapping commons.
Looking at the file, I see that the rows we want look like:
mesh:D065366 Cryptoxanthins skos:exactMatch chebi:10362 beta-cryptoxanthin semapv:LexicalMatching 0.95 https://github.com/biomappings/biomappings/blob/a80ed2/scripts/import_gilda_mappings.py
To make this cat-merge & monarch-kg friendly, we'll first want to subset to just the rows with mesh & chebi IDs and a skos:exactMatch predicate, then flip them so that chebi is the subject, and finally capitalize MESH & CHEBI to match our prefix preferences.
resulting in:
CHEBI:10362 beta-cryptoxanthin skos:exactMatch MESH:D065366 Cryptoxanthins semapv:LexicalMatching 0.95 https://github.com/biomappings/biomappings/blob/a80ed2/scripts/import_gilda_mappings.py
(though, I'm not sure how we tackle provenance in the last column now that we're modifying the original)
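The subset / flip / capitalize steps could be sketched roughly like this in Python. This is only an illustration: the column names are assumptions inferred from the example row (in particular `confidence` and `mapping_source` for the last two columns are guesses), and `mesh_chebi_exact` and `upper_prefix` are hypothetical helpers, not part of biomappings or cat-merge. When a row is flipped, the labels are swapped along with the IDs so each label stays with its own term.

```python
# Assumed column order, inferred from the example row; the real
# biomappings file may name or order its columns differently.
COLUMNS = [
    "subject_id", "subject_label", "predicate_id",
    "object_id", "object_label", "mapping_justification",
    "confidence", "mapping_source",
]

def upper_prefix(curie):
    """Upper-case the prefix of a CURIE, e.g. 'chebi:10362' -> 'CHEBI:10362'."""
    prefix, local_id = curie.split(":", 1)
    return f"{prefix.upper()}:{local_id}"

def mesh_chebi_exact(rows):
    """Yield skos:exactMatch rows between mesh and chebi, with chebi as subject."""
    for row in rows:
        if row["predicate_id"] != "skos:exactMatch":
            continue
        subj_prefix = row["subject_id"].split(":", 1)[0].lower()
        obj_prefix = row["object_id"].split(":", 1)[0].lower()
        if {subj_prefix, obj_prefix} != {"mesh", "chebi"}:
            continue
        out = dict(row)
        if subj_prefix == "mesh":
            # Flip the row: swap IDs and labels together.
            out["subject_id"], out["object_id"] = row["object_id"], row["subject_id"]
            out["subject_label"], out["object_label"] = (
                row["object_label"], row["subject_label"],
            )
        out["subject_id"] = upper_prefix(out["subject_id"])
        out["object_id"] = upper_prefix(out["object_id"])
        yield out

# The example row from the biomappings file.
example = dict(zip(COLUMNS, [
    "mesh:D065366", "Cryptoxanthins", "skos:exactMatch",
    "chebi:10362", "beta-cryptoxanthin", "semapv:LexicalMatching", "0.95",
    "https://github.com/biomappings/biomappings/blob/a80ed2/scripts/import_gilda_mappings.py",
]))

for mapped in mesh_chebi_exact([example]):
    print("\t".join(mapped[c] for c in COLUMNS))
```

Flipping is only reasonable here because skos:exactMatch is symmetric; the sketch leaves the justification, confidence, and provenance columns untouched, so the provenance question still stands.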
We should talk about how to do this more systematically moving forward! I have some ideas; I'll send you a Slack message.
@kevinschaper did you want to leave this open while we figure out the new mapping process, or close it now that the mapping file is in m-ingest?
Closing, as it seems we're already using this in the merge step of monarch ingest.