related-sciences / nxontology-data

NXOntology data: making ontologies accessible as simple JSON files

Extract MeSH mappings to external registries / vocabularies #11

Open dhimmel opened 1 year ago

dhimmel commented 1 year ago

MeSH includes some external mappings via the following predicates (from docs): meshv:registryNumber, meshv:relatedRegistryNumber, and meshv:casn1_label.

Here's a query to access these:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX meshv: <http://id.nlm.nih.gov/mesh/vocab#>
SELECT DISTINCT *
FROM <http://id.nlm.nih.gov/mesh>
WHERE { 
  ?concept_uri rdf:type meshv:Concept.
  ?concept_uri rdfs:label ?concept_label.
  ?concept_uri meshv:identifier ?concept_id.
  VALUES ?predicate_uri {
    meshv:registryNumber
    meshv:relatedRegistryNumber
    meshv:casn1_label
  }
  ?concept_uri ?predicate_uri ?registry_number.
  BIND( STRAFTER(STR(?predicate_uri), "mesh/vocab#") AS ?relationship_type )
  FILTER (?registry_number != "0")
}
ORDER BY ?concept_uri ?predicate_uri ?registry_number

Sample results:

| concept_uri | concept_label | concept_id | predicate_uri | registry_number | relationship_type |
| --- | --- | --- | --- | --- | --- |
| mesh:M0000001 | Calcimycin | M0000001 | meshv:casn1_label | 4-Benzoxazolecarboxylic acid, 5-(methylamino)-2-((3,9,11-trimethyl-8-(1-methyl-2-oxo-2-(1H-pyrrol-2-yl)ethyl)-1,7-dioxaspiro(5.5)undec-2-yl)methyl)-, (6S-(6alpha(2S,3S),8beta(R*),9beta,11alpha))- | casn1_label |
| mesh:M0000001 | Calcimycin | M0000001 | meshv:registryNumber | 37H9VM9WZL | registryNumber |
| mesh:M0000001 | Calcimycin | M0000001 | meshv:relatedRegistryNumber | 52665-69-7 (Calcimycin) | relatedRegistryNumber |
| mesh:M0000002 | Temefos | M0000002 | meshv:casn1_label | Phosphorothioic acid, O,O'-(thiodi-4,1-phenylene) O,O,O',O'-tetramethyl ester | casn1_label |
| mesh:M0000002 | Temefos | M0000002 | meshv:registryNumber | ONP3ME32DL | registryNumber |
| mesh:M0000002 | Temefos | M0000002 | meshv:relatedRegistryNumber | 3383-96-8 (Temefos) | relatedRegistryNumber |
| mesh:M0000011 | Abelson murine leukemia virus | M0000011 | meshv:registryNumber | txid11788 | registryNumber |
| mesh:M0000055 | Abrin | M0000055 | meshv:casn1_label | Abrins | casn1_label |
| mesh:M0000055 | Abrin | M0000055 | meshv:registryNumber | 1393-62-0 | registryNumber |
| mesh:M0000061 | Abscisic Acid | M0000061 | meshv:registryNumber | 72S9A8J5GW | registryNumber |
| mesh:M0000061 | Abscisic Acid | M0000061 | meshv:relatedRegistryNumber | 113349-29-4 ((Z,E)-isomer) | relatedRegistryNumber |

One challenge is that registry numbers appear to be local identifiers with no indication of which registry they come from.
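In the sample rows, the relatedRegistryNumber values carry a trailing parenthetical annotation, e.g. "52665-69-7 (Calcimycin)". A minimal sketch of separating the identifier from that annotation might look like the following. The "identifier (annotation)" layout is inferred only from the rows shown here, and `split_related_registry_number` is a hypothetical helper, not part of MeSH, nxontology-data, or pyobo.

```python
import re

# Assumed layout, inferred from the sample rows only: "<identifier> (<annotation>)".
# The greedy ".*" keeps nested parentheses such as "((Z,E)-isomer)" intact.
RELATED_RE = re.compile(r"(?P<identifier>\S+)\s+\((?P<annotation>.*)\)")


def split_related_registry_number(value):
    """Return (identifier, annotation); annotation is None when no parenthetical is present."""
    text = value.strip()
    match = RELATED_RE.fullmatch(text)
    if match is None:
        # Values that do not follow the assumed layout are passed through unchanged.
        return text, None
    return match.group("identifier"), match.group("annotation")


if __name__ == "__main__":
    for raw in ["52665-69-7 (Calcimycin)", "113349-29-4 ((Z,E)-isomer)", "37H9VM9WZL"]:
        print(raw, "->", split_related_registry_number(raw))
```

Values that do not match the assumed layout are returned unchanged, so unexpected formats are not silently truncated.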

cthoyt commented 1 year ago

I was told by the MeSH people once that you can use a regex to figure out whether xrefs are to CAS or UNII.
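A rough sketch of what such regex-based classification could look like is below. The patterns are assumptions based on the public formats of CAS Registry Numbers (digit groups with a trailing check digit) and UNIIs (ten uppercase alphanumeric characters), plus the "txid" prefix seen in the sample results. They are not confirmed by the MeSH team and are not copied from pyobo. The function name and the returned prefixes are hypothetical.

```python
import re

# Assumed patterns; not taken from MeSH documentation or from pyobo.
CAS_RE = re.compile(r"\d{2,7}-\d{2}-\d")
UNII_RE = re.compile(r"[0-9A-Z]{10}")
TAXON_RE = re.compile(r"txid\d+")


def cas_checksum_ok(value):
    """Check the CAS check digit: weighted sum of the other digits, right to left, mod 10."""
    digits = value.replace("-", "")
    body, check = digits[:-1], int(digits[-1])
    total = sum(weight * int(d) for weight, d in enumerate(reversed(body), start=1))
    return total % 10 == check


def guess_registry_source(value):
    """Guess the source registry of a MeSH registry number, or return None if unknown."""
    if CAS_RE.fullmatch(value) and cas_checksum_ok(value):
        return "cas"
    if UNII_RE.fullmatch(value):
        return "unii"
    if TAXON_RE.fullmatch(value):
        return "ncbitaxon"
    return None


if __name__ == "__main__":
    for number in ["37H9VM9WZL", "1393-62-0", "txid11788", "Abrins"]:
        print(number, "->", guess_registry_source(number))
```

Checking the CAS check digit in addition to the shape reduces false positives from other hyphenated numeric identifiers. Anything that matches none of the patterns comes back as None rather than being forced into a registry.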

cthoyt commented 1 year ago

@dhimmel pyobo now implements related logic in https://github.com/pyobo/pyobo/blob/8fc402dcfcd089d6e90c2a6c4a4b6a71629d3a33/src/pyobo/sources/mesh.py#L234-L257