rat-genome-database / CMO-Clinical-Measurement-Ontology

Clinical Measurement Ontology (CMO) maintained by RGD
https://github.com/rat-genome-database/CMO-Clinical-Measurement-Ontology

CMO has multiple syntactic issues #11

Open: cmungall opened this issue 1 year ago

cmungall commented 1 year ago

There are a number of major syntactic issues with the CMO OBO and OWL releases. These cause many parsers to break, as was reported in #5.

Even in cases where parsers don't break, the parsed result is not what was intended.

For example, virtually no ontology browsers or tools interpret the definitions correctly, because they are encoded in OBO as a property_value:

id: CMO:0003022
name: hemoglobin distribution width
is_a: CMO:0000508 ! hemoglobin measurement
property_value: created:by "sjwang" xsd:string
property_value: creation:date 2019-01-29T15:15:44Z xsd:string
property_value: hasExactSynonym "HDW" xsd:string
property_value: http://purl.obolibrary.org/obo/def "The distribution width of erythrocytes by their cellular (individual) hemoglobin concentrations. It is a measurement of the heterogeneity of the red cell hemoglobin concentration." xsd:string {http://www.geneontology.org/formats/oboInOWL#xref="PMID:PMID\\:3411196"}

This should be:

id: CMO:0003022
name: hemoglobin distribution width
is_a: CMO:0000508 ! hemoglobin measurement
created_by: sjwang
creation_date: 2019-01-29T15:15:44Z
synonym: "HDW" EXACT []
def: "The distribution width of erythrocytes by their cellular (individual) hemoglobin concentrations. It is a measurement of the heterogeneity of the red cell hemoglobin concentration." [PMID:3411196]

The OWL generated from the current OBO file is syntactically incorrect, and even when it does parse, information is lost. For example, on OLS:

https://www.ebi.ac.uk/ols/ontologies/cmo/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FCMO_0003022

The definition doesn't show up where it should.

It looks like at some point in the past you passed the OBO file through a round trip with a very old OBO-to-OWL converter.

If you like, I can provide a PR that fixes CMO. I assume this is the source file: https://github.com/rat-genome-database/CMO-Clinical-Measurement-Ontology/blob/master/clinical_measurement.obo

Alternatively, I can support a developer on your side in fixing this.