rat-genome-database / CMO-Clinical-Measurement-Ontology

Clinical Measurement Ontology (CMO) maintained by RGD
https://github.com/rat-genome-database/CMO-Clinical-Measurement-Ontology

Wikipedia URIs are not valid. #1

Open jpmccu opened 4 years ago

jpmccu commented 4 years ago

When I attempt to load CMO into an RDFlib graph:

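import rdflib
g = rdflib.Graph()
# Parse the RDF/XML serialization of CMO from its OBO PURL.
# Graph.parse is used here; Graph.load was a deprecated alias that newer rdflib releases no longer provide.
g.parse('http://purl.obolibrary.org/obo/cmo.owl', format="xml")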

I get the following error:

http://purl.obolibrary.org/obo/Wikipedia#_http\://en.wikipedia.org/wiki/Waist-hip_ratio does not look like a valid URI, trying to serialize this will break.
http://purl.obolibrary.org/obo/MedicineNet#_http\://www.medicinenet.com does not look like a valid URI, trying to serialize this will break.

I recommend either using the direct URL in question or rewriting the URL so that it does not include the escape character. Ideally, you should refer to the actual web page directly, without escaping it.
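
For context on the warning above: the Turtle and SPARQL grammars exclude a small set of characters from IRI references, and the backslash is one of them. The sketch below is a standard-library approximation of that kind of check, not RDFlib's own code; the names `FORBIDDEN_IRI_CHARS` and `invalid_iri_chars` are hypothetical.

```python
# Characters that the Turtle/SPARQL IRIREF grammar does not allow unescaped,
# in addition to control characters and the space (code points 0x00-0x20).
FORBIDDEN_IRI_CHARS = set('<>"{}|^`\\')


def invalid_iri_chars(iri: str) -> list:
    """Return the sorted, de-duplicated characters that make `iri` suspect."""
    return sorted(
        {ch for ch in iri if ch in FORBIDDEN_IRI_CHARS or ord(ch) <= 0x20}
    )


# The first URI is copied from the warning; the second is the direct URL.
escaped = r"http://purl.obolibrary.org/obo/Wikipedia#_http\://en.wikipedia.org/wiki/Waist-hip_ratio"
direct = "http://en.wikipedia.org/wiki/Waist-hip_ratio"

print(invalid_iri_chars(escaped))  # the backslash is the only offender
print(invalid_iri_chars(direct))   # nothing flagged
```

Under these assumptions, only the escaped form is flagged, which matches the observation that the backslash before the colon is what triggers the warning.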

turbomam commented 4 years ago

Hi @jimmccusker

I have seen pervasive excessive backslashing in CMO. Let me know if you come up with a good solution.

I think CMO is going to be important for something I'm working on now, so I'm just going to do find and replace in a text editor. Unfortunately, s/\\//g won't work, because some backslashes have to be retained for escaping quotation marks within quotation marks.

This hasn't worked for me either yet: s/\\[^'|"]//g
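
One likely reason the second expression fails is that `\\[^'|"]` matches the backslash together with the character that follows it, so the replacement deletes both (for example, `https\://` becomes `https//`). A negative lookahead avoids consuming the next character. The following is a minimal Python sketch of that idea, assuming the goal is to drop every backslash except those that escape a quotation mark; the names `UNNEEDED_BACKSLASH` and `strip_unneeded_backslashes` are hypothetical.

```python
import re

# A backslash that is NOT immediately followed by a single or double quote.
# The lookahead (?!...) inspects the next character without consuming it.
UNNEEDED_BACKSLASH = re.compile(r"\\(?!['\"])")


def strip_unneeded_backslashes(text: str) -> str:
    """Remove backslashes except those that escape a quotation mark."""
    return UNNEEDED_BACKSLASH.sub("", text)


sample = r'Wikipedia:https\://en.wikipedia.org/wiki/Waist-hip_ratio "a \"quoted\" word"'
print(strip_unneeded_backslashes(sample))
```

In this sketch the colon after `https` is kept and the escaped quotes are left untouched. Doubled backslashes and other legitimate escapes are not handled, so the result should be reviewed before it is used on a real file.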

turbomam commented 4 years ago

Sorry, I had converted cmo.owl to Turtle format before writing my previous comment. So my obsession with preserving backslashes that escape quotes isn't really relevant.

jpmccu commented 4 years ago

The best bet would be for the CMO creators to use the Wikipedia URL directly, like this:

http://en.wikipedia.org/wiki/Waist-hip_ratio

jpmccu commented 4 years ago

It's not just Wikipedia; the escaping was all over the place. I tested this by loading the ontology into Protégé, exporting it as Turtle, and then loading the resulting file into RDFlib, which no longer complained about invalid URIs.

jrsjrs commented 4 years ago

I apologize for coming into this conversation late. It sounds like you might have found a workaround for the problem, but we could also change terms in the ontology to replace the slash with "or". This would not completely remove slashes, but it would reduce their number. Please let us know if you need us to make that change.

jpmccu commented 4 years ago

Slashes aren't the problem; the problem is the use of backslashes to escape things like colons in URIs. Please see the differences in my PR for this. Maintainers shouldn't be using this:

Wikipedia:https\://en.wikipedia.org/wiki/Polyunsaturated_fatty_acid

to refer to a Wikipedia page; they should just use the URL directly, like this:

https://en.wikipedia.org/wiki/Polyunsaturated_fatty_acid

When the ontology is converted to RDF, the right thing happens: that URL identifies the Wikipedia page directly. There's no need for an ad hoc Wikipedia: namespace here.
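
The rewrite described above can be sketched in a few lines of Python. This assumes cross-references take the form shown in this thread (a source prefix, a colon, then a URL whose scheme colon is backslash-escaped); `URL_PATTERN` and `xref_to_url` are hypothetical names, and this is not the code from the PR.

```python
import re

# Finds the first http(s) URL once the escaped colon has been restored.
URL_PATTERN = re.compile(r"https?://\S+")


def xref_to_url(xref: str):
    """Return the direct URL embedded in an escaped xref, or None if absent."""
    unescaped = xref.replace("\\:", ":")  # undo the backslash-escaped colon
    match = URL_PATTERN.search(unescaped)
    return match.group(0) if match else None


print(xref_to_url(r"Wikipedia:https\://en.wikipedia.org/wiki/Polyunsaturated_fatty_acid"))
print(xref_to_url(r"MedicineNet:http\://www.medicinenet.com"))
```

A value with no embedded URL yields `None`, so ordinary identifier-style cross-references can be left unchanged by the caller.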