muninn / ontology-tools

A vocabulary / ontology documentation generator.

Docgen breaking after parsing OWL Classes #18

Closed by alliyya 6 years ago

alliyya commented 6 years ago

Occurs when OWL classes and OWL object properties carry the wrong identifying attribute, for example owl:Class rdf:about rather than owl:Class rdf:ID. I am adding exception handling so that this is reported as an issue with the ontology rather than with the document generator itself.

class --> rdf:ID
object properties --> rdf:ID
instances --> rdf:about
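A minimal sketch of the kind of check described above, using only the Python standard library rather than the generator's actual code. The names `check_identifier_attributes` and `OntologyConventionError`, and the sample document, are hypothetical. It assumes term definitions are direct children of rdf:RDF.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"

# Elements that the convention in this thread expects to carry rdf:ID.
ID_ELEMENTS = (f"{{{OWL}}}Class", f"{{{OWL}}}ObjectProperty")


class OntologyConventionError(ValueError):
    """Hypothetical error type: the OWL file, not the generator, is at fault."""


def check_identifier_attributes(rdfxml: str) -> None:
    """Raise if a class or object property uses rdf:about instead of rdf:ID."""
    root = ET.fromstring(rdfxml)
    offenders = []
    for tag in ID_ELEMENTS:
        # findall() with a plain tag only matches direct children of rdf:RDF,
        # so nested references to classes are not inspected.
        for element in root.findall(tag):
            about = element.get(f"{{{RDF}}}about")
            if about is not None and element.get(f"{{{RDF}}}ID") is None:
                offenders.append((tag.split("}")[1], about))
    if offenders:
        listing = ", ".join(f"{kind} {name!r}" for kind, name in offenders)
        raise OntologyConventionError(
            f"expected rdf:ID but found rdf:about on: {listing}"
        )


# Hypothetical input: one class follows the convention, one does not.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:ID="Person"/>
  <owl:Class rdf:about="#Organization"/>
</rdf:RDF>"""
```

Calling `check_identifier_attributes(SAMPLE)` raises `OntologyConventionError` naming the offending class, which points the reader at the ontology file instead of surfacing a traceback from the generator.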

joelacummings commented 6 years ago

I noticed someone was changing my rdf:about attributes. I see why now, but I'd like to say that rdf:about is completely valid as far as the spec goes (https://www.w3.org/TR/owl-ref/#subClassOf-def). The bigger problem is that a lot of tools (e.g. Protege and rdflib) exclusively use rdf:about when defining elements. The reason is that rdf:about lets you specify full URIs, so a definition is maintained outside the document rather than only within it. It is explained very well here: https://stackoverflow.com/questions/7118326/differences-between-rdfresource-rdfabout-and-rdfid#7119042. Some developers exclusively use rdf:about for this reason.

To make the tool as supportive as possible, it would be ideal to support rdf:about. If you'd like help, I'm willing to assist where necessary.

Instances will always specify their parent class as the element name, which is the differentiating factor, e.g. <cwrc:ClassImAnInstanceOf rdf:about="myID">
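To illustrate the point about instances, here is a rough sketch that treats any top-level element outside the RDF, RDFS and OWL namespaces as an instance whose element name is its class. This is an assumption about how such elements could be told apart, not a description of what the generator does. The function name `find_instances` and the `ex:` namespace are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SCHEMA_NAMESPACES = {
    RDF,
    "http://www.w3.org/2000/01/rdf-schema#",
    "http://www.w3.org/2002/07/owl#",
}


def find_instances(rdfxml: str):
    """Return (class URI, identifier) pairs for typed-node instances."""
    root = ET.fromstring(rdfxml)
    instances = []
    for element in root:
        if not element.tag.startswith("{"):
            continue  # no namespace, so not a typed node we can classify
        namespace, _, local_name = element.tag[1:].partition("}")
        if namespace in SCHEMA_NAMESPACES:
            continue  # owl:Class, owl:ObjectProperty, rdf:Description, ...
        identifier = element.get(f"{{{RDF}}}about") or element.get(f"{{{RDF}}}ID")
        instances.append((namespace + local_name, identifier))
    return instances


# Hypothetical document: one class definition and one instance of it.
INSTANCE_SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:ex="http://example.org/onto#">
  <owl:Class rdf:ID="Genre"/>
  <ex:Genre rdf:about="http://example.org/onto#poetry"/>
</rdf:RDF>"""
```

Here `find_instances(INSTANCE_SAMPLE)` skips the owl:Class definition and reports only the `ex:Genre` element, paired with the value of its rdf:about attribute.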

rwarren2 commented 6 years ago

As is usual, the semantic web standards are moving targets, and the RDF/XML syntax carries a lot of legacy decisions and concepts from the XML days.

specgen is built on rdflib, a Python library, and uses its parser to load the OWL file. Interestingly, whatever bug we were hitting was triggered by rdf:about.

The major difference between rdf:about and rdf:ID is described in section 2.14, "Abbreviating URIs: rdf:ID and xml:base", of the RDF/XML syntax specification. Short version: a given rdf:ID value can only occur once relative to the same base URI, while rdf:about allows the URI to be reused.
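The difference can be sketched with the standard library alone. An rdf:ID value behaves like a fragment appended to the base URI, while rdf:about takes a URI reference that is resolved against the base and may be absolute. The helper names and the `http://example.org/onto` base below are hypothetical, and this illustrates the resolution rule only; it is not rdflib's implementation.

```python
from collections import Counter
from urllib.parse import urljoin


def resolve_rdf_id(base: str, rdf_id: str) -> str:
    """rdf:ID="x" is equivalent to rdf:about="#x" resolved against the base."""
    return urljoin(base, "#" + rdf_id)


def resolve_rdf_about(base: str, about: str) -> str:
    """rdf:about holds a URI reference; absolute URIs pass through unchanged."""
    return urljoin(base, about)


def duplicate_ids(rdf_ids):
    """Return rdf:ID values used more than once under a single base URI."""
    counts = Counter(rdf_ids)
    return sorted(name for name, count in counts.items() if count > 1)


BASE = "http://example.org/onto"  # hypothetical xml:base

# Both spellings name the same resource.
same = resolve_rdf_id(BASE, "Person") == resolve_rdf_about(BASE, "#Person")

# Only rdf:about can point at a term defined elsewhere.
external = resolve_rdf_about(BASE, "http://other.example/vocab#Person")

# The "additional check": a repeated rdf:ID is an error, a repeated
# rdf:about is not.
repeated = duplicate_ids(["Person", "Organization", "Person"])
```

This is why rdf:ID works as an integrity check for a growing ontology: a term accidentally defined twice shows up as a duplicate, whereas two rdf:about nodes with the same URI are simply merged.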

The use of rdf:ID for ontologies dates back to the early days of OWL in XML, when the single node definition was used for data integrity by libraries like OWLAPI (also used by Protege) and the local URI restriction allowed the ontological terms to be grounded to the container.

As the cwrc ontology increases in size, I'd rather keep rdf:ID as an additional check for ontological terms. Whatever changes come in the future, this issue is strictly in the OWL file and does not affect the other file formats or what occurs in the SPARQL server.

alliyya commented 6 years ago

Resolved in d26e508b822f70a860f7e5c344aad0dceea29b37