qudt / qudt-public-repo

QUDT - Quantities, Units, Dimensions and dataTypes - public repository

base ontology doesn't load properly #341

Open DougalW opened 3 years ago

DougalW commented 3 years ago

Hi

When I load the ontology into Protege from http://qudt.org/2.1/schema/qudt, it generates the following error:

Ontology already exists. OntologyID(OntologyIRI(<http://qudt.org/2.1/schema/extensions/imports>) VersionIRI(<http://qudt.org/2.1/schema/qudt>))

Perhaps there's a circular reference in the TTL file forcing it to reload itself?

jhodgesatmb commented 3 years ago

We would like to fix the Protege load errors, but I do not understand how to read or interpret the Protege error classes (and the documentation is two paragraphs on their wiki).

We understand that Protege doesn’t handle circular imports gracefully and that may be the problem referenced here, but there are other problems that need to be addressed.

Jack Hodges, Ph.D. Arbor Studios

DougalW commented 3 years ago

I'll take a look at why it generates the error and let you know. May take me a few days though.

steveraysteveray commented 3 years ago

@jhodgesatmb, for what it's worth, here's the current import visualization for the OWL version of QUDT. It does indeed show the circular import reference from "functions" back to the schema. If that is indeed causing the problem, I recommend we remove the import statement inside the "functions" graph. The reason it is there is because it is a pain to develop new functions without visibility of the schema, but once the development is completed, we can use the functions without the import statement. [image: image] https://user-images.githubusercontent.com/1130189/105383933-82bcde00-5bc6-11eb-8a29-5653f7982e5a.png

jhodgesatmb commented 3 years ago

That is a helpful diagram. I do not see any reason why the functions in the SPIN hierarchy would have to import the QUDT schema. I can definitely see the other way around. I wonder how many of these there are...

Jack

steveraysteveray commented 3 years ago

That is the only one.

DougalW commented 3 years ago

I just loaded the FUNCTIONS file from

https://raw.githubusercontent.com/qudt/qudt-public-repo/master/schema/extensions/FUNCTIONS_QUDT-v2.1.ttl

and this also gives an error in Protege:

Ontology already exists. OntologyID(OntologyIRI(<http://qudt.org/2.1/schema/extensions/functions>) VersionIRI(<null>))

which looks like the circular import problem - Protege has loaded Functions but then imports the full QUDT which in turn imports Functions.

I suggest removing the import declaration from the uploaded Functions file and testing whether that resolves the issue.
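
The loop described above (Functions imports the QUDT schema, which in turn imports Functions) can be found mechanically by walking the owl:imports edges. Below is a minimal Python sketch. The `imports` dictionary is a hand-written, simplified stand-in for the real import graph, and `find_import_cycle` is a hypothetical helper, not part of Protege or the OWL API.

```python
from typing import Dict, List, Optional

# Hand-written, simplified import graph (an assumption for illustration):
# each ontology IRI maps to the IRIs it declares via owl:imports.
imports: Dict[str, List[str]] = {
    "http://qudt.org/2.1/schema/qudt": [
        "http://qudt.org/2.1/schema/extensions/functions",
    ],
    "http://qudt.org/2.1/schema/extensions/functions": [
        "http://qudt.org/2.1/schema/qudt",
    ],
}


def find_import_cycle(graph: Dict[str, List[str]], start: str) -> Optional[List[str]]:
    """Return the first import cycle reachable from `start`, or None."""
    path: List[str] = []   # ontologies on the current depth-first path
    finished = set()       # ontologies fully explored without finding a cycle

    def visit(node: str) -> Optional[List[str]]:
        if node in path:
            # We came back to an ontology that is still being loaded.
            return path[path.index(node):] + [node]
        if node in finished:
            return None
        path.append(node)
        for target in graph.get(node, []):
            cycle = visit(target)
            if cycle:
                return cycle
        path.pop()
        finished.add(node)
        return None

    return visit(start)


cycle = find_import_cycle(imports, "http://qudt.org/2.1/schema/qudt")
print(" -> ".join(cycle) if cycle else "no import cycle found")
```

Emptying the import list of the functions entry in this toy graph makes the function return `None`, which mirrors the proposed fix.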

jhodgesatmb commented 3 years ago

The functions file may be moving to the development branch so this one problem may go away.

Jack Hodges, Ph.D. Arbor Studios

steveraysteveray commented 3 years ago

PR #348 fixes the circular import. Not to say there aren't other Protege issues...

DougalW commented 3 years ago

Hi, I just loaded http://qudt.org/2.1/schema/qudt again and the circular import error is fixed and all import statements are processed correctly. However Protege reports several errors, mostly due to the classes "Enumerated Value", "Enumeration", "Unit", "StructuredDatatype", "PhysicalConstant", "Citation", "QuantityKindDimensionVector", "Verifiable" and a few others:

[image: Screen Shot 2021-02-16 at 5 49 40 PM]

Also, if I load from https://raw.githubusercontent.com/qudt/qudt-public-repo/master/schema/SCHEMA_QUDT-v2.1.ttl Protege also generates several errors but slightly different ones. Again, all import statements are processed correctly. Errors are caused by the "Verifiable" class:

[image: Screen Shot 2021-02-16 at 5 47 36 PM]

jhodgesatmb commented 3 years ago

If you could translate those errors out of Protege Error classes that would be very helpful as I find the Protege Error classes to be unhelpful in identifying an error; it only tells me what the offending item is. So, from your itemization, what is the problem with each of them? As soon as I have this information I’ll get to work on resolving them. Thank you.

Jack

DougalW commented 3 years ago

For some errors it looks like you have some classes, annotations and datatypes that have the same URI...

Since these are not declared sameAs, this will cause Protege to generate errors.

e.g.

  <!-- http://qudt.org/schema/qudt/LatexString -->

    <owl:Class rdf:about="http://qudt.org/schema/qudt/LatexString">
        <rdfs:subClassOf rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
    </owl:Class>

but there is also an Annotation called Latex String

    <rdf:Description rdf:about="http://qudt.org/schema/qudt/LatexString">
        <rdfs:comment>A type of string in which some characters may be wrapped with &apos;\(&apos; and &apos;\) characters for LaTeX rendering.</rdfs:comment>
        <rdfs:isDefinedBy rdf:resource="http://qudt.org/2.1/schema/qudt"/>
        <rdfs:label>Latex String</rdfs:label>
    </rdf:Description>

also has a Datatype with the same URI:

  <!-- http://qudt.org/schema/qudt/LatexString -->

    <rdfs:Datatype rdf:about="http://qudt.org/schema/qudt/LatexString"/>

For the class: integerPercentage (Error 13 is equivalent to this class so will cause an error):

  <!-- http://qudt.org/schema/qudt/integerPercentage -->

    <owl:Class rdf:about="http://qudt.org/schema/qudt/integerPercentage">
        <rdfs:subClassOf rdf:resource="http://www.w3.org/2001/XMLSchema#integer"/>
    </owl:Class>

there is also an Annotation with the same URI (rdfs:label "integer percentage") and a Datatype with the same URI.

For the class floatPercentage (Error 10 is equivalent to this class) there are similarly Datatypes and Annotations with the same URI.

The other errors look much harder to debug but might be caused by the ones I debugged above. If you fix these I can re-test.

jhodgesatmb commented 3 years ago

I was puzzled and wondered if two of the problems you mention above, the two percentages, might be related to the subclassing of xsd datatypes. I thought these are supposed to be terminals, but I ran into cases which were subclassing xsd datatypes that didn’t throw an error.

Jack

steveraysteveray commented 3 years ago

I'm having trouble finding any of the code for LatexString that you are quoting, anywhere in the qudt-public-repo. All I can find is the following:

    qudt:LatexString
      a rdfs:Datatype ;
      rdfs:comment "A type of string in which some characters may be wrapped with '\(' and '\) characters for LaTeX rendering." ;
      rdfs:isDefinedBy <http://qudt.org/2.1/schema/qudt> ;
      rdfs:label "Latex String" ;
      rdfs:subClassOf xsd:string ;
    .

...which is in file schema/SCHEMA_QUDT-v2.1.ttl

Can you tell me where you are finding the code?

DougalW commented 3 years ago

load this url into Protege:

https://raw.githubusercontent.com/qudt/qudt-public-repo/master/schema/SCHEMA_QUDT-v2.1.ttl

then scroll down the Entities window in Protege and you will see xsd:string (xsd:string) - expand this and you can see Latex String

DougalW commented 3 years ago

I was puzzled and wondered if two of the problems you mention above, the two percentages, might be related to the subclassing of xsd datatypes. I thought these are supposed to be terminals, but I ran into cases which were subclassing xsd datatypes that didn’t throw an error. Jack

The subclassing of the datatype seems to be part of the problem. Doing this causes Protege to turn xsd:string into a Class, which shows up in the Entities tab but it's not actually a class in the first place.

A better approach would be to create a Data Property called Latex String with Range xsd:string because fundamentally it's still just a string and there's no need to subclass Data Property.

jhodgesatmb commented 3 years ago

Agreed. But it raises the question of why we even have a LatexString if it is just a regular string with no other restrictions. If the point is to have a property that includes LaTeX characters, then your suggestion makes the most sense.

DougalW commented 3 years ago

I'd go back to the original design decisions and ask:

  • what competency questions gave rise to its inclusion?
  • is there any actual additional information carried with it?

I don't know about the former, but looking at the latter it seems there is no extra info beyond xsd:string, so it serves no purpose and is redundant.

Personally, I'd get rid of it.

jhodgesatmb commented 3 years ago

I know the original intent of the property but not of the implementation as such. I will discuss with Steve and Ralph.

dr-shorthair commented 3 years ago

Annotating literals with a datatype is standard RDF. But custom datatypes are a tricky area of OWL2. They are part of the OWL2 specs - https://www.w3.org/TR/owl2-primer/#Advanced_Use_of_Datatypes - and in principle strings could be restricted using patterns. But withRestrictions seems to confuse Protege, and make it think there are classes involved.

DougalW commented 3 years ago

Sure, it’s standard, but the problem here seems to be more that the same URI is referring to different things in the one file: class, annotation and custom datatype. I’m not sure if the original file was authored outside of Protege or has already been processed by Protege, but the actual file itself shows these three separate entities with the same URI. That’s bound to confuse Protege.

dr-shorthair commented 3 years ago

QUDT is published through the TopQuadrant stack, and usually maintained using TopBraid. This does not use the OWL-API serializer. TopBraid is less fierce than Protege on the OWLy parts, as it is more oriented towards RDF and SHACL.

I am also a TopBraid user, though I usually run my productions through Protege as a 'final check', which does often uncover issues, particularly around owl:Restriction. I also sometimes find it useful to re-serialize by saving from Protege, in order to generate the OWL-API style where you know that the main users are OWLy people, rather than Linked Data people.

DougalW commented 3 years ago

As someone who builds semantic knowledge graphs, I'd recommend using Protege for validation of qudt because it's prevalent in our field.

Do you also use reasoners in TopBraid? Protege has good support for many reasoners and when I run a reasoner over qudt it immediately barfs up errors due to the qudt namespace having a range xsd:anyURI (not an error I've seen before!)

jhodgesatmb commented 3 years ago

Topbraid has reasoners, of course.

dr-shorthair commented 3 years ago

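@DougalW perhaps you could make a fork, correct the obvious errors with easy fixes, and then issue PRs?

  • I agree that having xsd:anyURI as the rdfs:range of a whole host of QUDT properties is probably misguided
  • some other 'errors' may turn out not to be, so would merit some discussion. It may be that some minor refactoring would stop Protege throwing errors

On the anyURI case that you have raised, I ran the following SPARQL on the QUDT schema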

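PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT *
WHERE { ?p rdfs:range xsd:anyURI . }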

and uncovered the following:

[p]
dcterms:source
qudt:dbpediaMatch
qudt:image
qudt:imageLocation
qudt:informativeReference
qudt:isoNormativeReference
qudt:normativeReference
qudt:onlineReference
qudt:url
vaem:isElaboratedIn
vaem:latestPublishedVersion
vaem:logo
vaem:namespace
vaem:previousPublishedVersion
vaem:rdfxmlFileURL
vaem:turtleFileURL
vaem:urlForHTML
vaem:website

Most of those are probably intended to be links, in which case they should be owl:ObjectProperty, with no rdfs:range specified. It is likely that the following small subset really should be anyURI, since they are not intended to be links:

qudt:url
vaem:namespace
vaem:rdfxmlFileURL
vaem:turtleFileURL
vaem:urlForHTML
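
The split proposed above can be written down as a simple set difference. This is a small Python sketch over the two lists from this comment; the variable names are made up for illustration, and whether each remaining property really should become an owl:ObjectProperty is still a judgment call for the maintainers.

```python
# Properties returned by the SPARQL query above (rdfs:range xsd:anyURI).
with_anyuri_range = [
    "dcterms:source", "qudt:dbpediaMatch", "qudt:image", "qudt:imageLocation",
    "qudt:informativeReference", "qudt:isoNormativeReference",
    "qudt:normativeReference", "qudt:onlineReference", "qudt:url",
    "vaem:isElaboratedIn", "vaem:latestPublishedVersion", "vaem:logo",
    "vaem:namespace", "vaem:previousPublishedVersion", "vaem:rdfxmlFileURL",
    "vaem:turtleFileURL", "vaem:urlForHTML", "vaem:website",
]

# Subset that probably should keep xsd:anyURI, since the values are not links.
keep_as_anyuri = {
    "qudt:url", "vaem:namespace", "vaem:rdfxmlFileURL",
    "vaem:turtleFileURL", "vaem:urlForHTML",
}

# Everything else is a candidate for owl:ObjectProperty with no rdfs:range.
link_candidates = [p for p in with_anyuri_range if p not in keep_as_anyuri]

print(len(link_candidates), "candidate object properties:")
for prop in link_candidates:
    print("  ", prop)
```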

QUDT has been in development for about 10 years now, and design patterns and coding standards (and the main contributors) have evolved during that time, so there is a bit of cruft. This is being gradually cleaned up, often in response to comments from external users like you.

Note that the team you are engaging with here has decades of experience building semantic knowledge graphs (check the GitHub profiles of the main committers). While Protege is a popular tool, particularly in academia, there are others in use, though maybe not 'in [y]our field'. The whole point of standards is so that we are not confined to one implementation. But I agree that it is helpful to run products through more than one to find bugs (as I already mentioned above).

DougalW commented 3 years ago

@dr-shorthair thanks for the comment and good to get some background. I'll clone the repo and have a look but it might take a while to go through.

jhodgesatmb commented 3 years ago

What Simon says is absolutely true. With the board being the only participants in the management of QUDT for all years up to about 2020, and all of us being volunteers and gainfully employed during that time, it has been our goal to first make sure that the quantity-triad pattern and associated vocabularies are viable and accurate. This past year we expanded our focus to include/crossreference other unit vocabularies.

It is absolutely true that QUDT could use a top-to-bottom overhaul of things like what were mentioned by the OP, and this kind of comment is genuinely appreciated. We just need to prioritize the work we can do. It recently (in the past 8 months or so) became important to us to get QUDT to load without errors in Protege, but reading and understanding Protege errors got in the way of making the changes to QUDT. No attempt to find out how to understand Protege errors has resulted in more than a few changes to the ontology. This conversation will hopefully move the bar in that regard.

Jack

steveraysteveray commented 3 years ago

@DougalW, I'm concluding that Protege is creating these additional assertions. When you say "the actual file itself shows these three separate entities with the same uri", that's not the case for the file in the repo. You can just open the schema file with a plain old editor and search, which is what I did.

oldskeptic commented 3 years ago

Protege occasionally does things "its way" while being cryptic. Repeated use of <rdf:Description ... is a valid RDF/XML serialization and isn't an error. It happens when parsing / converting unsorted triples; you will often see other toolchains like rdflib/rapper produce the same output.
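
To illustrate the point about repeated blocks: in RDF/XML, several node elements with the same rdf:about all describe one resource, so their statements simply add up. The sketch below uses only the Python standard library and a trimmed-down sample modelled on the snippets quoted earlier in this thread. It handles just the tiny subset of RDF/XML used in the sample (rdf:about, rdf:resource, plain text literals) and is not a real RDF parser.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Trimmed-down sample modelled on the RDF/XML quoted earlier in the thread.
rdf_xml = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://qudt.org/schema/qudt/LatexString">
    <rdfs:subClassOf rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:Class>
  <rdf:Description rdf:about="http://qudt.org/schema/qudt/LatexString">
    <rdfs:label>Latex String</rdfs:label>
  </rdf:Description>
  <rdfs:Datatype rdf:about="http://qudt.org/schema/qudt/LatexString"/>
</rdf:RDF>"""


def clark_to_iri(tag: str) -> str:
    """Turn ElementTree's '{namespace}local' tag into a plain IRI."""
    return tag[1:].replace("}", "", 1)


# subject IRI -> list of (predicate IRI, value) pairs
statements = defaultdict(list)
for node in ET.fromstring(rdf_xml):
    subject = node.get(f"{{{RDF}}}about")
    # A typed node element such as <owl:Class> is shorthand for an rdf:type triple.
    if node.tag != f"{{{RDF}}}Description":
        statements[subject].append((RDF + "type", clark_to_iri(node.tag)))
    for prop in node:
        value = prop.get(f"{{{RDF}}}resource") or (prop.text or "").strip()
        statements[subject].append((clark_to_iri(prop.tag), value))

for subject, pairs in statements.items():
    print(subject)
    for predicate, value in pairs:
        print("   ", predicate, value)
```

The three blocks collapse into a single subject carrying two rdf:type statements, one rdfs:subClassOf and one rdfs:label, which is the same information spread over three elements.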

dr-shorthair commented 3 years ago

@DougalW, I'm concluding that Protege is creating these additional assertions.

Yes - I think the fact that @DougalW shows an RDF/XML serialization of the errors gives the game away. The QUDT schema in the repository is stored as TTL, so he must be loading and then exporting it as RDF/XML to see those effects.

I suspect the underlying issue is that Protege uses OWL-API for i/o, and this adds some 'triples' which follow from the OWL interpretation of the RDF serialization. Those renegade triples will follow from that somehow. I've seen this many times with Datatypes. When you add axioms to a datatype, OWL often promotes it to a Class ... and then the trouble starts.

@ralphtq I suspect Holger could run this down in a minute.
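
A quick way to look for the pattern suspected here is to list datatypes that appear on either side of rdfs:subClassOf. The Python sketch below does that over triples written as prefixed names, taken from the qudt:LatexString definition quoted earlier. The helper `datatypes_used_as_classes` is hypothetical, and the check encodes a guess at the trigger; it does not describe what OWL-API actually does internally.

```python
# Triples from the qudt:LatexString definition quoted earlier in the thread,
# written as (subject, predicate, object) with prefixed names.
triples = [
    ("qudt:LatexString", "rdf:type", "rdfs:Datatype"),
    ("qudt:LatexString", "rdfs:label", '"Latex String"'),
    ("qudt:LatexString", "rdfs:subClassOf", "xsd:string"),
]


def datatypes_used_as_classes(triples):
    """Return datatypes that occur as subject or object of rdfs:subClassOf."""
    declared = {s for s, p, o in triples
                if p == "rdf:type" and o == "rdfs:Datatype"}

    def is_datatype(term):
        # XSD built-ins count as datatypes even without a declaration.
        return term in declared or term.startswith("xsd:")

    flagged = set()
    for s, p, o in triples:
        if p == "rdfs:subClassOf":
            flagged.update(t for t in (s, o) if is_datatype(t))
    return sorted(flagged)


print(datatypes_used_as_classes(triples))
```

For the sample triples this flags both qudt:LatexString and xsd:string, matching the observation that xsd:string shows up as a class in the Entities tab.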

DougalW commented 3 years ago

@dr-shorthair yes, it looks like Protege is adding the assertions from the TTL original. On the plus side, it has processed the imports correctly :)

steveraysteveray commented 3 years ago

Picking up from the comment made here, I'm thinking that the problem with the xsd:anyURI range is not so much the stated rdfs:range as it is with the values not being explicitly declared to be of type xsd:anyURI. I have pushed a branch (srr-anyURI) that fixes all occurrences of properties with that range so that the values are now declared as such. I'm curious to see if this fixes things. Could @DougalW or @dr-shorthair try it out?
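
As a rough illustration of what declaring the values means in Turtle, the sketch below prints the same value first as a plain string literal and then as a literal typed xsd:anyURI. The helper name `as_any_uri_literal` and the `ex:thing` subject are hypothetical, and the actual changes in the srr-anyURI branch may look different.

```python
def as_any_uri_literal(value: str) -> str:
    """Format `value` as a Turtle literal typed xsd:anyURI."""
    # Escape backslashes and double quotes so the literal stays well-formed.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^xsd:anyURI'


url = "http://qudt.org/2.1/schema/qudt"
print("before: ex:thing qudt:url", f'"{url}" .')
print("after:  ex:thing qudt:url", as_any_uri_literal(url), ".")
```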