qudt / qudt-public-repo

QUDT - Quantities, Units, Dimensions and dataTypes - public repository

Two IRIs are defined in different namespaces in a way that, taken together, violate OWL 2 DL #826

Closed ajnelson-nist closed 7 months ago

ajnelson-nist commented 7 months ago

Hello,

On loading a plurality of QUDT Turtle files from today's main branch (Git state f2711c5), I found that some concept IRIs are defined in multiple files, at points in manners inconsistent with OWL 2 DL - specifically, with OWL 2 DL's requirement that a property be exactly one of an annotation property, datatype property, or object property (Syntax Section 5.8.1).
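A minimal sketch of the kind of check described above, assuming triples are modeled as plain `(subject, predicate, object)` tuples of prefixed names. This is not the shapes graph used in the report; the `ex:` IRIs and the function name are hypothetical and only illustrate how merging files can surface an IRI declared as more than one property kind.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
# The three property kinds that OWL 2 DL requires to be mutually exclusive.
PROPERTY_KINDS = {
    "owl:AnnotationProperty",
    "owl:DatatypeProperty",
    "owl:ObjectProperty",
}


def conflicting_property_declarations(triples):
    """Return {iri: sorted kinds} for IRIs typed as more than one property kind."""
    kinds_by_iri = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE and obj in PROPERTY_KINDS:
            kinds_by_iri[subject].add(obj)
    return {
        iri: sorted(kinds)
        for iri, kinds in kinds_by_iri.items()
        if len(kinds) > 1
    }


# Hypothetical merged graph: two files declare ex:note differently.
merged = [
    ("ex:note", RDF_TYPE, "owl:AnnotationProperty"),  # from hypothetical file A
    ("ex:note", RDF_TYPE, "owl:DatatypeProperty"),    # from hypothetical file B
    ("ex:value", RDF_TYPE, "owl:DatatypeProperty"),   # declared once, no conflict
]
conflicts = conflicting_property_declarations(merged)
```

Each file can be fine on its own; the conflict only appears once the declarations land in the same graph.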

This appears to be related to Issue #289 and one or two hops up its chain of related Issues.

The two instances I saw are:

My guess at a resolution for the dcterms:description issue is that it should be only an OWL Annotation Property, because Dublin Core only provides RDFS definitions, not OWL definitions. (I'd be happy to be corrected on that if one has come out since I last looked!)

Discovery mechanism

Disclaimer: Participation by NIST in the creation of the documentation of mentioned software is not intended to imply a recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that any specific software is necessarily the best available for the purpose.

I found these by running this shapes graph against one graph that had these files loaded:

(That graph also raises some notes about rdf:List usage, but I deactivated that shape for the purposes of isolating these multi-property-type issues.)

steveray commented 7 months ago

Thanks for your thorough investigation of this! It seems to me that qudt:rdfsDatatype should be declared as an owl:DatatypeProperty.

Regarding dcterms:description, I'm thinking it should be an owl:DatatypeProperty as well, because as I recall, owl:AnnotationProperties cannot be used in OWL restriction classes. Is that your understanding?

ajnelson-nist commented 7 months ago

> Thanks for your thorough investigation of this! It seems to me that qudt:rdfsDatatype should be declared as an owl:DatatypeProperty.

I'll defer to you on this, I'm not familiar with the intended usage yet.

> Regarding dcterms:description, I'm thinking it should be an owl:DatatypeProperty as well, because as I recall, owl:AnnotationProperties cannot be used in OWL restriction classes. Is that your understanding?

I agree on the latter conclusion. Scanning over the OWL 2 RDF syntax document, which has a strict pattern-matching requirement spelled out in the line just before section 4, I see that owl:AnnotationProperty never appears in an owl:Restriction. (See esp. Tables 13 and 1, scanning for "owl:Restriction".)

My current belief is that dcterms:description needs to be an OWL Annotation Property. Neither the RDFS definition for dcterms:description nor that of its superproperty dc:description includes an rdfs:range prescription, so you could link an object with dcterms:description and still be conformant with Dublin Core Terms. Some other properties, like dcterms:dateSubmitted, have a prescribed range of rdfs:Literal, so that property could be an OWL Datatype Property; but there would need to be a consensus built on that, and friction could arise if two RDF ecosystems collide that disagree on annotation vs. datatype. With all the above, I think dcterms:description a owl:AnnotationProperty . would be correct. If there's a desire to constrain its range to be literal-valued, a SHACL property shape could be used to constrain the range node kind to sh:Literal within the context of usage within QUDT and QUDT adopters.
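A rough plain-Python stand-in for the literal-only constraint suggested above, assuming triples are tuples and literals are wrapped in a small marker class. It is not SHACL and does not use any SHACL engine; the `Literal` class, the function name, and the `ex:` data are hypothetical and only show what a node-kind check on dcterms:description would flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Marks an object as an RDF literal; bare strings are treated as IRIs."""
    value: str


def non_literal_values(triples, predicate):
    """Return the triples using `predicate` whose object is not a Literal."""
    return [
        triple
        for triple in triples
        if triple[1] == predicate and not isinstance(triple[2], Literal)
    ]


# Hypothetical data: both statements are allowed by Dublin Core's RDFS
# definition, but only the first satisfies a literal-only local constraint.
data = [
    ("ex:Unit1", "dcterms:description", Literal("A unit of length.")),
    ("ex:Unit2", "dcterms:description", "ex:LongDescriptionDocument"),
]
violations = non_literal_values(data, "dcterms:description")
```

The point of the design is that the property's OWL type stays as permissive as Dublin Core leaves it, while the stricter expectation is checked only where QUDT data is validated.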

steveray commented 7 months ago

I would agree with you on the dcterms:description, but for QUDT we do impose some cardinality constraints on the relation. That is not an issue for the SHACL schema for QUDT, but it is for the OWL schema (we try to maintain both schemas). Specifically an owl:maxCardinality of 1 for our root class, qudt:Concept.
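As a sketch of what the "at most one description" constraint means when checked by counting, the following assumes triples as tuples and uses hypothetical `ex:` subjects. It mirrors a closed-world count of asserted values (the way a SHACL maximum-count check behaves); an OWL reasoner interprets owl:maxCardinality under open-world semantics rather than by simply counting asserted triples, so this is not a model of the OWL side.

```python
from collections import Counter


def subjects_over_max(triples, predicate, max_count=1):
    """Return sorted subjects with more than `max_count` values for `predicate`."""
    counts = Counter(s for s, p, _ in triples if p == predicate)
    return sorted(s for s, n in counts.items() if n > max_count)


# Hypothetical instance data for two concepts.
data = [
    ("ex:ConceptA", "dcterms:description", "First text"),
    ("ex:ConceptA", "dcterms:description", "Second text"),  # exceeds the limit
    ("ex:ConceptB", "dcterms:description", "Only text"),
    ("ex:ConceptB", "rdfs:label", "Concept B"),              # other predicates ignored
]
too_many = subjects_over_max(data, "dcterms:description")
```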

I'm tempted to try it out with dcterms:description as an owl:DatatypeProperty and see if it breaks the loading into Protege. Please let me know if you think this is a bad idea.

Also, I'm impressed with the shapes graph you pointed to, and plan to look into it more deeply. Thanks for the pointer.

ajnelson-nist commented 7 months ago

> I'm tempted to try it out with dcterms:description as an owl:DatatypeProperty and see if it breaks the loading into Protege. Please let me know if you think this is a bad idea.

I think, within the context of only QUDT within this repository, you will not encounter any problems picking one or the other, aside from wanting the cardinality restriction specified in OWL.

I also think that, in a broader context (a plurality of QUDT adopters, which I will just assume for the purposes of this comment), asserting owl:DatatypeProperty would open you up to trouble: an OWL transitive import closure could stumble on the same property-class conflict that I did. Let's say some QUDT adopter went with the "maximally-allowed, least-restricted" mindset I hinted at previously. ("Nothing says I can't link an object...welp, I guess it's gotta be anno.") That adopter would always hit the DP-AP (datatype property vs. annotation property) conflict when firing up a tool that needs OWL 2 DL conformance.
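The import-closure scenario can be sketched as follows, assuming each ontology is reduced to a list of imports and a list of `(property IRI, property kind)` declarations. All ontology names here (`ex:AdopterOntology`, `ex:QudtLike`, `ex:OtherVocab`) are hypothetical, and the conflict shown is the assumed one from the comment, not one observed in any real adopter.

```python
def import_closure(root, imports):
    """Return every ontology reachable from `root` via imports, including `root`."""
    seen, stack = set(), [root]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(imports.get(current, ()))
    return seen


def property_kind_conflicts(root, imports, declarations):
    """Collect property kinds across the closure; report IRIs with more than one."""
    kinds = {}
    for ontology in import_closure(root, imports):
        for iri, kind in declarations.get(ontology, ()):
            kinds.setdefault(iri, set()).add(kind)
    return {iri: sorted(k) for iri, k in kinds.items() if len(k) > 1}


# Hypothetical adopter importing two vocabularies that disagree.
imports = {
    "ex:AdopterOntology": ["ex:QudtLike", "ex:OtherVocab"],
    "ex:OtherVocab": [],
}
declarations = {
    "ex:QudtLike": [("dcterms:description", "owl:DatatypeProperty")],
    "ex:OtherVocab": [("dcterms:description", "owl:AnnotationProperty")],
}
conflicts = property_kind_conflicts("ex:AdopterOntology", imports, declarations)
```

Neither imported ontology reports a conflict when checked alone; it is the adopter, whose closure contains both, that trips the OWL 2 DL check.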

I'm not asserting these conflicts exist today, but I do believe they're possible the more "inheriting" a downstream data model becomes.

steveray commented 7 months ago

I'm going to have to think more deeply about this. (I'm also hoping most new adopters will be using SHACL rather than OWL, where this problem goes away).

Meanwhile, I noticed another thing in your original post:

schema/SCHEMA-FACADE_QUDT-v2.1.ttl schema/SCHEMA_QUDT-v2.1.ttl schema/SCHEMA_QUDT-DATATYPE-v2.1.ttl schema/SHACL-SCHEMA-SUPPLEMENT_QUDT-v2.1.ttl

This combination of files should not happen. Either a user loads the web-resident version of SCHEMA-FACADE_QUDT-v2.1.ttl (i.e. this), which imports the other files but not the SHACL supplement file, or a user loads the repo version of SCHEMA-FACADE_QUDT-v2.1.ttl, which imports schema/shacl/SCHEMA_QUDT_NoOWL-v2.1.ttl rather than schema/SCHEMA_QUDT-v2.1.ttl.

This arrangement is our attempt to keep the LinkedData/OWL people happy with web-loading, and the SHACL people happy with repo-loading, described here.

How did you come to group the files the way you did?

ajnelson-nist commented 7 months ago

> How did you come to group the files the way you did?

Naïvely. I took everything that looked like the current files and piled them into one. Apologies that I missed documentation prescribing which files to import.

I remember intending to note this was a naïve file grouping. But as I was writing things up, the fundamental issue turned out to be independent of whether I grouped files incorrectly, so I forgot that note.

Thank you for the prescriptions, though!

steveray commented 7 months ago

I agree - the problem exists regardless. I thought I'd try to learn from your experience to better guide our users.

Meanwhile, I tried running your owl.ttl file on the repository and I see the List-shape violation. I'll have to go back and read up on why OWL insists the subject of an rdf:first needs to be a blank node. Unexpected.

steveray commented 7 months ago

Interesting. So having read the spec, I see it is for backwards compatibility with OWL 1 DL, seen as a redundant declaration. Both definitions are under our control, but they are in the schema/shacl/SHACL-SCHEMA-SUPPLEMENT_QUDT-v2.1.ttl file which should never be seen by OWL users anyway, so maybe my question below is academic:

Here's one of the two offending definitions.

qudt:NumericUnionList
  rdf:type rdf:List ;
  rdf:first [
      sh:datatype xsd:string ;
    ] ;
  rdf:rest (
      [
        sh:datatype xsd:nonNegativeInteger ;
      ]
      [
        sh:datatype xsd:positiveInteger ;
      ]
      [
        sh:datatype xsd:integer ;
      ]
      [
        sh:datatype xsd:int ;
      ]
      [
        sh:datatype xsd:float ;
      ]
      [
        sh:datatype xsd:double ;
      ]
      [
        sh:datatype xsd:decimal ;
      ]
    ) ;
  rdfs:comment "An rdf:List that can be used in property constraints as value for sh:or to indicate that all values of a property must be either xsd:integer, xsd:float, xsd:double or xsd:decimal." ;
  rdfs:isDefinedBy <http://qudt.org/2.1/schema/shacl/overlay/qudt> ;
  rdfs:label "Numeric Union List" ;
.

If we don't declare it of type rdf:List, I wonder what the OWL gods would have us declare it as?
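To make the reported violation concrete, here is a small sketch of a "list nodes must be blank" check, assuming triples as tuples and blank nodes written with a `_:` prefix. It is a guess at the kind of condition the List shape tests, not the actual shape; the triples are an abbreviated, two-element version of the definition above, and the blank-node labels are hypothetical.

```python
def named_list_nodes(triples):
    """Return sorted subjects of rdf:first that are not blank nodes."""
    return sorted(
        {s for s, p, _ in triples if p == "rdf:first" and not s.startswith("_:")}
    )


# Abbreviated form of the list: the head cell is an IRI, the rest are blank.
triples = [
    ("qudt:NumericUnionList", "rdf:type", "rdf:List"),
    ("qudt:NumericUnionList", "rdf:first", "_:shape1"),
    ("qudt:NumericUnionList", "rdf:rest", "_:cell2"),
    ("_:cell2", "rdf:first", "_:shape2"),
    ("_:cell2", "rdf:rest", "rdf:nil"),
]
flagged = named_list_nodes(triples)
```

Only the head cell is flagged, because it is the one list node that carries an IRI so that it can be referenced by name from elsewhere.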

ajnelson-nist commented 7 months ago

As an aside: I suggest the List shape discussion be left as out of scope of resolving this Issue.

I noted in my first post that I deactivated that shape as part of the review. It was a shape that, in discussions with the committee that maintains that file, I'd expressed I was unhappy to have found necessary to write, but my review of the OWL spec was telling me it seemed necessary.

I'm close to proposing a revision of that shape. In the time that shape has been operational, I've realized that there are differing concerns in OWL review of rdf:List. The OWL-RDF syntax document makes specific requirements on T(SEQ y1 ... yn) inducing blank nodes (Table 1). What I missed was the scope: T(SEQ y1 ... yn) seems to apply only as a first step from certain OWL predicates. I missed that this would not apply for other constructs, e.g. SKOS's skos:memberList (and I recall SKOS reports OWL 2 DL conformance), or DASH's dash:dateOrDateTime.

I need to find a few other citations around rdf:List handling, specifically a line I recall reading where somewhere between RDF, RDF Schema, and OWL, there is a requirement that rdf:Lists be strictly linear. But by the end, that specific list shape will raise fewer reported issues. The SHACL usage in QUDT of rdf:List is, I suspect, being flagged as a false-positive issue by that UCO OWL shape. It's down to a matter of hopping between the four specifications to confirm everything.
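One possible reading of "strictly linear" can be sketched as a check that every list cell has exactly one rdf:first and one rdf:rest, and that following rdf:rest reaches rdf:nil without revisiting a cell. That reading is an assumption made here for illustration, not a citation of any of the specifications; the triples and blank-node labels are hypothetical.

```python
def is_linear_list(head, triples):
    """True if the rdf:rest chain from `head` is unbranched, acyclic, and ends at rdf:nil."""
    firsts, rests = {}, {}
    for s, p, o in triples:
        if p == "rdf:first":
            firsts.setdefault(s, []).append(o)
        elif p == "rdf:rest":
            rests.setdefault(s, []).append(o)

    seen = set()
    node = head
    while node != "rdf:nil":
        if node in seen:  # cycle: the chain never reaches rdf:nil
            return False
        seen.add(node)
        # Each cell needs exactly one element and exactly one successor.
        if len(firsts.get(node, [])) != 1 or len(rests.get(node, [])) != 1:
            return False
        node = rests[node][0]
    return True


linear = [
    ("_:a", "rdf:first", "ex:x"), ("_:a", "rdf:rest", "_:b"),
    ("_:b", "rdf:first", "ex:y"), ("_:b", "rdf:rest", "rdf:nil"),
]
branching = linear + [("_:a", "rdf:rest", "rdf:nil")]  # two successors for _:a
cyclic = [
    ("_:a", "rdf:first", "ex:x"), ("_:a", "rdf:rest", "_:a"),
]
```

Under this reading the check is independent of whether cells are blank nodes, which separates it from the blank-node requirement discussed above.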

Recapping, specifically to QUDT - I have no remarks on QUDT's current usage of rdf:List. I'll file notes if I think there are any. But needing to repeat lists because of a blank node restriction is a known maintenance concern.