ucoProject / UCO

This repository is for development of the Unified Cyber Ontology.
Apache License 2.0

Correct OWL 2 DL syntax of enumerations of literals #435

Closed ajnelson-nist closed 1 year ago

ajnelson-nist commented 2 years ago

Background

This proposal is an upgrade to UCO CP-90, and is in part a transcription to re-capture original motivations. The CP-90 Confluence page should now be considered superseded by this GitHub Issue.

The construction of UCO vocabularies does not conform to the OWL 2 mechanism for custom datatype definitions. The non-conformance issue harkens back to the definition of datatypes in RDF Schema. This proposal adds and encodes two requirements for the UCO vocabulary namespace, which apply equally to CASE’s vocabulary namespace.

For illustration, we will upgrade UCO's vocabulary:BitnessVocab to become conformant. (This was selected by merit of having just two members.)

Here is the current definition of BitnessVocab, omitting the rdfs:label and rdfs:comment so we may focus on the OWL syntax:

vocabulary:BitnessVocab
    a rdfs:Datatype ;
    rdfs:subClassOf rdfs:Resource ;
    owl:oneOf (
        "32"^^vocabulary:BitnessVocab
        "64"^^vocabulary:BitnessVocab
    ) ;
    .

Requirements

Requirement 1

UCO must satisfy OWL 2 syntactic requirements. This entails requiring conformance with RDF syntactic requirements. (These requirements are not distinct to this proposal.) (These requirements are agnostic to serialization language, e.g. application/rdf+xml, text/turtle, et al.)

Requirement 2

Remove all statements of the form vocabulary:SomeVocab rdfs:subClassOf rdfs:Resource . from the vocabulary namespace.

Requirement 2 applicability

This syntax is present in the vocabulary definitions, and it has confused some tools used in earlier attempts to review the Datatype issues. This statement:

vocabulary:BitnessVocab rdfs:subClassOf rdfs:Resource .

is entailed by RDF Schema, Sections 2.3 and 2.4, which respectively state:

rdfs:Literal is a subclass of rdfs:Resource.

Each instance of rdfs:Datatype is a subclass of rdfs:Literal.

At least one OWL Profile validation tool is confused by this redundant statement. The tool ROBOT, through its validate-profile verb, reports an error that the custom vocabulary declares itself both an individual and a class. This violates OWL 1 DL, and is not excepted by OWL 2 DL's Punning.

Hence, with no loss of semantics, we propose this requirement to prevent tool confusion.
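
The entailment chain behind Requirement 2 can be written out as triples. This is an illustrative sketch only: the vocabulary: prefix IRI is an assumption, and the comments mark which statements are asserted and which follow from RDF Schema.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
# Assumed prefix IRI, for illustration only.
@prefix vocabulary: <https://ontology.unifiedcyberontology.org/uco/vocabulary/> .

# Asserted in the ontology file.
vocabulary:BitnessVocab a rdfs:Datatype .

# Entailed by RDF Schema Section 2.4:
# each instance of rdfs:Datatype is a subclass of rdfs:Literal.
vocabulary:BitnessVocab rdfs:subClassOf rdfs:Literal .

# Stated by RDF Schema Section 2.3.
rdfs:Literal rdfs:subClassOf rdfs:Resource .

# Entailed by transitivity of rdfs:subClassOf.
# Asserting this triple explicitly adds no information.
vocabulary:BitnessVocab rdfs:subClassOf rdfs:Resource .
```

Because the last triple already follows from the first, removing the explicit assertion leaves the RDFS semantics unchanged.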

Requirement 3

In conformance with OWL 2 DL and RDF datatype definitions, UCO's datatypes that are enumerations of string-literals must conform with this definition from RDF 1.1 Concepts, Section 5:

A datatype consists of a lexical space, a value space and a lexical-to-value mapping, and is denoted by one or more IRIs.

Requirement 3 applicability

For some of the original XML Schema datatypes, the value space draws from abstract and/or platonic concepts, such as xsd:boolean containing the values true and false, distinct from the lexical values "true" and "false".
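
A small Turtle sketch may make the lexical-versus-value distinction concrete. The ex: namespace and property names are hypothetical. The xsd:boolean lexical space also admits "1" and "0", so several spellings map to the same value.

```turtle
@prefix ex: <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Three different spellings, one value: the abstract boolean true.
ex:flagA ex:isSet "true"^^xsd:boolean .
ex:flagB ex:isSet "1"^^xsd:boolean .
ex:flagC ex:isSet true .   # Turtle shorthand for "true"^^xsd:boolean

# A plain string literal is a different thing; it is not the boolean true.
ex:flagD ex:isSet "true" .
```

The lexical-to-value mapping is what makes the first three objects equal in value, while the fourth remains an xsd:string.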

As spelled in UCO 0.9.0 and earlier, BitnessVocab confuses the lexical space and value space, and provides no mapping. The syntax for defining a datatype is given in functional syntax in OWL 2 Syntax, Section 9.4; see especially the example a:SSN. Also note that RDF syntax can be toggled "On" with that document's "Show RDF in Examples" button, which shows this demonstration:

a:SSN rdf:type rdfs:Datatype .

a:SSN owl:equivalentClass _:x .
_:x rdf:type rdfs:Datatype .
_:x owl:onDatatype xsd:string .
_:x owl:withRestrictions ( _:y ) .
_:y xsd:pattern "[0-9]{3}-[0-9]{2}-[0-9]{4}" .

a:hasSSN rdfs:range a:SSN .

(OWL 1 made lexical vs. value Datatypes somewhat less confusing to discuss, by means of the concept owl:DataRange. However, owl:DataRange was dropped in the transition to OWL 2. So, we must make do with discussing "Value space" Datatypes and "Lexical space" Datatypes.)

Two less-obvious syntax components are pertinent to UCO:

The OWL 2 to RDF mapping defines the required syntax (Section 3.2.4, Table 12, row 4) for the enumeration-based lexical-space rdfs:Datatype:

_:x rdf:type rdfs:Datatype .
_:x owl:oneOf T(SEQ lt1 ... ltn) .
{ n ≥ 1 }

UCO's vocabularies are currently incorrect in two ways according to this syntax:

  1. The subject-node bearing owl:oneOf is an IRI, not a blank node.
  2. No distinction is made between value space and lexical space.
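
One way to address both points, following the a:SSN pattern and the mapping row quoted above, might look like the sketch below. It is not copied from this issue's solution text. It assumes the members are plain xsd:string literals, and the vocabulary: prefix IRI is an assumption.

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
# Assumed prefix IRI, for illustration only.
@prefix vocabulary: <https://ontology.unifiedcyberontology.org/uco/vocabulary/> .

# The IRI names the datatype and no longer bears owl:oneOf itself.
vocabulary:BitnessVocab
    a rdfs:Datatype ;
    owl:equivalentClass [
        # A blank node carries the enumeration, matching "_:x" in the mapping.
        a rdfs:Datatype ;
        owl:oneOf (
            "32"
            "64"
        ) ;
    ] ;
    .
```

In this shape the rdfs:subClassOf rdfs:Resource statement is also gone, and the members are no longer typed with the datatype being defined. It is the same pattern that the query in Competency Question 1.1 looks for.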

Risk / Benefit analysis

Benefits

Risks

Competencies demonstrated

Competency 1

A general OWL 2 consumer is interested in seeing all literals that are members of enumerated vocabularies.

Competency Question 1.1

What datatypes are based on fixed sets of literals, and what are their members?

SELECT ?nDatatype ?lValue
WHERE {
  ?nDatatype
    a rdfs:Datatype ;
    owl:equivalentClass ?nLexicalValueSpace ;
    .

  ?nLexicalValueSpace
    a rdfs:Datatype ;
    owl:oneOf/(rdf:rest*)/rdf:first ?lValue ;
    .
}
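
The property path owl:oneOf/(rdf:rest*)/rdf:first walks an RDF collection. As a sketch of why it works, here is a hypothetical two-member enumeration with the collection expanded into explicit rdf:first and rdf:rest triples. The ex: names and blank node labels are made up for illustration.

```turtle
@prefix ex: <http://example.org/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ex:ExampleVocab
    a rdfs:Datatype ;
    owl:equivalentClass _:space ;
    .

_:space
    a rdfs:Datatype ;
    owl:oneOf _:cell1 ;
    .

# Equivalent to the Turtle collection ( "32" "64" ).
_:cell1 rdf:first "32" ; rdf:rest _:cell2 .
_:cell2 rdf:first "64" ; rdf:rest rdf:nil .
```

Since rdf:rest* matches zero or more hops, the path reaches rdf:first at _:cell1 (zero hops) and at _:cell2 (one hop), so ?lValue would bind to "32" and "64" for ex:ExampleVocab.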

Result 1.1

Run against UCO 0.9.0, these are the results of that query:

?nDatatype ?lValue
0 https://ontology.unifiedcyberontology.org/uco/observable/NetworkSocketAddressFamily af_appletalk
1 https://ontology.unifiedcyberontology.org/uco/observable/NetworkSocketAddressFamily af_bth
2 https://ontology.unifiedcyberontology.org/uco/observable/NetworkSocketAddressFamily af_inet
3 https://ontology.unifiedcyberontology.org/uco/observable/NetworkSocketAddressFamily af_inet6
4 https://ontology.unifiedcyberontology.org/uco/observable/NetworkSocketAddressFamily af_ipx
5 https://ontology.unifiedcyberontology.org/uco/observable/NetworkSocketAddressFamily af_irda
6 https://ontology.unifiedcyberontology.org/uco/observable/NetworkSocketAddressFamily af_netbios
7 https://ontology.unifiedcyberontology.org/uco/observable/NetworkSocketAddressFamily af_unspec
8 https://ontology.unifiedcyberontology.org/uco/observable/NetworkSocketProtocolFamily pf_appletalk
9 https://ontology.unifiedcyberontology.org/uco/observable/NetworkSocketProtocolFamily pf_ash
.. (snip) ...

Nothing from the vocabulary namespace is yet returned.

Solution suggestion

Coordination

sbarnum commented 2 years ago

The actual PR of the proposed solution for this issue contained a significant and widespread change that was not identified or addressed in the actual change proposal. The CP identified needed changes to the vocabulary datatype definitions in the vocabulary namespace. These were clearly identified and explained in the CP and were widely discussed and agreed to in the ontology committee. The solution PR included the additional significant change of duplicating out all vocabulary value lists from not only their position within the vocabulary datatype definition (as identified in the CP) but also into the property shapes for EVERY point of use for any vocabulary throughout all of UCO.

The current approach in UCO supported simple maintenance and evolution of vocabularies: they are all located in one namespace, so you always know where to find them, and their value spaces are defined in one place (their datatype definition), so that changes/additions/deletions can be made in one place, eliminating the risk of conflicting definitions due to changes being made in some places and not in others. The proposed changes in the PR eliminate this practicality. With the proposed change there is now significantly more work required and risk imparted to manage vocabularies. Now, for any changes/additions/deletions, the entire ontology must be searched and ALL occurrences must be maintained consistently. This becomes increasingly problematic when application domain ontologies using UCO utilize UCO vocabularies, resulting in required consistency across multiple scopes of authoritative control.

OWL DL is a significant objective to shoot for but should not outweigh the critical needs for practicality in the development and use of UCO. Given the choice between the two, practicality should always win. This is specifically codified as a foundational principle for UCO within the CDO charter.

I see no issues with the proposed changes at the vocabulary datatype definitions, but the proliferation of vocabulary value duplication should not be pursued even if it is at the expense of full OWL DL conformance.

This issue was voted on at the August 9th ontology committee meeting and passed over my strong objections. I do not believe that the ontology members voting in the affirmative fully comprehend the potential impact of this change and I think this will likely come back to bite us at a later date where it will be much more difficult to reverse.

This comment is added as a record of these objections and their rationale.

ajnelson-nist commented 2 years ago

@sbarnum Thank you for logging your objection.

Slight correction - your comment should have been posted on Issue #406, not #435.

I agree that this effect on RDF Lists is a significantly unpleasant consequence of OWL 2 DL conformance.

However, the Ontology Committees were warned several times that making the shared -members rdf:List concepts was an engineering convenience adopted in spite of a known potential incompatibility with OWL 2 DL. The first warning was explicit, recorded in Proposal 100, and called out in the meeting just before that proposal was voted on. The committee was warned in the Risks of the Issue page of 406 that, having found the explicit citation in the OWL 2 to RDF mapping, that convenience was being rolled back. And, the first commit documented that the unpleasant change was being carried out.

This is an unfortunate interaction of OWL and SHACL. However, be aware also that the "Semi-open vocabulary" design of UCO is, by my experience within the ontology space, possibly a novel feature, and I suspect is certainly a novel implementation as it is now. The most graceful implementation of the goal of semi-open vocabularies might not be lists of strings. The MIME Taxonomy proposal may lead us to an alternative implementation that still allows for user extensibility with gentle warnings about set-membership; allows definitions of semantics for the members; and further, doesn't itself encourage any of our committee members to jettison our underlying standards for the sake of a "foundational principle" that is meeting harsh realities of technology implementation and interoperability of predating standards.

The design document itself is still undergoing review and testing. The lack of workflow time being devoted to it should not be mistaken for acceptance of the document as a whole.

ajnelson-nist commented 2 years ago

This proposal is ready for a Solutions Approval vote on 2022-08-25. Note that there was a procedural error in processing this proposal, and it was prematurely merged into UCO's develop. Also, an effect realized on a vocabulary in CASE c/o a new test in this proposal brought to light that CASE had fallen behind on implementing upstream UCO features. The meeting on Thursday will cover updates to the review process implemented to handle both of these matters.