ucoProject / UCO

This repository is for development of the Unified Cyber Ontology.
Apache License 2.0

Remove core:type #433

Closed by ajnelson-nist 1 year ago

ajnelson-nist commented 2 years ago

Background

UCO defines a concept called "type" that demonstrates no benefit beyond the concept of "type" that is defined in, and underpins, RDF. Furthermore, UCO defines core:type as a datatype property with a range of only Literals with type xsd:string.

UCO should not reinvent this concept, especially not at the risk of causing property-type confusion errors that are incompatible with OWL 2 DL. (OWL 2 DL separates properties into Datatype properties, which have a range of Literals; Object properties, which have a range of non-Literal nodes; and Annotation properties, which can have either but do not influence inferencing mechanisms.)
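To make the property-type concern concrete, here is a minimal sketch in plain Python of a check for predicates that appear with both Literal and IRI objects, which is the kind of mixed usage that a single Datatype or Object property cannot have in OWL 2 DL. The tagged-tuple encoding, the sample triples, the `http://example.org/` IRIs, and the helper name `mixed_range_predicates` are all assumptions made for illustration; this is not an OWL 2 DL validator.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Assumed expansion of core:type under the UCO core namespace.
CORE_TYPE = "https://ontology.unifiedcyberontology.org/uco/core/type"

# Toy graph. Each object is tagged as ("iri", value) or ("literal", value).
# The data is invented purely to trigger the check.
TRIPLES = [
    ("http://example.org/kb/x-1", RDF_TYPE, ("iri", "http://example.org/ontology/Thing")),
    ("http://example.org/kb/x-1", CORE_TYPE, ("literal", "Thing")),
    ("http://example.org/kb/x-2", CORE_TYPE, ("iri", "http://example.org/ontology/Thing")),
]


def mixed_range_predicates(triples):
    """Return predicates used with both IRI objects and literal objects."""
    kinds = defaultdict(set)
    for _subject, predicate, (kind, _value) in triples:
        kinds[predicate].add(kind)
    return sorted(p for p, seen in kinds.items() if seen == {"iri", "literal"})


if __name__ == "__main__":
    for predicate in mixed_range_predicates(TRIPLES):
        print("mixed literal/IRI usage:", predicate)
```

In this toy data only the assumed core:type IRI is flagged, because rdf:type is used consistently with IRI objects.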

Requirements

Requirement 1

Remove core:type.

Risk / Benefit analysis

Benefits

Risks

No risk is known to the proposer, on account of no observed usage in the UCO or CASE ontologies, or in any CASE example. The only observed display of core:type is in a UCO design document section, within this figure:

[Figure: object-example-rdf-graph (draw.io diagram)]

The illustration depicts core:type as holding the IRI of the concept, which is redundant with the class definition.

The proposer is aware of early UCO notions that core:type could assist in defining JSON-LD constructs, particularly around the @type keyword. This is a problem to solve in engineering and documentation specific to JSON-LD, not through concept definition in the serialization-independent ontology.

Competencies demonstrated

Competency 1

Note that core:type is not required to answer these competencies.

Competency Question 1.1

What are all of the types of the node kb:x-1?

Result 1.1

See all returned values of ?nType.

SELECT ?nType
WHERE {
  kb:x-1 a ?nType .
}
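As a rough illustration of why core:type is not needed for this question, the pattern `kb:x-1 a ?nType` can be mimicked over a toy in-memory triple list in plain Python. The sample triples, the `http://example.org/` namespaces, and the helper `types_of` are assumptions made for this sketch; a real knowledge base would be queried with a SPARQL engine instead.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
UCO_CORE = "https://ontology.unifiedcyberontology.org/uco/core/"
KB = "http://example.org/kb/"  # hypothetical expansion of the kb: prefix

# Toy graph of (subject, predicate, object) tuples; contents are illustrative only.
TRIPLES = [
    (KB + "x-1", RDF_TYPE, UCO_CORE + "UcoObject"),
    (KB + "x-1", RDF_TYPE, "http://example.org/ontology/LocalThing"),
    (KB + "x-2", RDF_TYPE, UCO_CORE + "UcoObject"),
]


def types_of(node, triples):
    """Return the sorted, de-duplicated rdf:type objects of the given node."""
    return sorted({o for s, p, o in triples if s == node and p == RDF_TYPE})


if __name__ == "__main__":
    for n_type in types_of(KB + "x-1", TRIPLES):
        print(n_type)
```

The lookup relies only on rdf:type triples, which is the same information the query reads through the `a` keyword.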

Competency Question 1.2

What are all of the non-UCO types used within this knowledge base?

Result 1.2

See all returned values of ?nType.

SELECT DISTINCT ?nType
WHERE {
  ?nNode a ?nType .
  FILTER (
    !regex(
      STR(?nType),
      "ontology.unifiedcyberontology.org",
      ""
    )
  )
}
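The same filter can be sketched in plain Python over a toy triple list. As with the query, the pattern string is applied as an unanchored regular expression, so its dots match any character. The sample triples, the `http://example.org/` namespaces, and the helper `non_uco_types` are assumptions for illustration only.

```python
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
UCO_CORE = "https://ontology.unifiedcyberontology.org/uco/core/"
UCO_PATTERN = "ontology.unifiedcyberontology.org"  # same pattern string as the query

# Toy graph of (subject, predicate, object) tuples; contents are illustrative only.
TRIPLES = [
    ("http://example.org/kb/x-1", RDF_TYPE, UCO_CORE + "UcoObject"),
    ("http://example.org/kb/x-1", RDF_TYPE, "http://example.org/ontology/LocalThing"),
    ("http://example.org/kb/x-2", RDF_TYPE, "http://example.org/ontology/LocalThing"),
]


def non_uco_types(triples):
    """Return distinct rdf:type objects whose IRI does not match UCO_PATTERN."""
    return sorted(
        {o for _s, p, o in triples if p == RDF_TYPE and not re.search(UCO_PATTERN, o)}
    )


if __name__ == "__main__":
    for n_type in non_uco_types(TRIPLES):
        print(n_type)
```

The set comprehension plays the role of `SELECT DISTINCT`, and `re.search` mirrors the unanchored matching of SPARQL's `regex`.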

Solution suggestion

Coordination

sbarnum commented 2 years ago

The core:id and core:type properties are not extraneous or locally invented concepts in UCO.

An id and a type are necessary for every object in the graph for the graph to cohere and have integrity. This holds true for all serializations of UCO. If any given serialization dropped either of these from any object then its content would not be able to be deserialized or cross-serialized to another serialization with any integrity.

It is true that RDF serializations such as JSON-LD, which are inherently graph-based, recognize this requirement and enforce it by default. However, the implicit mapping of @type to the owl:Class of the object is a JSON-LD binding rule rather than anything explicit in the ontology itself, and @id is not mapped to anything specific in the ontology at all; it is simply required by RDF serializations. A producer must explicitly assert it or, in the case of blank nodes, the RDF processor will autogenerate something locally (but not globally) unique. Again, this is all part of the RDF serialization side and not the ontology side.

For any non-RDF-based serialization we cannot presume that such binding rules apply. Serializing to YAML, for example, imposes no requirement for an id or a type on objects, and if the ontology did not provide the core:id and core:type properties it would be unclear, and impractical to recognize, that objects could or should be adorned with them.

For JSON-LD serialization bindings, the core:id property is serialized as @id and the core:type property is serialized as @type, so an object does not need @id and @type properties as well as duplicate core:id and core:type properties. This is why core:id currently has maxCount=1 while neither property has a minCount: it enables non-duplicative use in serializations like JSON-LD.

The net result is that we cannot presume, just because one serialization form (JSON-LD), even if it is our default form, handles id and type implicitly, that other serializations do not require an explicit codification of these properties in the ontology. core:id and core:type are relevant and necessary.

plbt5 commented 2 years ago

@sbarnum your statement "An id and a type are necessary for every object in the graph for the graph to cohere and have integrity." cannot be correct, because if it were, all RDF graphs would be incoherent and without integrity. I think you mean that trans-serialising JSON (not -LD) into RDF will result in a graph that is incoherent and without integrity.

The fact that JSON is difficult to deserialise into an RDF graph was the driver for its -LD extension. In that process, @type came to mean the ontological concept with which the data value is to be associated when trans-serialising it into RDF. In other words, JSON has no means of identifying which ontological concept the data belongs to, so that information was added.
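A minimal sketch of that association, in plain Python: each "@type" entry of a node becomes an rdf:type triple whose subject is the node's "@id". The sketch assumes the node is already in JSON-LD expanded form (full IRIs, no @context to resolve); the sample document and the helper `type_triples` are hypothetical, and a real JSON-LD processor does far more than this.

```python
import json

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Illustrative node in expanded form, so "@type" already holds full IRIs.
DOC = """
{
  "@id": "http://example.org/kb/x-1",
  "@type": ["https://ontology.unifiedcyberontology.org/uco/core/UcoObject"]
}
"""


def type_triples(node):
    """Return (subject, rdf:type, class IRI) triples for a node's @type entries."""
    types = node.get("@type", [])
    if isinstance(types, str):  # JSON-LD permits a single string as well as a list
        types = [types]
    return [(node["@id"], RDF_TYPE, t) for t in types]


if __name__ == "__main__":
    for triple in type_triples(json.loads(DOC)):
        print(triple)
```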

This will be true for any other serialisation; most of them, if not all, lack the expressiveness to associate data with an ontological concept. That is a limitation of the serialisation language, not of the ontology. Indeed, the workaround for still using such a language to trans-serialise your data into RDF is to add nuts and bolts that the trans-serialising component can use. However, those nuts and bolts need to be added to your serialised data, to compensate for the limitations of the language itself. There is no point in adding that information to the ontology, because the ontology already knows about the defined concept that you want to associate the data with.
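One way to picture those nuts and bolts is a small binding table that lives in the trans-serialising component rather than in the ontology. The sketch below is plain Python under stated assumptions: the `BINDING` table, the key names "id" and "type", the sample record, and the helper `record_to_type_triple` are all hypothetical and do not describe any existing UCO or CASE tooling.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical binding rules owned by the converter, not by the ontology:
# they name the record keys that carry the node identifier and the class IRI.
BINDING = {"id_key": "id", "type_key": "type"}

# Illustrative record, such as a YAML or plain JSON loader might return.
RECORD = {
    "id": "http://example.org/kb/x-1",
    "type": "https://ontology.unifiedcyberontology.org/uco/core/UcoObject",
    "name": "example",
}


def record_to_type_triple(record, binding):
    """Build the rdf:type triple for a record using the converter's binding rules."""
    subject = record[binding["id_key"]]
    class_iri = record[binding["type_key"]]
    return (subject, RDF_TYPE, class_iri)


if __name__ == "__main__":
    print(record_to_type_triple(RECORD, BINDING))
```

Changing the key names for a different serialisation only touches `BINDING`; the resulting triple still uses rdf:type.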

Having said that, I consider the use of a serialisation language that does not support trans-serialisation into RDF the wrong technology to apply: if one only has got a hammer, everything looks like a nail. I'd rather apply a screwdriver for the screws.

ajnelson-nist commented 2 years ago

Solution has been evaluated in PR #459.