sbarnum commented 1 year ago

Background

The following excerpted portion of the UCO Design Document (https://unifiedcyberontology.org/resources/uco_design_document.html) provides a summary overview of the various types of classes in UCO and how they work together.

"In the UCO RDFS/OWL/SHACL ontology, classes are defined for any relevant domain concept as well as for any structured concept characterizing some aspect of a domain concept. These are structured concept classes that specify into UcoObject classes, Facet classes and other classes. UcoObject and Facet classes therefore are structured concept classes, however, UcoObject classes and Facet classes are disjoint from each other. Moreover, Facet classes inhere in UcoObject classes; this implies that for a facet concept to exist, it is dependent on the existence of the UcoObject concept that bears the facet. For example, when destroying a red car, the car as bearer for the red color is removed and with it, its red color disappears. Note that the reverse is not true; UcoObjects are not existentially dependent on facets, and, thus, cannot inhere in them. Note further that, although the example suggests that facets are compulsary for UcoObject concepts, this is not the case. Domain concept classes (e.g., File, Action, Identity, Location, Device, etc.) are defined as subclasses of the UcoObject class. Facet classes characterize a particular pattern of properties that potentially apply for more than one domain class; a color, weight, an address and alike (described in Section 5 below) represent characteristics that apply not only for cars, but also for houses, persons, books and what have you. Domain concept classes represent the things whereas facet classes represent the thing’s characteristics. The disjointness between them follows from the fact that the thing can never be the same as its characteristics. All objects in UCO must specify a globally unique identifier (discussed in Section 4 below) and an assertion of the class type of the object."

The last line of the above excerpt is very important and highlights an overlooked bug in the current and past implementations of UCO. Currently, only UcoObject codifies the core:id and core:type properties that provide and require a globally unique identifier for each instance of the class. Without such a codification and requirement, instances of subclasses of core:Facet or of any other structured class (core:ExternalReference, marking:GranularMarking, observable:MimePartType, etc.) in UCO are simply treated as blank nodes with a locally (NOT globally) defined ID.

From the W3C wiki page (https://www.w3.org/wiki/BlankNodes) on blank nodes:

You can identify BlankNodes locally with a NodeId. That ID can be used to talk about the node inside your particular file/store of information, but you can't use it to ID the node externally.

This means that UCO content within a single file, or produced within a single, uniform store of information, has the potential to hang together coherently. As soon as you attempt to merge or blend graphs from different files or information stores (a critical, fundamental purpose of UCO), the graph falls apart: without globally unique IDs, non-UcoObject class objects lose coherence with the UcoObjects they are part of. Local NodeIds are typically assigned by RDF processors following similar or identical algorithms for each set of content, so ID conflicts in merged content are effectively certain.
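
The failure mode described above can be sketched with toy data. This is only an illustration under stated assumptions: triples are modelled as plain Python tuples, labels beginning with "_:" stand in for locally scoped NodeIds, and the "kb:a-" / "kb:b-" prefixes are hypothetical. RDF tooling that performs a proper merge is expected to keep blank nodes from different sources apart; the sketch models a pipeline that carries local labels across unchanged.

```python
# Toy triples: (subject, predicate, object). Labels starting with "_:" stand in
# for node IDs that are only meaningful inside one file or store.
graph_a = {
    ("kb:file-1", "core:hasFacet", "_:b0"),
    ("_:b0", "observable:fileName", "report.docx"),
}
graph_b = {
    ("kb:file-2", "core:hasFacet", "_:b0"),
    ("_:b0", "observable:fileName", "notes.txt"),
}


def naive_merge(*graphs):
    """Union the triples without renaming any labels (deliberately naive)."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def facet_values(graph, obj, prop):
    """Collect the values of `prop` on every facet attached to `obj`."""
    facets = {o for s, p, o in graph if s == obj and p == "core:hasFacet"}
    return sorted(o for s, p, o in graph if s in facets and p == prop)


def with_global_ids(graph, prefix):
    """Replace each local label with an identifier unique to its origin."""
    def rename(term):
        return prefix + term[2:] if term.startswith("_:") else term
    return {(rename(s), p, rename(o)) for s, p, o in graph}


# Local labels collide: kb:file-1 appears to own both file names.
merged = naive_merge(graph_a, graph_b)
print(facet_values(merged, "kb:file-1", "observable:fileName"))

# With origin-unique identifiers, each file keeps only its own facet.
merged_ok = naive_merge(
    with_global_ids(graph_a, "kb:a-"), with_global_ids(graph_b, "kb:b-")
)
print(facet_values(merged_ok, "kb:file-1", "observable:fileName"))
```

The first call prints both file names for kb:file-1, and the second prints only report.docx.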

This is a critical bug that needs to be addressed.

Requirements

Requirement 1

Every individual instance of a UCO class must have a globally unique id.

Requirement 2

Merged graphs of UCO content from different files, information stores or producers must maintain relational graph integrity, where non-UcoObject class objects maintain a unique and coherent relation to the UcoObjects they are an inherent part of.
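
As a rough way to check Requirement 1 against serialized content, the sketch below walks a JSON-LD-style structure and reports every type-bearing object that lacks an identifier or carries only a blank node label. It is a minimal sketch, not part of UCO tooling: the function name, the path notation and the sample data are assumptions made for illustration, and it inspects plain JSON rather than expanded RDF.

```python
def nodes_without_global_id(node, path="$"):
    """Yield JSON paths of typed node objects whose @id is missing or local."""
    if isinstance(node, list):
        for index, item in enumerate(node):
            yield from nodes_without_global_id(item, f"{path}[{index}]")
    elif isinstance(node, dict):
        if "@value" in node:
            return  # literal value object: identifiers do not apply
        if "@type" in node:
            ident = node.get("@id")
            if not isinstance(ident, str) or ident.startswith("_:"):
                yield path
        for key, value in node.items():
            yield from nodes_without_global_id(value, f"{path}.{key}")


# Hypothetical sample: the first facet has no @id, the second one does.
doc = {
    "@id": "kb:file-1",
    "@type": "observable:File",
    "core:hasFacet": [
        {"@type": "observable:FileFacet", "observable:fileName": "a.txt"},
        {
            "@id": "kb:facet-2",
            "@type": "observable:ContentDataFacet",
            "observable:sizeInBytes": {"@type": "xsd:integer", "@value": "10"},
        },
    ],
}

missing = list(nodes_without_global_id(doc))
print(missing)
```

Here only the first facet is reported, at the path $.core:hasFacet[0].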

Risk / Benefit analysis

Benefits

Content blended from multiple UCO graphs (a fundamental purpose of UCO) will be possible.

Risks

Adds one property to each non-UcoObject class object. Existing examples will need to be updated.

Competencies demonstrated

Competency 1

Maintain integrity of UCO content in merged graphs from multiple origins

Competency Question 1.1

Query a UcoObject containing inherent embedded class content (e.g. a File observable object containing a FileFacet with property content)

Result 1.1

Return the full UcoObject with all of the embedded (FileFacet) content with accuracy and integrity

Competency Question 1.2

Query a merged graph for multiple UcoObjects (from different origin graphs) containing inherent embedded class content.

Result 1.2

Return the full UcoObjects with all of the embedded (FileFacet) content with accuracy and integrity

Solution suggestion

Create new core:ClassBase class in the core namespace
Move SHACL property shapes for core:id and core:type from the core:UcoObject class to the core:ClassBase class
Modify core:UcoObject to be subclass of core:ClassBase
Modify core:Facet to be subclass of core:ClassBase
Modify all classes in UCO with no superclass other than owl:Thing to be a subclass of core:ClassBase
Modify all SHACL property shapes for ObjectProperties to utilize sh:nodeKind sh:IRI rather than sh:nodeKind sh:BlankNodeOrIRI
Modify the disjoint statement between core:UcoObject and core:Facet to include all of the other sibling direct subclasses of core:ClassBase

[]
    a owl:AllDisjointClasses ;
    owl:members (
        array:ArrayOfAction
        tool:BuildConfigurationType
        # ... there are actually quite a lot ...
        core:Facet
        core:UcoObject
        # ...
    ) ;
    .

This proposed solution of utilizing a defined common base class for all UCO classes to specify the required globally unique ID for all classes is cleaner than simply adding core:id and core:type to each of the non-UcoObject classes in UCO. It is also easier to maintain and provides better coherence to the UCO class tree and cleans up much of the current messiness in the class hierarchy.

Examples

This simple example is from the same Section 3 of the UCO Design Document as the excerpt quoted in the Background section above:

{
  "@graph": [
    {
      "@id": "kb:person-952c09ff-5a38-483b-9dcf-6d8f0b27dfac",
      "@type": "identity:Person",
      "core:objectCreatedTime": {
        "@type": "xsd:dateTime",
        "@value": "2017-06-25T12:12:12.12Z"
      },
      "core:name": "John Smith",
      "core:hasFacet": [
        {
          "@id": "kb:5ecfbe78-e7c7-4b23-97fd-5ede9cc32123",
          "@type": "identity:SimpleNameFacet",
          "identity:givenName": "John",
          "identity:familyName": "Smith"
        }
      ]
    },
    {
      "@id": "kb:relationship-cecfbe8c-8357-4105-b448-b491177fedf2",
      "@type": "core:Relationship",
      "core:kindOfRelationship": "located-at",
      "core:source": "kb:person-952c09ff-5a38-483b-9dcf-6d8f0b27dfac",
      "core:target": "kb:location-7044bee0-d5d2-45f3-bb5d-2ced42bfd3f4"
    },
    {
      "@id": "kb:location-7044bee0-d5d2-45f3-bb5d-2ced42bfd3f4",
      "@type": "location:Location",
      "core:hasFacet": [
        {
          "@id": "kb:69e9fe37-f2ee-435b-998f-7b1b0d60a405",
          "@type": "location:SimpleAddressFacet",
          "location:locality": "New York City",
          "location:region": "New York",
          "location:country": "USA",
          "location:street": "5th Ave"
        }
      ]
    }
  ]
}

Coordination

Tracking in Jira tickets OC-152 and OC-200
[x] Administrative review completed
[x] Requirements to be discussed in OC meeting, 2022-08-16
[x] Requirements Review vote occurred, passing, on 2022-08-16
[x] Requirements development phase completed.
[x] Solution announced to OCs on 2022-08-24
[x] Solutions Approval to be discussed in OC meeting, 2022-08-25
[x] Issue 470 resolved.
[x] Solutions Approval vote occurred, passing, on 2022-08-25
[x] Solutions development phase completed.
[x] Implementation for UCO merged into develop
[x] Implementation for CASE merged into develop
[x] Milestone linked
[x] Documentation logged in pending UCO release page
[x] Documentation logged in pending CASE release page

ajnelson-nist commented 1 year ago

I believe this proposal is strategically wrong and will file two proposals correcting underlying issues.

The short version is that core:id and core:type must be deleted due to conflicts with core RDF.

ajnelson-nist commented 1 year ago

Looking again, I now think only the parts of this proposal pertaining to core:id and core:type are wrong, on account of my belief that core:id and core:type are wrong to include in UCO at all. I am drafting those proposals still.

However, there is another piece that I think is missing from your solution suggestion. We allow sh:nodeKind sh:BlankNodeOrIRI on all of our object properties. I think this proposal is supposed to include instead using sh:nodeKind sh:IRI on most, if not all, of the object properties' shapes.

Last, I remember we had discussed this before in Jira, and I had asked you for an example and you might not have gotten a notice of the Jira comment. How would you represent a file that has a hash? I think that is going to be an essential sanity-check.

ajnelson-nist commented 1 year ago

@sbarnum : Also, if the top-most class in UCO would now be core:ClassBase, we should expand the disjoint statement between core:UcoObject and core:Facet to cover the other sibling subclasses of core:ClassBase. E.g., this axiom should now be included in the core namespace:

[]
    a owl:AllDisjointClasses ;
    owl:members (
        array:ArrayOfAction
        tool:BuildConfigurationType
        # ... there are actually quite a lot ...
        core:Facet
        core:UcoObject
        # ...
    ) ;
    .

It's actually a bit of a surprise when looking at what Protege displays as subclasses of owl:Thing.
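
The set of classes that would become direct subclasses of core:ClassBase can be approximated by listing every class with no declared superclass other than owl:Thing. The sketch below does this over a hand-written mapping; the function name and the toy hierarchy are assumptions for illustration only, and a real check would read the rdfs:subClassOf statements from the ontology files.

```python
def classes_without_superclass(classes, subclass_of):
    """Return classes with no declared superclass other than owl:Thing."""
    roots = []
    for cls in sorted(classes):
        parents = subclass_of.get(cls, set()) - {"owl:Thing"}
        if not parents:
            roots.append(cls)
    return roots


# Simplified toy hierarchy, not the full UCO class tree.
classes = {
    "core:UcoObject",
    "core:Facet",
    "types:Hash",
    "identity:SimpleNameFacet",
    "core:Relationship",
}
subclass_of = {
    "identity:SimpleNameFacet": {"core:Facet"},
    "core:Relationship": {"core:UcoObject"},
    "types:Hash": {"owl:Thing"},
}

roots = classes_without_superclass(classes, subclass_of)
print(roots)
```

With this toy input the result is core:Facet, core:UcoObject and types:Hash, which are the classes that would need to appear together in the disjointness axiom.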

sbarnum commented 1 year ago

@ajnelson-nist Good catch on changing sh:nodeKind sh:BlankNodeOrIRI to sh:nodeKind sh:IRI on ObjectProperty SHACL shapes. I had missed that implication.

Here is an example of a file with a hash:

{
  "@id": "kb:file-a0a69ece-da9c-4256-a9a8-5dec82a4ad1f",
  "@type": "uco-observable:File",
  "uco-core:hasFacet": [
    {
      "@id": "kb:ContentDataFacet-1e54fa5e-1399-476c-8aa7-00781b8c12db",
      "@type": "uco-observable:ContentDataFacet",
      "uco-observable:hash": [
        {
          "@id": "kb:hash-87c24a7f-a0d2-41a3-a726-0521a5c7bc8c",
          "@type": "uco-types:Hash",
          "uco-types:hashMethod": {
            "@type": "uco-vocabulary:HashNameVocab",
            "@value": "SHA256"
          },
          "uco-types:hashValue": {
            "@type": "xsd:hexBinary",
            "@value": "e5ca3be56f66200a1bb2262e948ac08dbc672bc8033c1ada743787b0c667dea6"
          }
        }
      ]
    }
  ]
}

sbarnum commented 1 year ago

I have no objections to expanding the disjoint statement to include all classes that only have owl:Thing as a superclass (i.e. add in all of the classes that are subclasses of neither UcoObject nor Facet).

ajnelson-nist commented 1 year ago

FYI, the observable:hash snippet has an error - the literals (@value-bearing) must not have @id.
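
A small check for this kind of mistake, offered as a sketch only: it walks plain JSON and reports any object that carries both @value and @id. The function name, path notation and sample identifiers are hypothetical, and the sample reconstructs the kind of error described rather than the original snippet.

```python
def value_objects_with_id(node, path="$"):
    """Yield JSON paths of literal value objects that also carry an @id."""
    if isinstance(node, list):
        for index, item in enumerate(node):
            yield from value_objects_with_id(item, f"{path}[{index}]")
    elif isinstance(node, dict):
        if "@value" in node and "@id" in node:
            yield path
        for key, value in node.items():
            yield from value_objects_with_id(value, f"{path}.{key}")


hash_node = {
    "@id": "kb:hash-1",
    "@type": "uco-types:Hash",
    "uco-types:hashMethod": {
        "@id": "kb:hash-method-1",  # hypothetical stray identifier on a literal
        "@type": "uco-vocabulary:HashNameVocab",
        "@value": "SHA256",
    },
    "uco-types:hashValue": {"@type": "xsd:hexBinary", "@value": "e5ca3be5"},
}

flagged = list(value_objects_with_id(hash_node))
print(flagged)
```

Only the hashMethod literal is flagged, because it is the only value object with an identifier.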

sbarnum commented 1 year ago

I very fundamentally disagree with the assertion to remove core:id and core:type properties. I have added a comment to the related CP explaining why. All of the rationale I have seen to date for removing them is based on a presumption that JSON-LD and other RDF serializations are the only way to serialize UCO. This has not been the case since the beginning of UCO and CASE. JSON-LD is the default serialization but UCO should support any other serialization as well.

sbarnum commented 1 year ago

FYI, the observable:hash snippet has an error - the literals (@value-bearing) must not have @id.

Oops. I got id happy. LOL

I will fix it. Thanks.

sbarnum commented 1 year ago

I fixed the example to remove my extraneously added ids.

sbarnum commented 1 year ago

I updated the CP to include the changes to sh:nodeKind in the ObjectProperty SHACL shapes and to the class disjoint statement.

sbarnum commented 1 year ago

I realized that our JSON-LD context should contain the following:

"core:id": "@id",
"core:type": "@type",

Rather than

"id": "@id",
"type": "@type",

In this way the plain json cleanly aligns to the ontology as expected and the context does the work of mapping those properties to @id and @type.

We can also add any documentation we want to the json-ld context file outside of the "context" definition object that documents details of our json-ld serialization. The processor will simply ignore the extra content.

I am going to make the above change to the json-ld context proposal.

ajnelson-nist commented 1 year ago

"core:id": "@id",
"core:type": "@type",

That breaks JSON-LD if core:id and core:type are owl:DatatypePropertys.

ajnelson-nist commented 1 year ago

All of the rationale I have seen to date for removing them is based on a presumption that JSON-LD and other RDF serializations are the only way to serialize UCO. This has not been the case since the beginning of UCO and CASE. JSON-LD is the default serialization but UCO should support any other serialization as well.

In terms of what UCO has committed to developing technologically for 1.0.0, JSON-LD is in scope, and we are trying very hard for JSON that is not JSON-LD. Other non-RDF syntaxes have not been presented as specific use cases.

ajnelson-nist commented 1 year ago

"core:id": "@id",
"core:type": "@type",
That breaks JSON-LD if core:id and core:type are owl:DatatypePropertys.

Further, @type must always be interpreted as rdf:type, and @id must always be interpreted as a node identifier. I don't think you appreciate that you are proposing completely breaking RDF functionality of JSON-LD with these properties.

ajnelson-nist commented 1 year ago

Re:

        {
          "@id": "kb:hash-87c24a7f-a0d2-41a3-a726-0521a5c7bc8c",
          "@type": "uco-types:Hash",
          "uco-types:hashMethod": {
            "@type": "uco-vocabulary:HashNameVocab",
            "@value": "SHA256"
          },
          "uco-types:hashValue": {
            "@type": "xsd:hexBinary",
            "@value": "e5ca3be56f66200a1bb2262e948ac08dbc672bc8033c1ada743787b0c667dea6"
          }
        }

This @id causes me some stomach pain as a developer. A UUID for every hash algorithm-value pair? I am aware of some systems that do indexing at potentially the level of every JSON @type-bearing (non-@value-bearing) object, so I appreciate that this might be necessary. I'd really hate to make another object that stores that same hash algorithm-value pair, though. The index load would feel pretty gross.

On the brighter side, if types:Hash objects could be shared, we might actually get query-time benefits from letting users use indexing on these types:Hash nodes' identifiers. Requiring UUIDv4s would keep UCO at its current level of being able to compute matching hash values: only by full comparison of the hash string value and method.
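
One way shared types:Hash nodes could work is to derive the node identifier from the hash method and value, so that two producers arrive at the same identifier independently. The sketch below uses name-based UUIDs (UUIDv5) for that purpose. It is an assumption-laden illustration, not a UCO convention: the namespace URL, the "kb:hash-" prefix and the case normalization are all hypothetical choices.

```python
import uuid

# Hypothetical namespace for this sketch; not defined by UCO.
HASH_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/hash-nodes")


def hash_node_id(method, value):
    """Derive a repeatable identifier from a hash method and value."""
    # Assumption: method names and hex digests compare case-insensitively.
    key = f"{method.upper()}:{value.lower()}"
    return f"kb:hash-{uuid.uuid5(HASH_NAMESPACE, key)}"


digest = "e5ca3be56f66200a1bb2262e948ac08dbc672bc8033c1ada743787b0c667dea6"

# Two independent producers compute the same identifier for the same pair.
first = hash_node_id("SHA256", digest)
second = hash_node_id("sha256", digest.upper())
print(first == second)

# Random UUIDv4 identifiers never coincide, so matching falls back to
# comparing the method and value strings.
random_id = f"kb:hash-{uuid.uuid4()}"
print(random_id == first)
```

With identifiers derived this way, equality of identifiers implies equality of the method and value pair, which is what would let an index on node identifiers stand in for a full string comparison.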

As a summary effect: I would like observable:hash to be protected from being an owl:InverseFunctionalProperty, perhaps with something like this update, changing the comment from:

Hash values of the data.

To:

A hash value of the data. As part of UCO OWL modeling, this property is intentionally neither an owl:FunctionalProperty, nor an owl:InverseFunctionalProperty.

May we expand the scope of this proposal to include this revision to observable:hash?

sbarnum commented 1 year ago

I think I may have discovered the root of our disconnect.

I just noticed that types:Identifier is currently only defined as a generic rdfs:Datatype with no further detail. This was never the intention. It was always intended to be a Datatype constraining the value of xsd:string with a regex for our agreed form of IRI value for an object identifier. We discussed this at length a few years back and I could have sworn we added it in to the definition of types:Identifier but it is obviously not there now. I don't know if we never finished that work or if it got put in and then pulled out at some point. At a minimum the defined constraint on string should be a regex for an IRI. More specifically it should constrain it to the UCO identifier pattern we developed that ensured global uniqueness and simply supported linked-data. It was "

	?nClass
0	https://ontology.unifiedcyberontology.org/uco/action/ArrayOfAction
1	https://ontology.unifiedcyberontology.org/uco/core/ExternalReference
4	https://ontology.unifiedcyberontology.org/uco/marking/GranularMarking
5	https://ontology.unifiedcyberontology.org/uco/marking/MarkingModel
6	https://ontology.unifiedcyberontology.org/uco/observable/ContactAddress
7	https://ontology.unifiedcyberontology.org/uco/observable/ContactAffiliation
8	https://ontology.unifiedcyberontology.org/uco/observable/ContactEmail
9	https://ontology.unifiedcyberontology.org/uco/observable/ContactMessaging
10	https://ontology.unifiedcyberontology.org/uco/observable/ContactPhone
11	https://ontology.unifiedcyberontology.org/uco/observable/ContactProfile
12	https://ontology.unifiedcyberontology.org/uco/observable/ContactSIP
13	https://ontology.unifiedcyberontology.org/uco/observable/ContactURL
14	https://ontology.unifiedcyberontology.org/uco/observable/EnvironmentVariable
15	https://ontology.unifiedcyberontology.org/uco/observable/ExtractedString
16	https://ontology.unifiedcyberontology.org/uco/observable/GlobalFlagType
17	https://ontology.unifiedcyberontology.org/uco/observable/IComHandlerActionType
18	https://ontology.unifiedcyberontology.org/uco/observable/IExecActionType
19	https://ontology.unifiedcyberontology.org/uco/observable/IShowMessageActionType
20	https://ontology.unifiedcyberontology.org/uco/observable/MimePartType
21	https://ontology.unifiedcyberontology.org/uco/observable/TaskActionType
22	https://ontology.unifiedcyberontology.org/uco/observable/TriggerType
23	https://ontology.unifiedcyberontology.org/uco/observable/URLHistoryEntry
24	https://ontology.unifiedcyberontology.org/uco/observable/WhoisRegistrarInfoType
25	https://ontology.unifiedcyberontology.org/uco/observable/WindowsPEFileHeader
26	https://ontology.unifiedcyberontology.org/uco/observable/WindowsPEOptionalHeader
27	https://ontology.unifiedcyberontology.org/uco/observable/WindowsPESection
28	https://ontology.unifiedcyberontology.org/uco/observable/WindowsRegistryValue
29	https://ontology.unifiedcyberontology.org/uco/pattern/PatternExpression
30	https://ontology.unifiedcyberontology.org/uco/tool/BuildConfigurationType
31	https://ontology.unifiedcyberontology.org/uco/tool/BuildInformationType
32	https://ontology.unifiedcyberontology.org/uco/tool/BuildUtilityType
33	https://ontology.unifiedcyberontology.org/uco/tool/CompilerType
34	https://ontology.unifiedcyberontology.org/uco/tool/ConfigurationSettingType
35	https://ontology.unifiedcyberontology.org/uco/tool/DependencyType
36	https://ontology.unifiedcyberontology.org/uco/tool/LibraryType
37	https://ontology.unifiedcyberontology.org/uco/types/ControlledDictionary
38	https://ontology.unifiedcyberontology.org/uco/types/ControlledDictionaryEntry
39	https://ontology.unifiedcyberontology.org/uco/types/Dictionary
40	https://ontology.unifiedcyberontology.org/uco/types/DictionaryEntry
41	https://ontology.unifiedcyberontology.org/uco/types/Hash

ucoProject / UCO

Resolve current bug in UCO that does not require globally unique IDs for all class objects #430
