ucoProject / UCO

This repository is for development of the Unified Cyber Ontology.
Apache License 2.0

ContextualCompilation should remove its 1-member minimum #599

Closed ajnelson-nist closed 2 months ago

ajnelson-nist commented 4 months ago

Background

core:ContextualCompilation is a class for representing a set of objects sharing some context. A contextual compilation has a standing expectation that the set have at least one member; however, from experience with provenance tracking in CASE and some other representations, this expectation might be inappropriate to impose. There is also some debate whether the definition in the rdfs:comment includes this expectation.

The definition currently reads:

A contextual compilation is a grouping of things sharing some context (e.g., a set of network connections observed on a given day, all accounts associated with a given person).

Under one reading of that definition, there exist things that are grouped, and thus there is at least one member in the compilation.

However, under another reading that focuses on "grouping" rather than "of things," this definition describes a set, and the set could be empty.

The second reading has arisen as a necessary use case to support, demonstrated in part by CASE's investigation:ProvenanceRecord, which collects results of an investigation:InvestigativeAction. What was not originally appreciated in CASE and UCO design is that there are some actions that truly have no results, e.g., "Hash all of the files in this directory that happens to have no files." Other use cases have come up through representing data sets in CASE-Corpora; without going into detail, they provide other examples where there is a "grouping of things" that exists but, for one reason or another, the things are not linkable.

Another use case to support comes in the context of information sharing. Situations can arise where two parties need to be able to discuss one set, sharing its IRI between themselves, but for some reason the members of the set cannot be shared. For instance, sharing the set-members might be temporarily or permanently legally restricted, but the set's identity might need to be acknowledged in shared data. With UCO's current implementation of core:ContextualCompilation, attempting to share a set-identifier in such a manner would lead to the transported graph always failing SHACL validation because of a core:ContextualCompilation with no members, and that validation failure would propagate to any graphs the receiver has.
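The propagation effect described above can be sketched with a toy model. This is plain Python, not a SHACL engine: a graph is modeled as a set of (subject, predicate, object) string triples, and the `conforms` function is a hypothetical stand-in for the current minimum-1 check. The `kb:` identifiers are invented for illustration.

```python
# Toy conformance check standing in for SHACL validation (not a real engine).
# A graph is a set of (subject, predicate, object) string triples.


def conforms(graph):
    """True if every core:ContextualCompilation has at least one core:object."""
    compilations = {
        s for (s, p, o) in graph
        if p == "rdf:type" and o == "core:ContextualCompilation"
    }
    has_member = {s for (s, p, o) in graph if p == "core:object"}
    return compilations <= has_member


# Hypothetical receiver data that passes the minimum-1 check on its own.
receiver_graph = {
    ("kb:Compilation-1", "rdf:type", "core:ContextualCompilation"),
    ("kb:Compilation-1", "core:object", "kb:File-1"),
}

# Hypothetical shared data: only the set's identity, no members.
shared_graph = {
    ("kb:RestrictedSet-9", "rdf:type", "core:ContextualCompilation"),
}

# Merging is a set union, so the memberless compilation comes along.
merged_graph = receiver_graph | shared_graph

print(conforms(receiver_graph), conforms(shared_graph), conforms(merged_graph))
```

In this model the receiver's graph conforms by itself but stops conforming once the shared graph is merged in, which is the failure mode the paragraph above describes.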

The SHACL-specific issue is that there was a translation from an open-world description of core:ContextualCompilation, spelled like this in UCO 0.6.0:

        [
            a owl:Restriction ;
            owl:onProperty core:object ;
            owl:onClass core:UcoObject ;
            owl:minQualifiedCardinality "1"^^xsd:nonNegativeInteger ;
        ]

In UCO 0.7.0, that owl:Restriction was translated to this:

    [
        sh:class core:UcoObject ;
        sh:minCount "1"^^xsd:integer ;
        sh:nodeKind sh:BlankNodeOrIRI ;
        sh:path core:object ;
    ]

What previously read "A core:ContextualCompilation has at least one core:UcoObject linked by core:object; but it's not necessary to record it, just know that it exists" became "A core:ContextualCompilation must have at least one core:UcoObject linked by core:object, else the data is non-conformant with UCO."

Requirements

Requirement 1

It must be possible to represent a contextual compilation of objects that share some context, and for the set to have no members.

Risk / Benefit analysis

Benefits

Risks

core:ContextualCompilation
    sh:property [
        sh:minCount 1 ;
        sh:path core:object ;
    ] ;
    .

Competencies demonstrated

Competency 1

Suppose there is a general form of CASE investigation:InvestigativeAction that hashes files in a directory: it takes an observable:Directory as input and emits observable:ContentData objects as output, as well as an investigation:ProvenanceRecord (a subclass of core:ContextualCompilation). Hashes can then be returned with this SPARQL query:

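PREFIX uco-action: <https://ontology.unifiedcyberontology.org/uco/action/>
PREFIX uco-core: <https://ontology.unifiedcyberontology.org/uco/core/>
PREFIX uco-observable: <https://ontology.unifiedcyberontology.org/uco/observable/>
PREFIX uco-types: <https://ontology.unifiedcyberontology.org/uco/types/>

SELECT ?lHashValue
WHERE {
  ?nHashAction
    uco-action:result
      ?nProvenanceRecord ,
      ?nContentData
      ;
    .
  ?nProvenanceRecord
    uco-core:object ?nContentData ;
    .
  ?nContentData
    a uco-observable:ContentData ;
    uco-core:hasFacet / uco-observable:hash / uco-types:hashValue ?lHashValue ;
    .
}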

Suppose it is run against a directory with no files.

kb:Action-1
    a case-investigation:InvestigativeAction ;
    uco-core:description "Hash directory contents" ;
    uco-action:object kb:Directory-2 ;
    uco-action:result kb:ProvenanceRecord-3 ;
    .
kb:ProvenanceRecord-3
    a
        co:Set ,
        case-investigation:ProvenanceRecord
        ;
    co:size 0 ;
    .

Competency Question 1.1

What hashes were yielded by reviewing this directory?

Result 1.1

None. And, with adoption of this proposal, the input graph is UCO-conformant and CASE-conformant, with Collections Ontology extension.
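As a rough check on that result, the query's join can be imitated in plain Python over a set of string triples. This is only a sketch: it is not a SPARQL engine, the helper names `objects` and `hash_values` are hypothetical, and the second graph (with `kb:ContentData-4`, `kb:ContentDataFacet-5`, `kb:Hash-6`, and the hash value "abc123") is invented to show the non-empty case.

```python
# Toy imitation of the competency query's join; NOT a SPARQL engine.
# A graph is a set of (subject, predicate, object) triples.


def objects(graph, subject, predicate):
    """All objects recorded for the given subject and predicate."""
    return [o for (s, p, o) in graph if s == subject and p == predicate]


def hash_values(graph):
    """Hash values of ContentData that is both an action result and a
    member of a provenance record produced by the same action."""
    results = []
    for (action, predicate, record) in graph:
        if predicate != "uco-action:result":
            continue
        for content in objects(graph, action, "uco-action:result"):
            if content not in objects(graph, record, "uco-core:object"):
                continue
            if "uco-observable:ContentData" not in objects(graph, content, "rdf:type"):
                continue
            for facet in objects(graph, content, "uco-core:hasFacet"):
                for hash_node in objects(graph, facet, "uco-observable:hash"):
                    results.extend(objects(graph, hash_node, "uco-types:hashValue"))
    return sorted(results)


# The empty-directory graph from this competency.
empty_directory_graph = {
    ("kb:Action-1", "rdf:type", "case-investigation:InvestigativeAction"),
    ("kb:Action-1", "uco-action:object", "kb:Directory-2"),
    ("kb:Action-1", "uco-action:result", "kb:ProvenanceRecord-3"),
    ("kb:ProvenanceRecord-3", "rdf:type", "co:Set"),
    ("kb:ProvenanceRecord-3", "rdf:type", "case-investigation:ProvenanceRecord"),
    ("kb:ProvenanceRecord-3", "co:size", 0),
}

# Hypothetical variant with one hashed file, for contrast.
one_file_graph = empty_directory_graph | {
    ("kb:Action-1", "uco-action:result", "kb:ContentData-4"),
    ("kb:ProvenanceRecord-3", "uco-core:object", "kb:ContentData-4"),
    ("kb:ContentData-4", "rdf:type", "uco-observable:ContentData"),
    ("kb:ContentData-4", "uco-core:hasFacet", "kb:ContentDataFacet-5"),
    ("kb:ContentDataFacet-5", "uco-observable:hash", "kb:Hash-6"),
    ("kb:Hash-6", "uco-types:hashValue", "abc123"),
}

print(hash_values(empty_directory_graph))
print(hash_values(one_file_graph))
```

The empty provenance record simply contributes no matches to the join, so the query answers "none" without any special handling.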

Solution suggestion

In light of the need to represent empty core:ContextualCompilation, this proposal only removes the minimum-1 count constraint.

There was some consideration of whether it would be necessary to restore an owl:Restriction, but this proposal effectively makes core:object an entirely optional property on core:ContextualCompilation. Further, the prior owl:Restriction's use of a qualified cardinality was redundant with the rdfs:range of core:object. So, it seems that an owl:Restriction for general usage on core:ContextualCompilation would have no effect.
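The effect of dropping the minimum count can be illustrated with a small toy model. The sketch below is plain Python, not a SHACL engine: it treats a graph as a set of string triples, ignores subclass inference, and uses a hypothetical function `min_count_violations` plus invented `kb:` identifiers.

```python
# Toy model of a SHACL sh:minCount check; this is NOT a real SHACL engine
# and it ignores subclass inference. A graph is a set of string triples.

RDF_TYPE = "rdf:type"


def min_count_violations(graph, target_class, path, min_count):
    """Return the sorted subjects typed as target_class that have fewer
    than min_count values recorded for the property `path`."""
    focus_nodes = {s for (s, p, o) in graph if p == RDF_TYPE and o == target_class}
    violations = []
    for node in focus_nodes:
        values = {o for (s, p, o) in graph if s == node and p == path}
        if len(values) < min_count:
            violations.append(node)
    return sorted(violations)


# Hypothetical data: one compilation with a member, one with none recorded.
graph = {
    ("kb:Compilation-1", RDF_TYPE, "core:ContextualCompilation"),
    ("kb:Compilation-1", "core:object", "kb:File-1"),
    ("kb:Compilation-2", RDF_TYPE, "core:ContextualCompilation"),
}

# UCO 0.7.0 behavior: a minimum of 1 is enforced on the recorded data.
with_minimum = min_count_violations(
    graph, "core:ContextualCompilation", "core:object", 1
)
# Proposed behavior: no minimum, so the empty compilation is accepted.
without_minimum = min_count_violations(
    graph, "core:ContextualCompilation", "core:object", 0
)

print(with_minimum, without_minimum)
```

With a minimum of 1, only the memberless compilation is reported; with no minimum, nothing is reported, and the compilation that already had a member is unaffected either way.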

Coordination

ajnelson-nist commented 4 months ago

It would be fair to discuss whether this is a backwards-compatible change. I believe it is, because graphs that currently pass UCO SHACL validation will continue to do so.

sbarnum commented 2 months ago

I agree that this would be a backward compatible change and support it.