ucoProject / UCO

This repository is for development of the Unified Cyber Ontology.

File and URL should be designated disjoint classes #536

Closed: ajnelson-nist closed this issue 7 months ago

ajnelson-nist commented 1 year ago

Disclaimer

Participation by NIST in the creation of the documentation of mentioned software is not intended to imply a recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that any specific software is necessarily the best available for the purpose.

Background

In the 2023-04-18 Ontology Committees meeting, the OCs discussed Issue 534, which in brief is about whether three observable:ObservableObject subclasses that are currently unrelated to one another could be used together to represent downloading a file from a URL with an expectation of certain hashes being computable.

One of the points that came out of the discussion was a general agreement that observable:File and observable:URL should be disjoint classes.

No commentary was made on how observable:ContentData relates or doesn't relate to either of those classes.

Nor was there a suggestion of a superclass of observable:File or observable:URL that would be a more appropriate disjointedness target. However, the belief is that this specific disjointedness designation would be compatible with future modeling refinements.

Requirements

Requirement 1

UCO must prevent a user from designating a node as both an observable:File and observable:URL.

Risk / Benefit analysis

Benefits

Risks

  1. New disjointedness designations would need to be added as SHACL shapes with sh:Warning severity for UCO 1.x.0, and could be designated sh:Violation severity only in a future major release. This is believed low-risk, as the practice is being exercised in other proposals currently.
  2. This restriction on typing does nothing to resolve whether it is appropriate to continue duck-typing an individual node as both file-like and URL-like by giving the node an observable:FileFacet and an observable:URLFacet. UCO Facets still permit this, and no policy in English, OWL, or SHACL disallows it.
  3. This proposal sidesteps the original question of how to associate "Expected" hashes with a URL that is expected to provide a file.
  4. This restriction has no stated modeling rationale beyond the OCs' intuition. The discussion in the meeting included asides like "A URL is more an address, or locator, which a file isn't." While this aligns with intuition, for reasons unclear to the proposer, observable:URL is not currently a subclass of observable:Address. Was this an oversight? If so, is it appropriate to add to UCO these statements: observable:Address owl:disjointWith observable:File . and observable:URL rdfs:subClassOf observable:Address .?

Competencies demonstrated

Competency 1

A user is trying to represent a downloadable file. (This is compiled and excerpted from the same example data in #534.)

<https://files.pythonhosted.org/packages/d4/f9/28260b3e9335605ac2093779e9780acaaba2c0794a47a53822a0c98e52d9/case_utils-0.10.0-py3-none-any.whl>
    a
        uco-observable:ContentData ,
        uco-observable:File ,
        uco-observable:URL
        ;
    uco-core:hasFacet
        kb:ContentDataFacet-2e1a9cee-1353-471d-b318-92fc9da7280b ,
        kb:FileFacet-82fd5577-bed0-4f7f-ba3f-08d3583c2efb ,
        kb:URLFacet-a78e2688-44b8-4eb9-b474-33c5e2b3c32a
        ;
    .

kb:ContentDataFacet-2e1a9cee-1353-471d-b318-92fc9da7280b
    a uco-observable:ContentDataFacet ;
    uco-observable:hash kb:Hash-cb51e845-086c-43a7-99ef-6d44569e2143 ;
    uco-observable:sizeInBytes 537812 ;
    .

kb:FileFacet-82fd5577-bed0-4f7f-ba3f-08d3583c2efb
    a uco-observable:FileFacet ;
    uco-observable:fileName "case_utils-0.10.0-py3-none-any.whl" ;
    uco-observable:sizeInBytes 537812 ;
    .

kb:Hash-cb51e845-086c-43a7-99ef-6d44569e2143
    a uco-types:Hash ;
    uco-types:hashMethod "SHA256"^^uco-vocabulary:HashNameVocab ;
    uco-types:hashValue "daf617d96b1dc74b2953f82067365b1858cbe0e9d4a9d2659091f23951129bc1"^^xsd:hexBinary ;
    .

kb:URLFacet-a78e2688-44b8-4eb9-b474-33c5e2b3c32a
    a uco-observable:URLFacet ;
    uco-observable:fullValue "https://files.pythonhosted.org/packages/d4/f9/28260b3e9335605ac2093779e9780acaaba2c0794a47a53822a0c98e52d9/case_utils-0.10.0-py3-none-any.whl" ;
    .

Competency Question 1.1

Is this conformant UCO data? Should it be?

Result 1.1

In UCO 1.2.0, yes this is conformant; but per this proposal, no, it should not be, because the URL should not be considered to be a file. This situation is flaggable with this constraint being added to observable:File:

observable:File
    sh:not [
        a sh:NodeShape ;
        sh:class observable:URL ;
    ] ;
    .

(That constraint would work, but in an oversimplified manner; the solution description section provides a fuller implementation and rationale.)
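As a rough illustration of what the constraint is meant to flag, here is a minimal Python sketch that uses only the standard library and models triples as plain tuples. It is not how a SHACL engine works internally, and it only looks at explicitly asserted types (sh:class also follows rdfs:subClassOf statements in the data graph). The node names kb:wheel and kb:other-file are hypothetical stand-ins for the example data.

```python
# Triples are modeled as plain (subject, predicate, object) tuples; the
# prefixed names are just strings here, not resolved IRIs.
RDF_TYPE = "rdf:type"


def nodes_typed_as_both(triples, class_a, class_b):
    """Return the sorted subjects explicitly typed as both class_a and class_b."""
    types_by_node = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types_by_node.setdefault(s, set()).add(o)
    return sorted(
        node
        for node, types in types_by_node.items()
        if class_a in types and class_b in types
    )


# Abbreviated from Competency 1: one node carries all three types.
# kb:wheel and kb:other-file are hypothetical identifiers.
graph = [
    ("kb:wheel", RDF_TYPE, "uco-observable:ContentData"),
    ("kb:wheel", RDF_TYPE, "uco-observable:File"),
    ("kb:wheel", RDF_TYPE, "uco-observable:URL"),
    ("kb:other-file", RDF_TYPE, "uco-observable:File"),
]

flagged = nodes_typed_as_both(graph, "uco-observable:File", "uco-observable:URL")
print(flagged)
```

Only the node typed as both File and URL is reported; the node typed solely as a File is left alone.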

Competency Question 1.2

Before any download action takes place from that files.pythonhosted.org URL, what is the association between the hash daf617d... and the URL https://files.pythonhosted.org/packages/d4/f9/28260b...?

Result 1.2

The answer to this question is out of scope of this proposal.

Suggestions are welcome, but likely need to be part of future proposal(s). The proposer has in mind a potential solution based on Qualities that might also be of interest to the Adversary Engagement Ontology.

Solution suggestion

First, designate with OWL that observable:File and observable:URL are disjoint by adding this one triple:

observable:File
    owl:disjointWith observable:URL ;
    .

Then, add a new shape specialized to the pairwise disjointedness of observable:File and observable:URL:

observable:File-disjointWith-URL-shape
    a sh:NodeShape ;
    sh:message "observable:File and observable:URL are disjoint classes."@en ;
    sh:not [
        a sh:NodeShape ;
        sh:class observable:URL ;
    ] ;
    sh:targetClass observable:File ;
    .

Solution discussion

The reasons for adding a shape specialized to the pair are (1) shape performance and (2) deprecation management.

First, on shape performance: It is possible to use a general-purpose "Find all disjoint-set members" SPARQL query that would work across all OWL usage. One has been used in CASE-Corpora for some months, defined here, and it has helped find modeling errors while requiring only a single owl:disjointWith statement to be added to an ontology. However, using that shape requires some degree of inferencing (graph expansion), either RDFS- or OWL-based. It also relies on the SPARQL engine's performance capabilities.
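To make the inferencing point concrete, here is a small Python sketch (standard library only) of a generic disjointness check over tuple-based triples. It is not the CASE-Corpora query, only an illustration of why a general-purpose check needs subclass expansion. The class ex:SpecialFile and the node kb:node-1 are hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
DISJOINT_WITH = "owl:disjointWith"


def superclasses(cls, triples):
    """Return cls plus every class reachable over rdfs:subClassOf."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def disjointness_errors(triples, expand):
    """Return sorted (node, class_a, class_b) for nodes typed with two disjoint classes."""
    types_by_node = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            # With expand=True, asserted types are widened to their superclasses.
            classes = superclasses(o, triples) if expand else {o}
            types_by_node.setdefault(s, set()).update(classes)
    errors = set()
    for s, p, o in triples:
        if p == DISJOINT_WITH:
            for node, types in types_by_node.items():
                if s in types and o in types:
                    errors.add((node, s, o))
    return sorted(errors)


graph = [
    ("observable:File", DISJOINT_WITH, "observable:URL"),
    # Hypothetical subclass, used only to show the effect of expansion.
    ("ex:SpecialFile", SUBCLASS_OF, "observable:File"),
    ("kb:node-1", RDF_TYPE, "ex:SpecialFile"),
    ("kb:node-1", RDF_TYPE, "observable:URL"),
]

without_expansion = disjointness_errors(graph, expand=False)
with_expansion = disjointness_errors(graph, expand=True)
print(without_expansion)
print(with_expansion)
```

Without expansion the node is never seen as an observable:File, so the single owl:disjointWith statement catches nothing; with expansion the conflict is reported.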

Second, on deprecation management: Recently, CDO shapes repositories have begun to explore potential concurrent usage of other ontologies with UCO. The Friend-of-a-Friend shapes repository, used in the UCO FOAF Profile, handles these disjointedness statements, which are all of the disjointWith occurrences in FOAF:

foaf:Document
    owl:disjointWith
        foaf:Organization ,
        foaf:Project
        ;
    .

foaf:Organization
    owl:disjointWith
        foaf:Document ,
        foaf:Person
        ;
    .

foaf:Person
    owl:disjointWith
        foaf:Organization ,
        foaf:Project
        ;
    .

foaf:Project
    owl:disjointWith
        foaf:Document ,
        foaf:Person
        ;
    .

Note that not all the classes mentioned are disjoint with all of the other classes. For instance, it is conformant with FOAF to have a node that is both a foaf:Organization and foaf:Project, despite both those classes being disjoint with foaf:Document.
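The non-transitivity noted above can be checked mechanically. The following Python sketch (standard library only) encodes the four FOAF statements quoted above as a dictionary and lists the conflicting pairs for a given set of types; the function names are illustrative only.

```python
# The owl:disjointWith statements quoted above, as subject -> objects.
FOAF_DISJOINT = {
    "foaf:Document": {"foaf:Organization", "foaf:Project"},
    "foaf:Organization": {"foaf:Document", "foaf:Person"},
    "foaf:Person": {"foaf:Organization", "foaf:Project"},
    "foaf:Project": {"foaf:Document", "foaf:Person"},
}


def are_disjoint(class_a, class_b):
    """True if either class names the other in an owl:disjointWith statement."""
    return (
        class_b in FOAF_DISJOINT.get(class_a, set())
        or class_a in FOAF_DISJOINT.get(class_b, set())
    )


def conflicting_pairs(types):
    """Return every pair of the given types that is declared disjoint."""
    ordered = sorted(types)
    return [
        (a, b)
        for i, a in enumerate(ordered)
        for b in ordered[i + 1:]
        if are_disjoint(a, b)
    ]


org_and_project = conflicting_pairs({"foaf:Organization", "foaf:Project"})
doc_and_org = conflicting_pairs({"foaf:Document", "foaf:Organization"})
print(org_and_project)
print(doc_and_org)
```

A node typed as both foaf:Organization and foaf:Project yields no conflicts, while foaf:Document combined with foaf:Organization does.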

An initial draft of the shape to represent Documents being disjoint with Organizations and Projects looked like this:

sh-foaf:Document-disjointedness-shape
    a sh:NodeShape ;
    sh:message "foaf:Document is a disjoint class with foaf:Organization and foaf:Project."@en ;
    sh:not [
        a sh:NodeShape ;
        sh:or (
            [
                a sh:NodeShape ;
                sh:class foaf:Organization ;
            ]
            [
                a sh:NodeShape ;
                sh:class foaf:Project ;
            ]
        ) ;
    ] ;
    sh:targetClass foaf:Document ;
    .

(The nested sh:or is because SHACL requires that a single sh:NodeShape not have two values of sh:not.)

Instead, these shapes were implemented, copied here:

sh-foaf:Document-disjointWith-Organization-shape
    a sh:NodeShape ;
    sh:message "foaf:Document and foaf:Organization are disjoint classes."@en ;
    sh:not [
        a sh:NodeShape ;
        sh:class foaf:Organization ;
    ] ;
    sh:targetClass foaf:Document ;
    .

sh-foaf:Document-disjointWith-Project-shape
    a sh:NodeShape ;
    sh:message "foaf:Document and foaf:Project are disjoint classes."@en ;
    sh:not [
        a sh:NodeShape ;
        sh:class foaf:Project ;
    ] ;
    sh:targetClass foaf:Document ;
    .

The reasons were:

In summary, these will be added:

Coordination

sbarnum commented 1 year ago

This makes sense to me