ucoProject / UCO

This repository is for development of the Unified Cyber Ontology.
Apache License 2.0

Introducing disjointedness between Information and Non-Information Resources #619

Open plbt5 opened 1 month ago

plbt5 commented 1 month ago

(Submitted by @plbt5 and @ajnelson-nist.)

This proposal is split off from Issue 606. This proposal does not address any of the UUID discussion from 606.

Background

UCO does not currently represent explicitly the distinction between a physical resource that extends in time and space, like a device or a person, and a digital (web) resource that only lives in the cyber-domain, e.g., <https://caseontology.org/index.html>. Since both are clearly disjoint from each other, and because there are many objects in UCO that are either one or the other, such disjointedness currently has to be specified case by case. See, for instance, Issue #536, which partially addressed a question around a graph-individual representing a downloadable file (e.g., <http://example.org/file.zip>).

Recall the distinction, already identified by RDF, between information and non-information resources. This ended up in RFC 9110, HTTP Semantics. Figure 1 below depicts how the two kinds of resource are applied and how they differ:

[Figure: Distinction between URIs/URN/URL and Information Resources versus Non-information Resources]

Figure 1 - Information and non-information resources: their relationship and differences

(Note: For the purposes of this proposal, please consider URI and IRI as synonymous.)

The distinction between an Information Resource (IR) and a Non-Information Resource (NIR) cannot be determined from the URI itself, but only from the response that one gets from the server. If the URI concerns a NIR, the server cannot respond with data, because the protocols do not yet include something like Elephant-Over-IP or Paul-Over-IP, a.k.a. "Beam me up, Scotty". Instead, the server will respond with an HTTP 303 status, redirecting to a URI that is an Information Resource. Visiting the NIR thus discloses information about the NIR as opposed to the real thing itself.
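
The client-side determination described above can be sketched as a small function. This is only an illustration under stated assumptions: the function name `classify_resource`, the three return labels, and the "Undetermined" fallback are hypothetical and not part of UCO or RFC 9110. It follows only the convention described in this proposal (a representation is served, so the client perceives an IR; a 303 redirect, so the client perceives a NIR).

```python
from typing import Optional


def classify_resource(status: int, location: Optional[str] = None) -> str:
    """Classify a visited URI by the HTTP status the client observed.

    Hypothetical helper: the labels mirror the class names in this proposal.
    """
    if 200 <= status < 300:
        # The server returned a representation, so this client perceives an IR.
        return "InformationResource"
    if status == 303 and location:
        # "See Other": the server points to a different URI that describes
        # the requested thing, so this client perceives a NIR.
        return "NonInformationResource"
    # Errors and other redirects say nothing about IR versus NIR.
    return "Undetermined"


print(classify_resource(200))
print(classify_resource(303, "https://example.org/about/elephant"))
print(classify_resource(404))
```

Because the classification depends on the observed response, two clients visiting the same URI can obtain different labels, which is the situation the next paragraph describes.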

This kind of behavior of the webserver leaves the determination about whether a resource is a NIR or an IR as a matter of perception by the client. For instance, some services may differentially serve a page to some users, but not others, like with an international hotel that gives its home page to in-country visitors, but a language-specific page for external-appearing visitors. This is a case where the home page is perceived as an IR to in-country visitors, and a NIR to out-of-country visitors. One graph holding perspectives from multiple geographies must be able to tolerate a resource being IR and NIR.

Meanwhile, other RDF resources encoded in a graph remain truly in a set of concepts that will never be information resources, such as people or devices. Hence, we find a need for specializing non-information resource further with a class of things that will never be information resources.

This distinction is instrumental for a lot of things that are built with RDF(S) and OWL, and it is something that UCO should at least recognize as current practice.

Requirements

Requirement 1

Allow UCO to unequivocally determine in a graph whether a resource is either never an information resource or possibly an information resource.

Requirement 2

A single web resource MUST be able to be represented as an IR and/or an NIR as appropriate in different situations, e.g., due to perception about authorization, location, specific targeting and more.

A resource can be both an IR and a NIR because it can be perceived as an IR or NIR depending on constraints or business rules as implemented by the server, e.g., serving pages in different languages when requested from different geographical locations.

(Proposal flow note: This proposal suggests a solution and competencies before providing a risk/benefit analysis.)

Solution suggestion

The implementation would first need to introduce the distinction between Non-Information and Information Resources. This would become two additional near-top-level classes, under core:UcoThing. This would be a nod to the concepts really being RDFS concepts, but not defined with RDFS IRIs. We should also avoid entailing RDFS semantics of rdfs:Resource being the top-level class, because of the tension such would create with OWL and owl:Thing being the top-level class.

Next, another distinction should be introduced to acknowledge Never-Information Resources, and these being disjoint with the IR and NIR. This allows UCO to follow the reality where an IR can change into an NIR, as explained in Competency 1.

To that end, we suggest introducing the following concepts in UCO:

core:NonInformationResource 
    rdfs:subClassOf core:UcoThing ;
    .
core:InformationResource 
    rdfs:subClassOf core:UcoThing ;
    .
core:NeverInformationResource 
    rdfs:subClassOf core:NonInformationResource ;
    owl:disjointWith core:InformationResource ;
    .

We also introduce observable:WebResource as a parent to observable:WebPage, to acknowledge web resources that are not yet known to be an IR or NIR, and to acknowledge web pages, which are always to be considered an IR:

observable:WebResource
    rdfs:subClassOf observable:ObservableObject ;
    .
observable:WebPage
    rdfs:subClassOf observable:WebResource ;
    rdfs:subClassOf core:InformationResource ;
    .

Visually, this renders as follows, with green nodes new classes, and the red link a new disjointedness:

flowchart BT

  core_UcoThing[core:UcoThing]
  core_InformationResource[core:InformationResource]
  core_NonInformationResource[core:NonInformationResource]
  core_NeverInformationResource[core:NeverInformationResource]
  core_UcoObject[core:UcoObject]
  core_Item[core:Item]
  observable_Observable[observable:Observable]
  observable_ObservableObject[observable:ObservableObject]
  observable_WebResource[observable:WebResource]
  observable_WebPage[observable:WebPage]

style core_InformationResource stroke:#0f0;
style core_NeverInformationResource stroke:#0f0;
style core_NonInformationResource stroke:#0f0;
style observable_WebResource stroke:#0f0;

core_InformationResource -- ⊂ --> core_UcoThing
core_NonInformationResource -- ⊂ --> core_UcoThing
core_NeverInformationResource -- ⊂ --> core_NonInformationResource
core_InformationResource x-- ⋂=∅ --x core_NeverInformationResource
linkStyle 3 color:red,stroke:red;

core_UcoObject -- ⊂ --> core_UcoThing
core_Item -- ⊂ --> core_UcoObject
observable_Observable -- ⊂ --> core_UcoObject
observable_ObservableObject -- ⊂ --> core_Item
observable_ObservableObject -- ⊂ --> observable_Observable
observable_WebResource -- ⊂ --> observable_ObservableObject
observable_WebPage -- ⊂ --> core_InformationResource
observable_WebPage -- ⊂ --> observable_WebResource

Apart from the above additions to UCO, we suggest performing an initial alignment. The Risks section should make clear the benefit of such alignment, particularly pertaining to some existing practices (outside of UCO) on designating graph nodes with RDF types analogous to UCO's identity:Person and observable:WebPage. The rationale followed is: can this owl:Thing ever be downloaded with some browser or command-line tool?

action:Action
    rdfs:subClassOf core:NeverInformationResource ;
    .
core:Event
    rdfs:subClassOf core:NeverInformationResource ;
    .
core:UcoInherentCharacterizationThing
    rdfs:subClassOf core:NeverInformationResource ;
    .
identity:Organization
    rdfs:subClassOf core:NeverInformationResource ;
    .
identity:Person
    rdfs:subClassOf core:NeverInformationResource ;
    .
observable:Device
    rdfs:subClassOf core:NeverInformationResource ;
    .
observable:URL
    rdfs:subClassOf core:NeverInformationResource ;
    .

Competencies demonstrated

Competency 1

Say the webpage of a multilingual company (MC) is being accessed by two market analysts in a multinational organization, who routinely contribute to a shared knowledge base in the organization. Their offices are in different countries that happen to use languages MC supports, Japan and France. MC's default language is Japanese.

The Japanese analyst visits the home page, https://mc.example.co.jp/, and is served content from that URL. The French analyst visits the home page, https://mc.example.co.jp/, and is 303-redirected to https://mc.example.co.jp/lang-fr/ by server-side client-geolocation rules.

Neither analyst knows the other is trying to access https://mc.example.co.jp/.

Competency Question 1.1

What are the representations of the Japanese analyst and the French analyst, using InformationResource, NonInformationResource, NeverInformationResource, WebResource, and/or WebPage?

Result 1.1

The Japanese analyst:

<https://mc.example.co.jp/>
    a observable:WebPage ;
    .

The French analyst:

<https://mc.example.co.jp/>
    a
        core:NonInformationResource ,
        observable:WebResource
        ;
    .
<https://mc.example.co.jp/lang-fr/>
    a observable:WebPage ;
    .

Even if pooled in the shared knowledge base, this total knowledge view remains consistent (i.e. does not raise SHACL validation errors).

<https://mc.example.co.jp/>
    a
        core:NonInformationResource ,
        observable:WebPage
        ;
    .
<https://mc.example.co.jp/lang-fr/>
    a observable:WebPage ;
    .

This provides an example of a web resource that is, by differential service, contingently an InformationResource and/or a NonInformationResource.
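
As a rough cross-check of the pooled view, the subclass and disjointness axioms proposed above can be walked with plain set logic. This is a minimal sketch, not SHACL validation and not an OWL reasoner; the names `SUBCLASS_OF`, `DISJOINT`, `entailed_types`, and `disjointness_violations` are hypothetical helpers introduced only for illustration.

```python
# Subclass axioms from this proposal (class -> direct superclasses).
SUBCLASS_OF = {
    "core:InformationResource": {"core:UcoThing"},
    "core:NonInformationResource": {"core:UcoThing"},
    "core:NeverInformationResource": {"core:NonInformationResource"},
    "observable:WebResource": {"observable:ObservableObject"},
    "observable:WebPage": {"observable:WebResource", "core:InformationResource"},
}

# The single disjointness axiom this proposal introduces.
DISJOINT = [("core:InformationResource", "core:NeverInformationResource")]


def entailed_types(asserted):
    """Return the asserted classes plus all of their superclasses."""
    seen = set()
    stack = list(asserted)
    while stack:
        cls = stack.pop()
        if cls not in seen:
            seen.add(cls)
            stack.extend(SUBCLASS_OF.get(cls, ()))
    return seen


def disjointness_violations(asserted):
    """List the disjoint class pairs that the asserted types fall into."""
    types = entailed_types(asserted)
    return [(a, b) for a, b in DISJOINT if a in types and b in types]


# Pooled types of <https://mc.example.co.jp/> from both analysts.
pooled = {"core:NonInformationResource", "observable:WebPage"}
print(disjointness_violations(pooled))  # prints []
```

The pooled node is entailed to be both an InformationResource and a NonInformationResource, but never a NeverInformationResource, so the only disjointness axiom is not triggered.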

Competency Question 1.2

Are the views consistent when pooled into one graph without any notes on time of observation (i.e., does not raise SHACL validation issues)?

Result 1.2

Yes. The testing in PR 610 confirms no SHACL violations are raised. The visual display of the classes and how this example doesn't hit a class-disjointedness issue is as follows (using "⊂" for subclassing (rdfs:subClassOf), "⋂=∅" for class-disjointedness (owl:disjointWith), and "∈" for instantiation (rdf:type)).

flowchart BT

subgraph TBox
  core_UcoThing[core:UcoThing]
  core_InformationResource[core:InformationResource]
  core_NonInformationResource[core:NonInformationResource]
  core_NeverInformationResource[core:NeverInformationResource]
  core_UcoObject[core:UcoObject]
  core_Item[core:Item]
  observable_Observable[observable:Observable]
  observable_ObservableObject[observable:ObservableObject]
  observable_WebResource[observable:WebResource]
  observable_WebPage[observable:WebPage]
end

subgraph ABox
  wp1[https://mc.example.co.jp/]
  wp2[https://mc.example.co.jp/lang-fr/]
end

style core_InformationResource stroke:#0f0;
style core_NeverInformationResource stroke:#0f0;
style core_NonInformationResource stroke:#0f0;
style observable_WebResource stroke:#0f0;

core_InformationResource -- ⊂ --> core_UcoThing
core_NonInformationResource -- ⊂ --> core_UcoThing
core_NeverInformationResource -- ⊂ --> core_NonInformationResource
core_InformationResource x-- ⋂=∅ --x core_NeverInformationResource
linkStyle 3 color:red,stroke:red;

core_UcoObject -- ⊂ --> core_UcoThing
core_Item -- ⊂ --> core_UcoObject
observable_Observable -- ⊂ --> core_UcoObject
observable_ObservableObject -- ⊂ --> core_Item
observable_ObservableObject -- ⊂ --> observable_Observable
observable_WebResource -- ⊂ --> observable_ObservableObject
observable_WebPage -- ⊂ --> core_InformationResource
observable_WebPage -- ⊂ --> observable_WebResource

wp1 -- ∈\n(per French analyst) --> core_NonInformationResource
wp1 -- ∈\n(per French analyst) --> observable_WebResource
wp1 -- ∈\n(per Japanese analyst) --> observable_WebPage
wp2 -- ∈\n(per French analyst) --> observable_WebPage

Competency 2

This competency gives a scenario provided as a Risk in the first version of this proposal.

There is a user interface design option available for web services that choose to provide content for browser-based users and RDF-based users. They can choose to separate the RDF individuals from the web pages documenting those individuals; or, they can choose to provide the browser-friendly contents (i.e., HTML, maybe with graphics) describing an individual at that individual's IRI.

Suppose a personnel indexing service is deployed that uses home pages as person identifiers for an example organization. Their knowledge graph is available to a graph consumer who also uses UCO, and we assume the IR/NIR/Never-IR distinction of this proposal is adopted. This statement is in the graph provided by the service:

<http://example.org/~bob>
    a foaf:Person ;
    foaf:givenName "Bob" ;
    .

And, http://example.org/~bob, when visited in a browser, is served as HTML. A crawler used by the graph consumer logs this in its knowledge graph, after stumbling on Bob's home page through an intranet traversal:

<http://example.org/~bob>
    a observable:WebPage ;
    .

Competency Question 2.1

What encodings are possible to describe the graph-individual <http://example.org/~bob>?

This question stems from UCO's demonstrations to date, and is presented to motivate the need for UCO to clarify its classes URL and WebPage in particular.

Result 2.1

  1. <http://example.org/~bob> a observable:WebPage . - The graph-individual pulls down in a browser as HTML. From the crawler's perspective, this is a WebPage.
  2. <http://example.org/~bob> a identity:Person . - The graph-individual has a type of foaf:Person in the personnel service's graph, so it feels natural to translate that statement over to UCO's identity:Person.

Unfortunately, if both of those interpretations were taken, an inconsistency would be reached: identity:Person is under core:NeverInformationResource, and observable:WebPage is under core:InformationResource, entailing membership in two disjoint sets.

  3. <http://example.org/~bob> a observable:URL . - The graph-individual can be seen as describing itself. However, this is another instance of the confusion discussed in Issues #534 and #536, which addressed modeling a URL that yields a file-download on visit. In Issue 536, a disjointedness between URL and File was adopted, but several significant questions were left unaddressed.

This proposal takes a step towards addressing the question of what higher-level classes should be made disjoint, rather than piecemeal assignment of some ObservableObject subclasses.
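
The clash between the candidate encodings can be illustrated with a short check over the alignments listed in the Solution suggestion. This is a sketch under assumptions: `PARENTS`, `ancestors`, and `is_inconsistent` are hypothetical names, only the axioms relevant to this competency are included, and the check is plain set logic, not an OWL reasoner.

```python
# Relevant subclass axioms from this proposal (class -> direct superclasses).
PARENTS = {
    "observable:WebPage": ["core:InformationResource", "observable:WebResource"],
    "identity:Person": ["core:NeverInformationResource"],
    "observable:URL": ["core:NeverInformationResource"],
    "core:NeverInformationResource": ["core:NonInformationResource"],
}


def ancestors(cls):
    """Return the class together with all of its superclasses."""
    result = {cls}
    for parent in PARENTS.get(cls, []):
        result |= ancestors(parent)
    return result


def is_inconsistent(types):
    """True if the types entail both sides of the proposed disjointness."""
    entailed = set()
    for t in types:
        entailed |= ancestors(t)
    clash = {"core:InformationResource", "core:NeverInformationResource"}
    return clash <= entailed


# Encodings 1 and 2 asserted on the same node, <http://example.org/~bob>.
print(is_inconsistent({"observable:WebPage", "identity:Person"}))  # prints True
# Encoding 1 alone is fine.
print(is_inconsistent({"observable:WebPage"}))  # prints False
```

Under the same axioms, combining observable:WebPage with observable:URL on one node would also be flagged, since observable:URL is aligned under core:NeverInformationResource in this proposal.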

Competency Question 2.2

How can the personnel indexing service's graph integrate into the UCO-based graph?

Result 2.2

There is some challenge in integrating the personnel indexing service's graph into an environment where information resources and non-information resources are held disjoint.

Integration of such a data source would need to split the resource http://example.org/~bob into independent entities, likely with a new identity:Person node. Other assertions on Bob from the personnel graph, such as name information, would likely need to migrate into Facets defined in the UCO identity: namespace, rather than be carried over with the FOAF vocabulary. In this case, some FOAF vocabulary can still be used to preserve links.

The below graph would be derived from the personnel graph, and added to the crawler's knowledge base. The personnel graph would not be directly added.

<http://example.org/~bob>
    a observable:WebPage ;
    .
kb:Person-a3d3af3d-ea1d-47f6-bc02-ac334ded6549
    a identity:Person ;
    core:name "Bob" ;
    core:hasFacet kb:SimpleNameFacet-5e939a71-078c-4ddd-a6fe-3635288b3f24 ;
    .
kb:SimpleNameFacet-5e939a71-078c-4ddd-a6fe-3635288b3f24
    a identity:SimpleNameFacet ;
    identity:givenName "Bob" ;
    .
kb:Relationship-6c57d1cd-8a10-4163-98bd-93d3d2e15b00
    a core:Relationship ;
    core:isDirectional true ;
    core:kindOfRelationship "Has_Company_Homepage" ;
    core:source kb:Person-a3d3af3d-ea1d-47f6-bc02-ac334ded6549 ;
    core:target <http://example.org/~bob> ;
    .

# Preserve link between new Bob node and Bob's homepage with FOAF vocabulary.  Add FOAF types entailed by linking properties.
<http://example.org/~bob>
    a foaf:Document ;
    foaf:primaryTopic kb:Person-a3d3af3d-ea1d-47f6-bc02-ac334ded6549 ;
    .
kb:Person-a3d3af3d-ea1d-47f6-bc02-ac334ded6549
    a foaf:Person ;
    foaf:homepage <http://example.org/~bob> ;
    .

# Carry some of the FOAF data to UCO Person node.
kb:Person-a3d3af3d-ea1d-47f6-bc02-ac334ded6549
    foaf:givenName "Bob" ;
    .
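
The split described in this result could be automated along the following lines. This is a deliberately simplified sketch: the function `split_person_page` is hypothetical, triples are plain Python tuples of prefixed names and not a real RDF library's terms, and it assumes every statement the personnel graph makes about the page IRI was really about the person. It does not build the Facet or core:Relationship nodes shown above.

```python
def split_person_page(triples, page_iri, person_iri):
    """Move person statements off a page IRI onto a separate person node.

    triples: iterable of (subject, predicate, object) tuples.
    page_iri: the IRI that is served as HTML and stays a WebPage.
    person_iri: the newly minted node that will stand for the person.
    """
    out = []
    for s, p, o in triples:
        # Assumption: the source graph only ever described the person here.
        subject = person_iri if s == page_iri else s
        out.append((subject, p, o))
    # Type the two nodes on either side of the IR / Never-IR divide.
    out.append((page_iri, "rdf:type", "observable:WebPage"))
    out.append((page_iri, "rdf:type", "foaf:Document"))
    out.append((person_iri, "rdf:type", "identity:Person"))
    # Preserve the link between the person and the home page.
    out.append((person_iri, "foaf:homepage", page_iri))
    out.append((page_iri, "foaf:primaryTopic", person_iri))
    return out


bob_page = "http://example.org/~bob"
bob = "kb:Person-a3d3af3d-ea1d-47f6-bc02-ac334ded6549"
personnel = [
    (bob_page, "rdf:type", "foaf:Person"),
    (bob_page, "foaf:givenName", "Bob"),
]
for triple in split_person_page(personnel, bob_page, bob):
    print(triple)
```

Passing the new person IRI in as an argument keeps the function deterministic; a caller would mint that IRI however their knowledge base normally does.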

Risk / Benefit analysis

Benefits

Adding the specialization class NeverInformationResource moves further toward realizing an assumed disjunction in RFC 9110's HTTP Semantics between "Information Resource" and "Non Information Resource". In practice, InformationResource and NonInformationResource can be conflated when graphs are built from multiple perspectives. This proposal prevents some conflations that should not be possible, especially ones where physical things could accidentally be implied to be downloadable.

Aligning WebPage with a higher-level concept should bring a better understanding of how to use it. This is needed since UCO's WebPage and URL can become mixed with other concepts due to the fundamental nature of RDF being about using IRIs and UCO describing URLs. observable:WebPage has been lacking to date in UCO demonstrations, which has raised confusion in Ontology Committee calls. Chances to clarify this class should be taken.

Understanding what WebPage is and isn't may be especially important in resolving ReactionsListFacet from #374. A social media post is often viewable as a web page, so UCO usage could easily see something like this in some adopter's graph analyzing some (example) social network:

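@prefix ex: <http://example.org/ontology/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix uco-observable: <https://ontology.unifiedcyberontology.org/uco/observable/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .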

ex:SocialCompanyPost
    a owl:Class ;
    rdfs:comment "A social media post on the network provided by Example Social Company, Inc."@en ;
    rdfs:subClassOf
        uco-observable:WebPage ,
        uco-observable:Post
        ;
    .
ex:esciRepostOf
    a owl:ObjectProperty ;
    rdfs:domain ex:SocialCompanyPost ;
    rdfs:range ex:SocialCompanyPost ;
    .
ex:esciText
    a owl:DatatypeProperty ;
    rdfs:domain ex:SocialCompanyPost ;
    rdfs:range xsd:string ;
    .

<http://social.example.com/ExampleUser2/1722027818.0>
    a ex:SocialCompanyPost ;
    ex:esciRepostOf <http://social.example.com/ExampleUser1/1722027486.0> ;
    ex:esciText "lol" ;
    .
<http://social.example.com/ExampleUser1/1722027486.0>
    a ex:SocialCompanyPost ;
    ex:esciText "wow" ;
    .

Risks

Competency 2 illustrates a significant quality-control consideration for how to integrate data from non-UCO graphs. Agreement on fundamentals is one of the significant challenges of cross-graph interoperability.

The heuristic of "Can this ever be downloaded?" might, or might not, be a sufficient guideline for determining what would be NeverInformationResources. This could be challenging for some things where records and events are closely tied together. For instance, a Bitcoin transaction has tightly-intertwined elements of (UCO) Actions and EventRecords. The action is someone transferring coins, which would (by this proposal's action:Action alignment) be a NeverInformationResource; however, the action doesn't fully happen without the record being an InformationResource retrievable from the blockchain. This seems like a situation where it's tempting to say one "downloads the action," which the proposers assume is not a kind of statement UCO should wish to support. This particular "downloading the action" statement can be avoided by adding a specific disjointedness between action:Action and observable:EventRecord; but, the higher-order disjointedness in this proposal satisfies the same separation, stemming from actions being never-information resources, and leaving it open whether event records can be information resources.

If the alignment core:UcoInherentCharacterizationThing rdfs:subClassOf core:NeverInformationResource . is accepted, the current statement in the ontology core:UcoInherentCharacterizationThing rdfs:subClassOf core:UcoThing . becomes entailed, and no longer needs to be explicitly stated from some perspectives, including with respect to SHACL, and with respect to entailment schemes (whether RDFS or OWL). However, that statement expresses one of the foundational divides of UCO: that there are "domain objects" (UcoObject and subclasses) and "non-domain objects" (things that only inhere and characterize other things, and cannot exist without those other things). Removal of the triple core:UcoInherentCharacterizationThing rdfs:subClassOf core:UcoThing . makes this divide less apparent, because core:UcoInherentCharacterizationThing is no longer among the direct subclasses of core:UcoThing; but, the divide is still present from the axiom core:UcoInherentCharacterizationThing owl:disjointWith core:UcoObject . This appears to the proposers to be an appropriate adjustment of UCO's foundations, because UCO's foundations include design tenets of RDF.

The alignment of core:UcoInherentCharacterizationThing assumes that, so far, it has no subclasses that are downloadable, and decides that it never will. Were they downloadable, it seems they would be domain objects (further, ObservableObjects) under UcoObject. To date, it seems the only inherent characterization thing subclass that comes close to fuzzing the downloadable-or-not divide by bundling URLs is observable:URLHistoryEntry, but that class uses observable:url and observable:referrerURL to link to separate observable:URLs.

Visual summary

This figure illustrates the added classes and alignments. Current disjointedness axioms are also illustrated.

flowchart BT

subgraph AlwaysIR
  observable_WebPage[observable:WebPage]
  core_InformationResource[core:InformationResource]
end
subgraph MaybeIR
  core_IdentityAbstraction[core:IdentityAbstraction]
  core_Item[core:Item]
  core_NonInformationResource[core:NonInformationResource]
  core_UcoObject[core:UcoObject]
  core_UcoThing[core:UcoThing]
  identity_Identity[identity:Identity]
  observable_Observable[observable:Observable]
  observable_ObservableObject[observable:ObservableObject]
  observable_WebResource[observable:WebResource]
end
subgraph NeverIR
  action_Action[action:Action]
  core_Event[core:Event]
  core_NeverInformationResource[core:NeverInformationResource]
  core_UcoInherentCharacterizationThing[core:UcoInherentCharacterizationThing]
  identity_Organization[identity:Organization]
  identity_Person[identity:Person]
  observable_Device[observable:Device]
  observable_URL[observable:URL]
end

style core_InformationResource stroke:#0f0
style core_NonInformationResource stroke:#0f0
style core_NeverInformationResource stroke:#0f0
style observable_WebResource stroke:#0f0

core_InformationResource -- ⊂ --> core_UcoThing
core_NonInformationResource -- ⊂ --> core_UcoThing
core_NeverInformationResource -- ⊂ --> core_NonInformationResource
core_InformationResource x-- ⋂=∅ --x core_NeverInformationResource
linkStyle 3 color:red,stroke:red;

action_Action x-- ⋂=∅ --x core_Event
linkStyle 4 color:red,stroke:red;
action_Action -- ⊂ --> core_UcoObject
action_Action -- ⊂ --> core_NeverInformationResource
core_Event -- ⊂ --> core_NeverInformationResource
core_Event -- ⊂ --> core_UcoObject
core_IdentityAbstraction -- ⊂ --> core_UcoObject
core_Item -- ⊂ --> core_UcoObject
core_UcoInherentCharacterizationThing -- ⊂ --> core_NeverInformationResource
core_UcoInherentCharacterizationThing x-- ⋂=∅ --x core_UcoObject
linkStyle 12 color:red,stroke:red;
core_UcoObject -- ⊂ --> core_UcoThing
identity_Identity -- ⊂ --> core_IdentityAbstraction
identity_Organization -- ⊂ --> core_NeverInformationResource
identity_Organization -- ⊂ --> identity_Identity
identity_Person -- ⊂ --> core_NeverInformationResource
identity_Person -- ⊂ --> identity_Identity
observable_Device -- ⊂ --> core_NeverInformationResource
observable_Device -- ⊂ --> observable_ObservableObject
observable_Observable -- ⊂ --> core_UcoObject
observable_ObservableObject -- ⊂ --> core_Item
observable_ObservableObject -- ⊂ --> observable_Observable
observable_URL -- ⊂ --> core_NeverInformationResource
observable_URL -- ⊂ --> observable_ObservableObject
observable_WebResource -- ⊂ --> observable_ObservableObject
observable_WebPage -- ⊂ --> core_InformationResource
observable_WebPage -- ⊂ --> observable_WebResource

Coordination