ucoProject / UCO

This repository is for development of the Unified Cyber Ontology.
Apache License 2.0

Implement ordering in observable:MessageThread #393

Closed: ajnelson-nist closed this issue 2 years ago.

ajnelson-nist commented 2 years ago

Disclaimer

Participation by NIST in the creation of the documentation of mentioned software is not intended to imply a recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that any specific software is necessarily the best available for the purpose.

Background

Issue 389 imports the Collections Ontology to provide UCO with a reusable representation for ordered lists.

co:List implements a strictly linear list: it can have at most one "beginning" member and at most one "ending" member, and it cannot fork.

Message threads in UCO, such as email reply threads, can link into a partially ordered structure. They can be non-linear by forking (two replies to the same message) or by merging (one message carrying multiple In-Reply-To header values, as permitted by RFC 2822, Section 3.6.4). So a co:List would not serve the needs of UCO's email-based message threads.
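To make the linearity argument concrete, here is a minimal Python sketch that checks whether a set of "next item" links could be arranged as one strictly linear chain, which is the shape co:List permits. The function name `is_linear` and the `(earlier, later)` pair encoding are illustrative assumptions, not part of UCO or the Collections Ontology.

```python
# Hypothetical helper: links are (earlier, later) pairs, i.e. (replied_to, reply).
def is_linear(next_links: list[tuple[int, int]]) -> bool:
    """Return True if the links form a single chain with no fork, merge, or cycle."""
    successors: dict[int, int] = {}
    predecessors: dict[int, int] = {}
    for earlier, later in next_links:
        if earlier in successors:
            return False  # fork: one message has two replies
        if later in predecessors:
            return False  # merge: one message replies to two messages
        successors[earlier] = later
        predecessors[later] = earlier
    items = set(successors) | set(predecessors)
    if not items:
        return True  # an empty list is trivially linear
    starts = [i for i in items if i not in predecessors]
    if len(starts) != 1:
        return False  # disjoint pieces, or a pure cycle with no start
    # Walk from the single start; the links are linear iff the walk visits every item.
    seen, current = 0, starts[0]
    while current is not None:
        seen += 1
        current = successors.get(current)
    return seen == len(items)
```

For example, `is_linear([(1, 2), (2, 3)])` returns True, while a fork such as `is_linear([(1, 2), (1, 3)])` returns False, which is exactly the situation an email reply thread can produce.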

The design of co:List can be adapted into a new sibling class of co:List. This proposal does so by defining, in the UCO Types namespace, a class types:Thread that supports partial ordering, reusing the Collections Ontology classes and properties that co:List builds upon. With this types:Thread class, observable:MessageThread is refined as a subclass of types:Thread, enabling representation of reply sequences.

Requirements

Requirement 1

UCO must have a class to represent partially ordered sets. This proposal will refer to that class as a Thread.

Requirement 2

A Thread must allow any of its members to be directly followed by 0, 1, or more other members.

Requirement 3

A Thread must allow any of its members to directly follow 0, 1, or more other members.

Requirement 4

It must be possible to represent a gap in a thread's ordering, where some members are believed to follow others via an unknown but definite sequence of links.

Requirement 5

It must be possible to determine whether a member of a thread follows another member topologically, via knowledge of direct links or indirect links.

Requirement 6

observable:MessageThread must be able to represent a non-linear email thread, with 0, 1, or more In-Reply-To header values per email, and 0, 1, or more messages replying to any email.
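The requirements above can be illustrated with a small in-memory model. The following Python sketch is hypothetical: the class `Thread` and its methods `add_next`, `add_gap`, `previous_items`, and `follows` are illustrative names, not the vocabulary defined in the PR. Direct links cover Requirements 2 and 3, "gap" links stand in for an unknown but definite chain of links (Requirement 4), and `follows` answers Requirement 5 by searching both kinds of link.

```python
# Hypothetical model of a partially ordered thread; names are illustrative only.
from collections import defaultdict


class Thread:
    def __init__(self) -> None:
        # Direct links: a member may have 0, 1, or more next members (Requirement 2)
        # and 0, 1, or more previous members (Requirement 3).
        self.next_items: dict[str, set[str]] = defaultdict(set)
        # Gap links: `later` is believed to follow `earlier` through an unknown
        # but definite chain of direct links (Requirement 4).
        self.gap_successors: dict[str, set[str]] = defaultdict(set)

    def add_next(self, earlier: str, later: str) -> None:
        self.next_items[earlier].add(later)

    def add_gap(self, earlier: str, later: str) -> None:
        self.gap_successors[earlier].add(later)

    def previous_items(self, member: str) -> set[str]:
        """Members that `member` directly follows."""
        return {src for src, dsts in self.next_items.items() if member in dsts}

    def follows(self, later: str, earlier: str) -> bool:
        """Requirement 5: does `later` follow `earlier` via direct or gap links?"""
        stack, seen = [earlier], {earlier}
        while stack:
            node = stack.pop()
            # .get avoids creating empty entries in the defaultdicts during lookup.
            step = self.next_items.get(node, set()) | self.gap_successors.get(node, set())
            for nxt in step:
                if nxt == later:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False
```

As a usage example, after `add_next("a", "b")`, `add_next("a", "c")` (a fork), and `add_gap("c", "e")`, the call `follows("e", "a")` returns True even though no direct chain from c to e is recorded, while `follows("b", "c")` returns False.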

Risk / Benefit analysis

Benefits

Risks

Competencies demonstrated

Competency 1

Suppose a set of email messages is found, with In-Reply-To headers linking the messages as drawn below. A line connecting a higher integer to a lower integer means that the message with the higher integer replied to the message with the lower integer:

 1     2     3
* --- * --- *
 \     \
  \     \ 4
   \     *
 5  \ 6
* --- *

 7
*

Competency Question 1.1

For each message, how many times was it replied to, and how many messages did it reply to, on a scale of 0, 1, or more than 1?

Result 1.1

                        Count         Messages
Messages replied to     0 times       3, 4, 6
                        1 time        5
                        >1 times      1, 2
Messages replying to    0 messages    1, 5
                        1 message     2, 3, 4
                        >1 messages   6
Messages not in thread                7

These are represented in the unit-test data file tests/examples/message_thread_PASS.json, as kb:message-thread-1 and kb:message-7.
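The counts in Result 1.1 can be reproduced mechanically from the reply links in the diagram. This Python sketch is not the PR's test; the `REPLIES` list (encoded as `(reply, replied_to)` pairs read off the diagram) and the `bucket` and `classify` helpers are illustrative assumptions.

```python
# Reply links read off the diagram, as (reply, replied_to) pairs.
REPLIES = [(2, 1), (3, 2), (4, 2), (6, 1), (6, 5)]


def bucket(count: int) -> str:
    """Map a count onto the 0 / 1 / >1 scale used in Result 1.1."""
    if count == 0:
        return "0"
    return "1" if count == 1 else ">1"


def classify(replies, messages):
    """Group thread members by how often they were replied to and how many they replied to."""
    in_thread = sorted({m for pair in replies for m in pair})
    times_replied_to = dict.fromkeys(in_thread, 0)
    messages_replied_to = dict.fromkeys(in_thread, 0)
    for reply, parent in replies:
        times_replied_to[parent] += 1
        messages_replied_to[reply] += 1
    result = {
        "replied to": {"0": [], "1": [], ">1": []},
        "replying to": {"0": [], "1": [], ">1": []},
    }
    for m in in_thread:  # sorted, so each bucket lists messages in ascending order
        result["replied to"][bucket(times_replied_to[m])].append(m)
        result["replying to"][bucket(messages_replied_to[m])].append(m)
    # Messages that never appear in a reply link are outside the thread.
    result["not in thread"] = [m for m in messages if m not in times_replied_to]
    return result


print(classify(REPLIES, range(1, 8)))
```

Running it on messages 1 through 7 yields the same groupings as the table, including message 7 as the only message outside the thread.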

Competency Question 1.2

Which messages follow message 1, topologically?

Result 1.2

2, 3, 4, and 6.

This query is encoded in the PR as a Python unit test, test_message_thread().
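However the PR's test phrases the query, the answer amounts to a transitive walk over reply links. Here is a hypothetical standard-library sketch of that walk; the `followers` helper and the `(reply, replied_to)` encoding are assumptions for illustration.

```python
from collections import deque

# Reply links read off the diagram, as (reply, replied_to) pairs.
REPLIES = [(2, 1), (3, 2), (4, 2), (6, 1), (6, 5)]


def followers(message, replies):
    """Return every message that follows `message` through one or more reply links."""
    children = {}
    for reply, parent in replies:
        children.setdefault(parent, set()).add(reply)
    found, queue = set(), deque([message])
    while queue:  # breadth-first search outward from `message`
        for child in children.get(queue.popleft(), ()):
            if child not in found:
                found.add(child)
                queue.append(child)
    return sorted(found)


print(followers(1, REPLIES))  # [2, 3, 4, 6]
```

Message 5 is correctly excluded: message 6 replies to both 1 and 5, but 5 itself does not follow 1.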

Solution suggestion

The following table shows analogous classes and properties. In the attached PR, each new property cites its Collections Ontology analogue with rdfs:seeAlso, and each new class is designated owl:disjointWith its analogue.

Collections Ontology   UCO Types (proposed)
co:List                types:Thread
co:ListItem            types:ThreadItem
co:firstItem           types:threadOriginItem
co:followedBy          types:threadSuccessor
co:lastItem            types:threadTerminalItem
co:nextItem            types:threadNextItem
co:precededBy          types:threadPredecessor
co:previousItem        types:threadPreviousItem

In support of OWL inferencing, and to maintain parallel structure with the Collections Ontology, some OWL constructs are ported from the co:List class. The ported constructs are cited with rdfs:comment and/or rdfs:seeAlso.
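Assuming the ported constructs mirror the Collections Ontology axioms for co:List (co:previousItem as the inverse of co:nextItem, co:nextItem as a subproperty of the transitive co:followedBy, and co:precededBy as the inverse of co:followedBy), the entailments they license over a thread can be sketched as a naive fixed-point loop in Python. This is not an OWL reasoner, the `kb:item-N` identifiers are hypothetical, and the exact axioms in the PR may differ.

```python
# Property names from the analogy table; the kb:item-N identifiers are made up.
NEXT = "types:threadNextItem"
PREV = "types:threadPreviousItem"
SUCC = "types:threadSuccessor"
PRED = "types:threadPredecessor"


def infer(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Naively apply inverse, subproperty, and transitivity rules until nothing new appears."""
    graph = set(triples)
    while True:
        new = set()
        for s, p, o in graph:
            if p == NEXT:
                new.add((o, PREV, s))  # assumed: threadPreviousItem inverse of threadNextItem
                new.add((s, SUCC, o))  # assumed: threadNextItem subproperty of threadSuccessor
            elif p == SUCC:
                new.add((o, PRED, s))  # assumed: threadPredecessor inverse of threadSuccessor
        successor_pairs = [(s, o) for s, p, o in graph if p == SUCC]
        for a, b in successor_pairs:
            for c, d in successor_pairs:
                if b == c:
                    new.add((a, SUCC, d))  # assumed: threadSuccessor is transitive
        if new <= graph:
            return graph
        graph |= new


def item(n: int) -> str:
    return f"kb:item-{n}"


# Direct links from the competency example: earlier item -> later item.
THREAD = {(item(a), NEXT, item(b)) for a, b in [(1, 2), (2, 3), (2, 4), (1, 6), (5, 6)]}
closure = infer(THREAD)
print((item(1), SUCC, item(3)) in closure)  # True
print((item(5), SUCC, item(3)) in closure)  # False
```

Because types:threadSuccessor is only asserted through the rules above, the derived closure answers "follows topologically" questions such as Competency Question 1.2 without any extra query logic.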

Coordination

sbarnum commented 2 years ago

I disagree with the proposal to make observable:MessageThread a subclass of the newly proposed types:Thread, as it is likely to raise all of the complex semantic entailment issues outlined in Risk 7 of CP-389 (https://github.com/ucoProject/UCO/issues/389). observable:MessageThread MUST remain a subclass of observable:ObservableObject, and also making it a subclass of types:Thread is likely to cause issues. For similar reasons, it should also not be made a subclass of observable:MessageThreadFacet.

I propose a much more straightforward and lower-risk approach: do not alter the subclassing of the observable classes at all, and instead simply rename the observable:message property to observable:messageThread and change its range from observable:ObservableObject to types:Thread. This enables full expression of the ordered message thread, aligns with the currently intended meaning of the observable:message property, stays consistent with all current semantics for ObservableObjects and observable object Facets, and avoids the semantic entailment complexities of hybrid subclassing between UCO-native classes and CO classes.

ajnelson-nist commented 2 years ago

... rename the observable:message property to observable:messageThread and change its range to be types:Thread instead of observable:ObservableObject. ...

@sbarnum, because I remembered UCO's usage of types:ControlledDictionary, I will follow that referential pattern, modeled on observable:exifData.

Also, I had forgotten that UCO does not have observable:messageThread as a property. I'd misremembered an old stand-in property from the messages.json example.

I'll adjust this proposal to add observable:messageThread. We can decide on the call today whether observable:message should be deleted, in light of an unordered attachment pattern now being available via the property path observable:messageThread / co:element.
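The property path observable:messageThread / co:element is an ordinary two-step sequence path. A minimal Python sketch of evaluating such a path over a hypothetical set of triples follows; the `kb:` identifiers and the `follow_path` helper are illustrative assumptions, and in practice the same traversal would more likely be written as a SPARQL property path.

```python
# Hypothetical triples: a message-thread facet pointing at a types:Thread,
# whose members are attached with co:element (an unordered view).
TRIPLES = {
    ("kb:thread-facet-1", "observable:messageThread", "kb:thread-1"),
    ("kb:thread-1", "co:element", "kb:message-1"),
    ("kb:thread-1", "co:element", "kb:message-2"),
    ("kb:thread-1", "co:element", "kb:message-6"),
    ("kb:message-7", "rdf:type", "observable:Message"),
}


def follow_path(triples, start, path):
    """Evaluate a sequence path p1 / p2 / ... from `start`, returning the set of end nodes."""
    frontier = {start}
    for predicate in path:
        frontier = {o for s, p, o in triples if p == predicate and s in frontier}
    return frontier


members = follow_path(TRIPLES, "kb:thread-facet-1", ["observable:messageThread", "co:element"])
print(sorted(members))  # ['kb:message-1', 'kb:message-2', 'kb:message-6']
```

The path yields thread members without any ordering information, which is the unordered attachment pattern that would make a separate observable:message property redundant.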

ajnelson-nist commented 2 years ago

I'm further inclined to recommend that observable:message be deleted, because it has a SHACL constraint requiring that at least one object be referenced.

This is another instance of what I have found to be a harmful accident of the OWL-to-SHACL translation. An OWL minimum cardinality of 1 expresses that at least one message exists for any given message thread; I agree with that. Because OWL reasoning is open-world, that axiom does not require the message to be named in your data. The SHACL constraint, by contrast, fails your data if you do not make an explicit reference to a message. That is harmful because you might want to express that a thread exists, with a concrete identifier, before you know which message objects are its members.
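The open-world versus closed-world difference can be shown with a toy check. The Python sketch below mimics the counting that an sh:minCount constraint performs, which only sees values asserted explicitly in the data; an OWL reasoner given a minimum cardinality of 1 would instead infer that some unnamed message exists and would report no inconsistency. The `min_count_violations` helper and the `kb:` identifiers are hypothetical.

```python
# Hypothetical data: a thread facet with a concrete identifier but no known messages yet.
TRIPLES = {
    ("kb:thread-facet-2", "rdf:type", "observable:MessageThreadFacet"),
}


def min_count_violations(triples, focus_nodes, path, min_count):
    """Closed-world check in the style of sh:minCount: count only explicit values of `path`."""
    violations = []
    for node in focus_nodes:
        count = sum(1 for s, p, _ in triples if s == node and p == path)
        if count < min_count:
            violations.append(node)
    return violations


print(min_count_violations(TRIPLES, ["kb:thread-facet-2"], "observable:message", 1))
# ['kb:thread-facet-2']
```

The thread is rejected solely because its messages are not yet known, which is the situation the comment describes.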

So I will recommend to the committee we also delete observable:message.

ajnelson-nist commented 2 years ago

The solution vote today incorporates removing observable:message from observable:MessageFacet, and also from the ontology.