Background

Analysis actions are very common across the domains that UCO targets.

They include actions as diverse as file extraction analysis in digital forensics, human-based malware behavior analysis, and ML-based multimedia classification.

There is a need to express such analysis across domains, including digital forensics, cyber-investigation, cyber threat intelligence, risk, and security operations.

There is value in being able to clearly distinguish analysis actions from other forms of action, to constrain certain forms of relationships to or from analysis actions specifically, and to adorn such actions with analysis-specific properties.

There is further value in being able to pair the expression of such analysis actions with specific forms of expression of analysis results.

UCO does not currently provide capability to uniquely specify analysis-specific actions in a standard yet flexible form.

UCO would benefit from such capability to support a wide range of analysis actions and their results in a consistent yet flexible standardized form.

At its simplest, this could be illustrated by the following scenario:

Analysis action A was performed by B on object(s) C using instrument D at time E and location F and produced the analytic result G
The analytic result G of analysis action A conveys a simple statement of the analysis result
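
As a rough illustration of the scenario above, the following sketch assembles the A through G pieces into two linked JSON-LD-style nodes. The helper name `build_analysis_graph` and all `kb:` identifiers are hypothetical; the property names are the ones proposed in this issue, and nothing here is validated against UCO.

```python
import json


def build_analysis_graph(analysis_id, result_id, performer, objects,
                         instrument, start_time, location, statement):
    """Link one Analysis action (A) to one AnalyticResult (G)."""
    analysis = {
        "@id": analysis_id,
        "@type": "uco-analysis:Analysis",
        "uco-action:performer": {"@id": performer},            # B
        "uco-action:object": [{"@id": o} for o in objects],    # C
        "uco-action:instrument": {"@id": instrument},          # D
        "uco-action:startTime": {                              # E
            "@type": "xsd:dateTime",
            "@value": start_time,
        },
        "uco-action:location": {"@id": location},              # F
        "uco-action:result": [{"@id": result_id}],             # G
    }
    result = {
        "@id": result_id,
        "@type": "uco-analysis:AnalyticResult",
        # Back-reference from the result to the action that produced it.
        "uco-analysis:originatingAnalysis": {"@id": analysis_id},
        "uco-core:statement": statement,
    }
    return {"@graph": [analysis, result]}


# Hypothetical identifiers, for illustration only.
graph = build_analysis_graph(
    analysis_id="kb:analysis-a",
    result_id="kb:analytic-result-g",
    performer="kb:analyst-b",
    objects=["kb:object-c"],
    instrument="kb:tool-d",
    start_time="2022-05-12T10:21:00Z",
    location="kb:location-f",
    statement="A simple statement of the analysis result",
)
print(json.dumps(graph, indent=2))
```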

This general approach could then be tailored for specific forms of analytic result.

UCO would benefit immediately from the capability to specify analysis actions and analytic results for the specific case of ML classification such as multimedia classification.

A simple scenario for this would be something like:

An analyst has a photo and wants to know whether any money is displayed in it
The analyst performs a multimedia classification analysis action on the photo using a specific automated ML classifier at time E and location F, producing the analytic result G, which contains the classification results of the analysis
The analytic result G uses specialized property extensions (via core:hasFacet) of the general analytic result to convey what sort of artifact classification was performed and the confidence level that the targeted artifact is present in the analyzed photo

The ability to express such ML classification analysis actions and analytic results supports other varieties of such analysis such as computed similarity classification analysis.

From the Casey and Bollé chapter titled "Formalising Representation and Interpretation of Digital Evidence to Reinforce Reasoning and Automated Analysis":

The simplest example is a computed similarity or ML classification. Figure 4 depicts such a scenario, with a similarity comparison meeting a preset threshold. Performing an AnalysisAction using an AnalyticTool to process input data and produce an AnalyticResult such as a similarity score that meets the threshold, establishing a Relationship between the inputs.

Figure 4: Simple example of an AnalysisAction operating on input data using an AnalyticTool to produce an AnalyticResult (e.g., similarity score or classification confidence). When a threshold is set and met by the AnalyticResult, a Relationship can be drawn between the inputs (e.g., similar emails).
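
The thresholding step described in the figure caption can be sketched as follows. This is only an assumption-laden illustration: `difflib.SequenceMatcher` stands in for whatever similarity measure an AnalyticTool would really use, and the `compare` helper, the plain `source`/`target`/`kind` keys, and the `kb:` identifiers are hypothetical rather than UCO property names.

```python
from difflib import SequenceMatcher


def compare(id_a, text_a, id_b, text_b, threshold):
    """Score two inputs and, only if the threshold is met, relate them."""
    # The score plays the role of the AnalyticResult content.
    score = SequenceMatcher(None, text_a, text_b).ratio()
    analytic_result = {"score": score, "threshold": threshold}

    # A Relationship between the inputs is drawn only when the threshold is met.
    relationship = None
    if score >= threshold:
        relationship = {"source": id_a, "target": id_b, "kind": "Similar_To"}
    return analytic_result, relationship


result, relationship = compare(
    "kb:email-1", "Invoice attached, please pay today",
    "kb:email-2", "Invoice attached, please pay now",
    threshold=0.8,
)
print(result, relationship)
```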

Requirements

Requirement 1

Represent an analysis action of some performed analysis to produce an analytic result.

Requirement 2

Represent the analysis action of running an analytic tool on data to produce an analytic result.

Requirement 3

Represent the general analytic result of a general analysis action.

Requirement 4

Represent the specific analytic result of a specific type of analysis action.

Requirement 5

Represent the specific analytic result (including type of classification and confidence in classification) of an automated multimedia classifier action.

Requirement 6

Represent the specific analytic result (including types of classification and confidence in classifications) of an automated multimedia classifier action performing multiple classifications in a single pass.

Risk / Benefit analysis

Benefits

This proposal provides a consistent and flexible capability for expressing different forms of analysis and their accompanying results. It would provide the ability to clearly distinguish analysis actions from other forms of action. It would provide the ability to constrain certain forms of relationships as to or from analysis actions specifically. It would provide the ability to adorn analysis actions and their accompanying analytic results with analysis-specific properties.

It would provide the specific capability to express ML classifier actions such as multimedia classification analysis and their specific analytic results in a standard form.

Risks

This proposal might duplicate existing ML ontology work, such as "A MACHINE LEARNING ONTOLOGY" (https://osf.io/preprints/frenxiv/rc954/download).

Competencies demonstrated

Competency 1

A user is looking to understand the details of some particular analysis that was performed.

Competency Question 1.1

What specific type of analysis was performed?

Result 1.1

Review core:name of the Analysis

Competency Question 1.2

Who performed the analysis?

Result 1.2

Review action:performer of the Analysis

Competency Question 1.3

What tools/instruments were used to perform the analysis?

Result 1.3

Review action:instrument of the Analysis

Competency Question 1.4

When was the analysis performed?

Result 1.4

Review action:startTime and action:endTime of the Analysis

Competency Question 1.5

Where was the analysis performed?

Result 1.5

Review action:location of the Analysis

Competency Question 1.6

In what environment was the analysis performed?

Result 1.6

Review action:environment of the Analysis

Competency Question 1.7

Was the analysis automated or manual?

Result 1.7

Review action:isAutomated of the Analysis

Competency Question 1.8

What was the input for the analysis?

Result 1.8

Review action:object of the Analysis

Competency Question 1.9

What was the result of the analysis?

Result 1.9

Review core:statement and any facets of the AnalyticResult

Competency Question 1.10

From which analysis did a particular AnalyticResult come?

Result 1.10

Review analysis:originatingAnalysis of the AnalyticResult
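
The lookups behind Competency 1 can be sketched over a list of JSON-LD-style nodes held as Python dicts. This is a minimal sketch, not a substitute for a real RDF query: the `CQ_PROPERTIES` table, the helper names, and the sample nodes are hypothetical, and the prefixed property names simply follow this issue's examples.

```python
# Competency question number -> properties reviewed on the Analysis node.
CQ_PROPERTIES = {
    "1.1": ["uco-core:name"],
    "1.2": ["uco-action:performer"],
    "1.3": ["uco-action:instrument"],
    "1.4": ["uco-action:startTime", "uco-action:endTime"],
    "1.5": ["uco-action:location"],
    "1.6": ["uco-action:environment"],
    "1.7": ["uco-action:isAutomated"],
    "1.8": ["uco-action:object"],
}


def answer(analysis_node, question):
    """Return the property values that answer one competency question."""
    return {p: analysis_node.get(p) for p in CQ_PROPERTIES[question]}


def originating_analysis(graph_nodes, result_id):
    """CQ 1.10: follow originatingAnalysis from a result to its Analysis."""
    by_id = {n["@id"]: n for n in graph_nodes if "@id" in n}
    ref = by_id[result_id].get("uco-analysis:originatingAnalysis")
    # Accept either a node reference ({"@id": ...}) or a bare identifier.
    ref_id = ref.get("@id") if isinstance(ref, dict) else ref
    return by_id.get(ref_id)


# Hypothetical sample data, for illustration only.
nodes = [
    {"@id": "kb:analysis-1", "@type": "uco-analysis:Analysis",
     "uco-core:name": "Example analysis", "uco-action:isAutomated": False},
    {"@id": "kb:result-1", "@type": "uco-analysis:AnalyticResult",
     "uco-analysis:originatingAnalysis": {"@id": "kb:analysis-1"},
     "uco-core:statement": "Example statement"},
]
print(answer(nodes[0], "1.7"))
print(originating_analysis(nodes, "kb:result-1")["@id"])
```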

Competency 2

A user is looking to understand the details of some ML classification analysis of digital photo content that was performed.

Competency Question 2.1

What is the result of the ML classifier on the photo?

Result 2.1

Classified as containing money with 0.9 confidence
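
One way such an answer could be derived from an AnalyticResult node is sketched below. The `summarize_classifications` helper and the sample node are hypothetical; the facet and property names follow this proposal. A second classification without a confidence is included to reflect the multiple-classification case of Requirement 6 and the optional confidence.

```python
def summarize_classifications(result_node):
    """Render each classification on an AnalyticResult as a short sentence."""
    lines = []
    for facet in result_node.get("uco-core:hasFacet", []):
        if facet.get("@type") != "uco-analysis:ArtifactClassificationResultFacet":
            continue
        for item in facet.get("uco-analysis:classification", []):
            label = item["uco-analysis:class"]  # required on each classification
            confidence = item.get("uco-analysis:classificationConfidence")  # optional
            if confidence is None:
                lines.append(f"Classified as containing {label}")
            else:
                lines.append(
                    f"Classified as containing {label} with {confidence} confidence"
                )
    return lines


# Hypothetical result node, for illustration only.
result_node = {
    "@id": "kb:result-1",
    "@type": "uco-analysis:AnalyticResult",
    "uco-core:hasFacet": [{
        "@type": "uco-analysis:ArtifactClassificationResultFacet",
        "uco-analysis:classification": [
            {"@type": "uco-analysis:ArtifactClassification",
             "uco-analysis:class": "money",
             "uco-analysis:classificationConfidence": 0.9},
            {"@type": "uco-analysis:ArtifactClassification",
             "uco-analysis:class": "face"},
        ],
    }],
}
for line in summarize_classifications(result_node):
    print(line)
```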

Solution suggestion

Create a new "analysis" namespace to contain analysis-specific concepts
Create a new analysis:Analysis class as a subclass of action:Action to convey analysis actions
Create a new action:isAutomated boolean datatype property that would optionally (cardinality 0..1) go on any action:Action including analysis:Analysis to convey if the action is automated or manual
Create a new analysis:AnalyticResult class as a subclass of core:Assertion
- making it a subclass of Assertion would enable it to optionally include a core:statement property making a summary statement for an AnalyticResult for cases where something simple is appropriate or to provide a simple summary even when more complex structured content is included via facets
Create a new analysis:originatingAnalysis object property (range of analysis:Analysis) that would optionally (cardinality 0..1) go on analysis:AnalyticResult to reference back to the Analysis action that it resulted from
Create a new analysis:resultContent object property (range of core:UcoObject) that would optionally (cardinality 0..*) go on analysis:AnalyticResult to contain any object content that resulted from the Analysis
Create a new analysis:AnalyticResultFacet as a subclass of core:Facet that would be abstract and serve as a basis for characterizing specific forms of analytic result
Create a new analysis:ArtifactClassificationResultFacet as a subclass of analysis:AnalyticResultFacet to convey results of classification analysis actions
Create a new analysis:classification object property (range of analysis:ArtifactClassification) that would optionally (cardinality 0..*) go on analysis:ArtifactClassificationResultFacet to convey one or more classification results
Create a new analysis:ArtifactClassification class to convey details of a single classification result
Create a new analysis:class datatype property (range of string) that would be required on analysis:ArtifactClassification and convey a specific classification type (e.g., money, face, car, water, etc)
Create a new analysis:classificationConfidence property (xsd:decimal) that would optionally (cardinality 0..1) go on analysis:ArtifactClassification to convey the confidence in the classification
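
To make the proposed class structure and cardinalities easier to see at a glance, they can be mirrored as Python dataclasses. This is only an informal sketch under stated assumptions: the Python names are hypothetical, references to other objects are simplified to identifier strings, and the actual proposal would be expressed in OWL and SHACL rather than in code like this.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Optional


@dataclass
class ArtifactClassification:
    class_: str                                           # analysis:class, required
    classification_confidence: Optional[Decimal] = None   # xsd:decimal, 0..1

    def __post_init__(self):
        if not self.class_:
            raise ValueError("analysis:class is required")


@dataclass
class AnalyticResultFacet:
    """Abstract base; only subclasses are meant to be instantiated."""


@dataclass
class ArtifactClassificationResultFacet(AnalyticResultFacet):
    # analysis:classification, 0..*
    classification: List[ArtifactClassification] = field(default_factory=list)


@dataclass
class AnalyticResult:
    statement: Optional[str] = None             # core:statement, from core:Assertion
    originating_analysis: Optional[str] = None  # 0..1, identifier of an Analysis
    result_content: List[str] = field(default_factory=list)  # 0..*, UcoObject identifiers
    facets: List[AnalyticResultFacet] = field(default_factory=list)


# Hypothetical instance, for illustration only.
result = AnalyticResult(
    originating_analysis="kb:analysis-1",
    facets=[ArtifactClassificationResultFacet(
        classification=[ArtifactClassification("money", Decimal("0.9"))]
    )],
)
print(result)
```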

Examples

First general scenario from Background above (in this case, manual software malware analysis within the Cyber Threat Intelligence application domain):

{
  "@graph": [
    {
      "@id": "kb:organization-fb6c05a0-b6be-4a10-ba62-0e7b1da4c0ec",
      "@type": "uco-identity:Organization",
      "uco-core:name": "hex-rays"
    },
    {
      "@id": "kb:AnalyticTool-0b635b9f-bdb8-4492-9b4e-dec6797b82db",
      "@type": "uco-tool:AnalyticTool",
      "uco-core:name": "IDA Pro",
      "uco-tool:toolType": "binary code analysis tool",
      "uco-tool:creator": {
        "@id": "kb:organization-fb6c05a0-b6be-4a10-ba62-0e7b1da4c0ec"
      },
      "uco-tool:version": "7.7"
    },
    {
      "@id": "kb:Analysis-f365add7-1326-426f-9266-406bdeed86a1",
      "@type": "uco-analysis:Analysis",
      "uco-core:name": "Reverse engineer software to determine malicious intent",
      "uco-action:startTime": {
        "@type": "xsd:dateTime",
        "@value": "2022-05-12T10:21:00.00Z"
      },
      "uco-action:endTime": {
        "@type": "xsd:dateTime",
        "@value": "2022-05-14T15:58:00.00Z"
      },
      "uco-action:isAutomated": false,
      "uco-action:location": {
        "@id": "kb:9b82c2bc-10f7-47b2-81a8-443a9f458440"
      },
      "uco-action:performer": {
        "@id": "kb:Analyst-c1d5f9cc-10cd-4fdb-9570-e9d00e6df6f7"
      },
      "uco-action:instrument": {
        "@id": "kb:AnalyticTool-0b635b9f-bdb8-4492-9b4e-dec6797b82db"
      },
      "uco-action:environment": {
        "@id": "kb:Computer-e640f827-1f5b-4e8a-bd89-7afdf2c85caa"
      },
      "uco-action:object": [
        {
          "@id": "kb:Software1-2ef1d3c7-eb2d-470d-89ea-291daed6549b"
        }
      ],
      "uco-action:result": [
        {
          "@id": "kb:ProvenanceRecord-aa90afe6-9069-49bb-8ad8-b05d3f4f143b"
        },
        {
          "@id": "kb:AnalyticResult-67fb2d95-dc94-4833-a270-582c37feb879"
        }
      ]
    },
    {
      "@id": "kb:AnalyticResult-67fb2d95-dc94-4833-a270-582c37feb879",
      "@type": "uco-analysis:AnalyticResult",
      "uco-analysis:originatingAnalysis": {
        "@id": "kb:Analysis-f365add7-1326-426f-9266-406bdeed86a1"
      },
      "uco-core:statement": "Software exhibits malicious intent"
    }
  ]
}

Second multimedia classification-specific scenario from Background above:

{
  "@graph": [
    {
      "@id": "kb:organization-2b3b98e2-aea2-4270-876a-7f9917623cb7",
      "@type": "uco-identity:Organization",
      "uco-core:name": "NFI"
    },
    {
      "@id": "kb:AnalyticTool-DAE5EE58-E5ED-4588-93BE-CDEC6FAA9C6A",
      "@type": "uco-tool:AnalyticTool",
      "uco-core:name": "Hansken",
      "uco-tool:toolType": "DFaaS",
      "uco-tool:creator": {
        "@id": "kb:organization-2b3b98e2-aea2-4270-876a-7f9917623cb7"
      },
      "uco-tool:version": "1.0",
      "uco-core:hasFacet": [
        {
          "@type": "uco-tool:ToolConfigurationTypeFacet",
          "uco-tool:configurationSettings": [
            {
              "@type": "uco-tool:ConfigurationSettingType",
              "uco-tool:itemName": "classifier",
              "uco-tool:itemValue": "nfi-forensic"
            },
            {
              "@type": "uco-tool:ConfigurationSettingType",
              "uco-tool:itemName": "TrainingSet",
              "uco-tool:itemValue": "0.0.7"
            }
          ]
        }
      ]
    },
    {
      "@id": "kb:Analysis-7cd51fa7-63ee-4f40-a482-9ce8333c7556",
      "@type": "uco-analysis:Analysis",
      "uco-core:name": "compute string similarity",
      "uco-action:startTime": {
        "@type": "xsd:dateTime",
        "@value": "2022-05-10T08:49:00.00Z"
      },
      "uco-action:endTime": {
        "@type": "xsd:dateTime",
        "@value": "2022-05-10T09:54:00.00Z"
      },
      "uco-action:isAutomated": true,
      "uco-action:location": {
        "@id": "kb:ESC-6FAC81EF-0966-4F05-94BB-2A5D572513CA"
      },
      "uco-action:performer": {
        "@id": "kb:Analyst-13A167EE-D3B5-4AA4-B8BA-83C25F8B8FF4"
      },
      "uco-action:instrument": {
        "@id": "kb:AnalyticTool-DAE5EE58-E5ED-4588-93BE-CDEC6FAA9C6A"
      },
      "uco-action:environment": {
        "@id": "kb:Computer-533FA61A-BE79-469E-A05F-1A341848B925"
      },
      "uco-action:object": [
        {
          "@id": "kb:RasterPicture1-b67308c0-c31b-41a6-805a-10ec526ec8bc"
        }
      ],
      "uco-action:result": [
        {
          "@id": "kb:ProvenanceRecord-d628b0f6-686d-4d22-a577-ec737e5947bc"
        },
        {
          "@id": "kb:AnalyticResult-3205CB19-0820-4009-B70B-646DBD19598B"
        }
      ]
    },
    {
      "@id": "kb:AnalyticResult-3205CB19-0820-4009-B70B-646DBD19598B",
      "@type": "uco-analysis:AnalyticResult",
      "uco-analysis:originatingAnalysis": {
        "@id": "kb:Analysis-7cd51fa7-63ee-4f40-a482-9ce8333c7556"
      },
      "uco-core:hasFacet": [
        {
          "@type": "uco-analysis:ArtifactClassificationResultFacet",
          "uco-analysis:classification": [
            {
              "@type": "uco-analysis:ArtifactClassification",
              "uco-analysis:class": "money",
              "uco-analysis:classificationConfidence": 0.997359037
            }
          ]
        }
      ]
    }
  ]
}
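
Graphs like the two above carry the same link in both directions: uco-action:result on the Analysis and uco-analysis:originatingAnalysis on the AnalyticResult. A small consistency check over a parsed graph might look like the sketch below. The function names and sample graphs are hypothetical, types are matched by suffix as a simplification, and `@type` is assumed to be a single string rather than a list.

```python
def ref_id(value):
    """Accept either {"@id": "..."} or a bare string and return the identifier."""
    return value.get("@id") if isinstance(value, dict) else value


def check_result_links(doc):
    """Return a list of problems with result <-> analysis cross-references."""
    nodes = {n["@id"]: n for n in doc["@graph"] if "@id" in n}
    problems = []
    for node in nodes.values():
        if not str(node.get("@type", "")).endswith(":AnalyticResult"):
            continue
        origin_id = ref_id(node.get("uco-analysis:originatingAnalysis"))
        if origin_id is None:
            continue  # the property is optional (0..1)
        origin = nodes.get(origin_id)
        if origin is None or not str(origin.get("@type", "")).endswith(":Analysis"):
            problems.append(f"{node['@id']}: {origin_id} is not an Analysis in this graph")
            continue
        listed = [ref_id(r) for r in origin.get("uco-action:result", [])]
        if node["@id"] not in listed:
            problems.append(f"{node['@id']}: not listed in uco-action:result of {origin_id}")
    return problems


# Hypothetical graphs, for illustration only.
consistent = {"@graph": [
    {"@id": "kb:analysis-1", "@type": "uco-analysis:Analysis",
     "uco-action:result": [{"@id": "kb:result-1"}]},
    {"@id": "kb:result-1", "@type": "uco-analysis:AnalyticResult",
     "uco-analysis:originatingAnalysis": {"@id": "kb:analysis-1"}},
]}
points_at_tool = {"@graph": [
    {"@id": "kb:tool-1", "@type": "uco-tool:AnalyticTool"},
    {"@id": "kb:result-1", "@type": "uco-analysis:AnalyticResult",
     "uco-analysis:originatingAnalysis": "kb:tool-1"},
]}
print(check_result_links(consistent))
print(check_result_links(points_at_tool))
```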

(Edit: Github presentation syntax and needed prefixes added by OC Chair.)

Coordination

Tracking in Jira ticket OC-111
[x] Administrative review completed, proposal initially announced to Ontology Committees (OCs) on 2022-06-13; revised Issue announced 2022-07-07.
[x] Requirements to be discussed in OC meeting, 2022-07-12
[x] Requirements Review vote occurred, passing, on 2022-07-12
[x] Requirements development phase completed.
[x] Solution announced to OCs on 2022-11-01
[x] Solutions Approval to be discussed in OC meeting, 2022-11-17
[x] Solutions Approval vote occurred, passing, on 2022-11-18
[x] Solutions development phase completed.
[x] Implementation merged into develop
[x] Milestone linked
[x] Documentation logged in pending release page

@cyberinvestigationexpress , there are a few questions the illustration JSON opens.

Is this proposing a new namespace with prefix analysis:?
Are the drafting:-prefixed concepts in-scope or out-of-scope of this proposal? For instance, the last I saw originatingAction, it was part of a proposal of mine that ended up being dropped on a new characteristic of Facets becoming understood.
How should core:confidence play into this? I think the only place there is a demonstration of confidence is on Oresteia, and it's flagged as being known to be non-conformant.
How does drafting:class differ from core:tag?

I also have questions from your solution suggestion.

Your first suggestion is to add AnalysisAction to the Action namespace. Why not the Analysis namespace?
Would there be a ClassificationResult class accompanying the ClassificationResultFacet class?

Last, I have a question on the potential scope of this class. With the way this is illustrated presently, this proposal can help us associate confidence with a result that is structured closely to a tag. The CASE Inference Proposal (which one of us needs to transcribe to GitHub from Confluence) associates a confidence with a significantly more complex result structure, able to be consumed as a graph. Can you please include an illustration of how this money-with-0.99-confidence AnalyticResult could be used? Is it possible to illustrate something with an AnalyticResult that doesn't look like a tag, or is this scoped to tag-like conclusions only?

I will schedule this for the OC call in a few Thursdays, but if there are no answers to the above questions, I will only give this 10 minutes of committee time. I estimate this having a low chance of passing a Requirements Review vote before we see some further explanation.

A few suggestions for refining the proposal based on conversations that @cyberinvestigationexpress and I have had on this topic:

Create a new "analysis" namespace
Create a new analysis:Analysis class as a subclass of action:Action
Create a new analysis:AnalyticResult class as a subclass of core:Assertion
- making it a subclass of Assertion would enable it to optionally include a core:statement property making a summary statement for an AnalyticResult for cases where something simple is appropriate or to provide a simple summary even when more complex structured content is included via facets
Create a new analysis:originatingAnalysis property (range of analysis:Analysis) that would optionally go on analysis:AnalyticResult to reference back to the Analysis action that it resulted from
Create a new analysis:resultContent (range of core:UcoObject) that would optionally go on analysis:AnalyticResult to contain any object content that resulted from the Analysis
Create a new analysis:AnalyticResultFacet as a subclass of core:Facet that would be abstract and serve as a basis for characterizing specific forms of analytic result
Create a new analysis:ArtifactClassificationResultFacet as a subclass of analysis:AnalyticResultFacet to convey results of classification analysis actions
Create a new analysis:classification property (range of analysis:ArtifactClassification) that would optionally go on analysis:ArtifactClassificationResultFacet (no max cardinality) to convey one or more classification results
Create a new analysis:ArtifactClassification class
Create a new analysis:class property (range of string) that would be required on analysis:ArtifactClassification and convey a specific classification type (e.g., money, face, car, water, etc)
Create a new analysis:classificationConfidence property that would optionally go on analysis:ArtifactClassification to convey the confidence in the classification

This would enable the immediate need to support expressing classification analysis results but do so in a way that sets us up to support a wide range of use cases going forward. The basic Analysis and AnalyticResult setup here is needed for many cyber application subdomains.

@cyberinvestigationexpress , @sbarnum:

Thank you for creating this Change Proposal. Unfortunately, I don't understand it.

My comment on this CP is focused on the Why of it. The absence of a proper description of its background and of the Problem that currently exists leaves us with insufficient criteria, not only for developing the proper artefacts but, more importantly, for valuing the result of the work in terms of the extent to which it achieves the user/business demands. There is a major risk in specifying requirements in terms of the perceived solution, because it overlooks the business value that one wants to achieve as well as other potential solutions that might be (more) appropriate in the context of use.

So I'd like to urge you to take the time to do the tedious work of fleshing out the Problem that is required to be solved, and the requirements that are to be met. In terms of an ontology, that translates to:

describing the problem in the background section;
- This specifies the reason for the CP to exist, i.e., business goal or value that currently is out of reach:
- (what goes wrong if we cannot classify, or, what can be achieved when we are capable of classifying analysis results)
fleshing out the (functional and possibly extra-functional) Requirements in terms of What it is that needs to be achieved (on the user & application level, not the technology level);
- This specifies the actual operational results/behaviour that the user expects the system to produce.
- (Representing something is a means to achieve a result; we are not interested in the means, but in what is being achieved: what are the things that are to be classified, on the basis of what do we classify them, what kind of classifications do we expect to emerge or do we have fixed set of them, and what are the results of doing the classification)
describing the precise situation/case serving as an example about the kind of things in reality that apply (competencies);
- An example makes the demand so much more understandable than its abstract generalisation, hence, instantiate the demand into particular things in reality that it is about; keep it simple but complete, since it will be used as the criterion for success after development.
- (under what conditions will the classification be performed, who will be doing the classification, what data will be subject to classification)
and, using those (and possibly existing competencies), the questions with their anticipated answers that the ontology should understand and produce;
- An example makes the demand so much more understandable than its abstract generalisation, hence, what particular question will the user ask in the described situation, and what results will she expect (and what not); keep it simple but complete, since it will be used as the criterion for success after development.
- (formulated in plain English, you can apply ParticularConcepts that apply, using the exemplified case description as reference to show right and wrong answers)

My criterion for a properly formulated CP is whether I myself, as an outsider to the domain, can explain the gist of the CP to an insider. Please support me in achieving that goal, because it results for the community in a much clearer CP with much more down-to-earth discussions, reaching conclusions and agreements much faster and providing significant clarity to the developing team about the scope, depth and demand of the ontological artefacts.

The included example JSON-LD snippets appear to show some SHACL validation errors, and/or confusions of CASE and UCO concepts. Specifically, a non-CASE action is generating a CASE ProvenanceRecord. This isn't necessarily wrong (CASE's and UCO's SHACL doesn't disallow this).

If the uco-analysis:Analysis object is also classified as a CASE InvestigativeAction, this point is moot.

EDIT: The validation error is here: "uco-tool:creator": "NFI". creator is now an object property. http://www.wikidata.org/entity/Q2464882, with an extra annotation that wd:Q2464882 is a uco-identity:Organization could be a sufficient correction to the one validation issue I could catch by eyeball.
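
A quick pre-check for this kind of error (a string literal where an object property expects a node reference) could be sketched as below. It is not a replacement for SHACL validation: the `OBJECT_PROPERTIES` set is an assumption drawn from how this issue's examples use these properties, and the helper name and sample nodes are hypothetical.

```python
# Properties assumed (for this sketch) to be object properties.
OBJECT_PROPERTIES = {
    "uco-tool:creator",
    "uco-action:performer",
    "uco-action:instrument",
    "uco-action:location",
    "uco-action:environment",
    "uco-analysis:originatingAnalysis",
}


def literal_object_property_values(node):
    """Return (property, value) pairs where a literal stands in for a node reference."""
    found = []
    for prop in sorted(node.keys() & OBJECT_PROPERTIES):
        values = node[prop] if isinstance(node[prop], list) else [node[prop]]
        for value in values:
            # In JSON-LD, a node reference is an object such as {"@id": "..."}.
            if not isinstance(value, dict):
                found.append((prop, value))
    return found


# Hypothetical nodes, for illustration only.
bad = {"@id": "kb:tool-1", "uco-tool:creator": "NFI"}
good = {"@id": "kb:tool-1", "uco-tool:creator": {"@id": "kb:organization-1"}}
print(literal_object_property_values(bad))
print(literal_object_property_values(good))
```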

@cyberinvestigationexpress , @sbarnum - is there a reason the cited ontology in "A MACHINE LEARNING ONTOLOGY" was not proposed for adoption? I am concerned with effort required in significant reinvention, but would appreciate understanding any risks with adopting another's solution.

Remark in the meeting on "A MACHINE LEARNING ONTOLOGY" is that its scope is more focused than the general analysis action's needs.

I've added this to the post-1.0.0 milestone, due to perceived backwards-compatibility with 1.0.0. Though, if somebody is interested in implementing and testing this proposal by end of day Thursday, we can hold a Solutions Approval vote at next Thursday's meeting.

From the English definition of analysis:ArtifactClassification, analysis:class's value is meant to be a taxonomy member. Yet, analysis:class is implemented as an any-xsd:string-permitted property.

This is another instance of UCO staying just outside of using taxonomies, such as would be encoded by SKOS narrower-broader concept sets, or OWL class hierarchies.

I appreciate that it may be impractical to require a machine learning algorithm to provide a structured hierarchy of all of its potential classification results encoded in an RDF syntax. Strings are (by my understanding) a fairly guaranteed common-ground.

But, using strings for analysis:class imposes on the user that they provide their own mapping into a hierarchy, as an after-analysis step.

We should be aware that accepting this proposal with analysis:class introduces another usage of strings to represent taxons, and without such a mapping a user can't do things like determine whether there is any relationship between classification results (such as "Sedan" being a specialization of "Car").

I think this is something we would need to revise in the future, if not now.
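
The after-analysis mapping step described above could look something like the following sketch, where a user supplies their own narrower-to-broader table for the string labels. The `BROADER` table, its "Vehicle" entry, and the helper names are hypothetical; in practice such a mapping might come from a SKOS concept scheme or an OWL class hierarchy.

```python
# Hypothetical user-provided mapping: label -> its broader label.
BROADER = {"Sedan": "Car", "Car": "Vehicle"}


def ancestors(label, broader):
    """Walk the broader chain upward from a label, guarding against cycles."""
    seen = []
    while label in broader and broader[label] not in seen:
        label = broader[label]
        seen.append(label)
    return seen


def is_specialization(narrow, broad, broader):
    """True if `broad` is reachable from `narrow` via the broader mapping."""
    return broad in ancestors(narrow, broader)


print(is_specialization("Sedan", "Car", BROADER))
print(is_specialization("Car", "Sedan", BROADER))
```

Without a table like this, two analysis:class strings can only be compared for equality.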

ucoProject / UCO

Add Analysis action and AnalyticResult #400