ucoProject / UCO

This repository is for development of the Unified Cyber Ontology.
Apache License 2.0

Add Analysis action and AnalyticResult #400

Closed cyberinvestigationexpress closed 1 year ago

cyberinvestigationexpress commented 2 years ago

Background

Actions of performing analysis are very common across the targeted domain scope of UCO.

This could include analysis actions as diverse as file extraction analysis as part of digital forensics, human-based malware behavior analysis, ML-based multimedia classifiers, etc.

There is a need to express such analysis across domains, including digital forensics, cyber-investigation, cyber threat intelligence, risk, and security operations.

There is value in being able to clearly distinguish analysis actions from other forms of action, to constrain certain forms of relationships to or from analysis actions specifically, and to adorn such actions with analysis-specific properties.

There is further value in being able to pair the expression of such analysis actions with specific forms of expression of analysis results.

UCO does not currently provide capability to uniquely specify analysis-specific actions in a standard yet flexible form.

UCO would benefit from such capability to support a wide range of analysis actions and their results in a consistent yet flexible standardized form.

At its simplest level this could be illustrated by the following simple scenario:

This general approach could then be tailored for specific forms of analytic result.

UCO would benefit immediately from the capability to specify analysis actions and analytic results for the specific case of ML classification such as multimedia classification.

A simple scenario for this would be something like:

The ability to express such ML classification analysis actions and analytic results supports other varieties of such analysis such as computed similarity classification analysis.

From the Casey and Bollé chapter titled "Formalising Representation and Interpretation of Digital Evidence to Reinforce Reasoning and Automated Analysis":

The simplest example is a computed similarity or ML classification. Figure 4 depicts such a scenario, with a similarity comparison meeting a preset threshold. Performing an AnalysisAction using an AnalyticTool to process input data and produce an AnalyticResult such as a similarity score that meets the threshold, establishing a Relationship between the inputs.


Figure 4: Simple example of an AnalysisAction operating on input data using an AnalyticTool to produce an AnalyticResult (e.g., similarity score or classification confidence). When a threshold is set and met by the AnalyticResult, a Relationship can be drawn between the inputs (e.g., similar emails).
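The threshold step described in the Figure 4 caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `AnalyticResult` dataclass, the `relationship_if_met` helper, the `Similar_To` label, and the reading of "meets the threshold" as "greater than or equal to" are all hypothetical and not part of UCO or of the cited chapter.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AnalyticResult:
    """Hypothetical stand-in for an analytic result over two inputs."""
    input_ids: Tuple[str, str]  # identifiers of the two compared inputs
    score: float                # e.g. similarity score or classification confidence


def relationship_if_met(result: AnalyticResult, threshold: float,
                        kind: str = "Similar_To") -> Optional[dict]:
    """Return a relationship between the inputs when the score meets the threshold."""
    # Assumption: "meets" is interpreted as score >= threshold.
    if result.score >= threshold:
        source, target = result.input_ids
        return {"source": source, "target": target, "kindOfRelationship": kind}
    return None


# Placeholder identifiers and scores, chosen only for illustration.
similar = relationship_if_met(AnalyticResult(("kb:email-1", "kb:email-2"), 0.93), 0.9)
dissimilar = relationship_if_met(AnalyticResult(("kb:email-1", "kb:email-3"), 0.41), 0.9)
```

Here `similar` holds a relationship between the two emails, while `dissimilar` is `None` because the score falls below the preset threshold.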

Requirements

Requirement 1

Represent an analysis action of some performed analysis to produce an analytic result.

Requirement 2

Represent the analysis action of running an analytic tool on data to produce an analytic result.

Requirement 3

Represent the general analytic result of a general analysis action

Requirement 4

Represent the specific analytic result of a specific type of analysis action

Requirement 5

Represent the specific analytic result (including type of classification and confidence in classification) of an automated multimedia classifier action

Requirement 6

Represent the specific analytic result (including types of classification and confidence in classifications) of an automated multimedia classifier action performing multiple classifications in a single pass
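To make Requirement 6 concrete, the following Python sketch builds a single facet holding several classifications from one pass. The property and class names are the ones used in this proposal's JSON-LD examples; the `classification_facet` helper and the sample labels and confidences are hypothetical.

```python
import json


def classification_facet(classifications):
    """Build one ArtifactClassificationResultFacet from (class, confidence) pairs."""
    return {
        "@type": "uco-analysis:ArtifactClassificationResultFacet",
        "uco-analysis:classification": [
            {
                "@type": "uco-analysis:ArtifactClassification",
                "uco-analysis:class": label,
                "uco-analysis:classificationConfidence": confidence,
            }
            for label, confidence in classifications
        ],
    }


# Hypothetical output of a classifier that scores two classes in a single pass.
facet = classification_facet([("money", 0.99), ("weapon", 0.12)])
print(json.dumps(facet, indent=2))
```

Keeping every classification inside one facet on one AnalyticResult is one possible reading of "multiple classifications in a single pass"; the proposal may settle on a different structure.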

Risk / Benefit analysis

Benefits

This proposal provides a consistent and flexible capability for expressing different forms of analysis and their accompanying results. It would provide the ability to clearly distinguish analysis actions from other forms of action. It would provide the ability to constrain certain forms of relationships to or from analysis actions specifically. It would provide the ability to adorn analysis actions and their accompanying analytic results with analysis-specific properties.

It would provide the specific capability to express ML classifier actions such as multimedia classification analysis and their specific analytic results in a standard form.

Risks

Might be duplicative of existing ML ontology work such as A MACHINE LEARNING ONTOLOGY (https://osf.io/preprints/frenxiv/rc954/download)

Competencies demonstrated

Competency 1

A user is looking to understand the details of some particular analysis that was performed.

Competency Question 1.1

What specific type of analysis was performed?

Result 1.1

Review core:name of the Analysis

Competency Question 1.2

Who performed the analysis?

Result 1.2

Review action:performer of the Analysis

Competency Question 1.3

What tools/instruments were used to perform the analysis?

Result 1.3

Review action:instrument of the Analysis

Competency Question 1.4

When was the analysis performed?

Result 1.4

Review action:startTime and action:endTime of the Analysis

Competency Question 1.5

Where was the analysis performed?

Result 1.5

Review action:location of the Analysis

Competency Question 1.6

In what environment was the analysis performed?

Result 1.6

Review action:environment of the Analysis

Competency Question 1.7

Was the analysis automated or manual?

Result 1.7

Review action:isAutomated of the Analysis

Competency Question 1.8

What was the input for the analysis?

Result 1.8

Review action:object of the Analysis

Competency Question 1.9

What was the result of the analysis?

Result 1.9

Review core:statement and any facets of the AnalyticResult

Competency Question 1.10

From which analysis did a particular AnalyticResult come?

Result 1.10

Review analysis:originatingAnalysis of the AnalyticResult
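As a rough illustration of how several of these competency questions reduce to property lookups, the Python sketch below walks a trimmed-down JSON-LD graph using only the standard library. The graph, its identifiers, and the helper names (`index_by_id`, `answer_cqs`, `origin_of`) are hypothetical; the property names follow the Results listed above, written with the `uco-` prefixes used in the proposal's JSON-LD examples.

```python
# Hypothetical, trimmed-down graph; identifiers are placeholders.
doc = {
    "@graph": [
        {
            "@id": "kb:analysis-1",
            "@type": "uco-analysis:Analysis",
            "uco-core:name": "Reverse engineer software",
            "uco-action:isAutomated": False,
            "uco-action:performer": {"@id": "kb:analyst-1"},
            "uco-action:instrument": {"@id": "kb:tool-1"},
            "uco-action:object": [{"@id": "kb:software-1"}],
        },
        {
            "@id": "kb:result-1",
            "@type": "uco-analysis:AnalyticResult",
            "uco-analysis:originatingAnalysis": "kb:analysis-1",
            "uco-core:statement": "Software exhibits malicious intent",
        },
    ]
}

# Competency question number -> property reviewed to answer it.
CQ_PROPERTIES = {
    "1.1": "uco-core:name",
    "1.2": "uco-action:performer",
    "1.3": "uco-action:instrument",
    "1.7": "uco-action:isAutomated",
    "1.8": "uco-action:object",
}


def index_by_id(document):
    """Map each node's @id to the node itself."""
    return {node["@id"]: node for node in document["@graph"] if "@id" in node}


def answer_cqs(document, analysis_id):
    """Look up the property behind each competency question on one Analysis node."""
    node = index_by_id(document)[analysis_id]
    return {cq: node.get(prop) for cq, prop in CQ_PROPERTIES.items()}


def origin_of(document, result_id):
    """Competency Question 1.10: which analysis produced this AnalyticResult?"""
    return index_by_id(document)[result_id].get("uco-analysis:originatingAnalysis")


answers = answer_cqs(doc, "kb:analysis-1")
```

A real consumer would more likely load the graph into an RDF library and query it with SPARQL; plain dictionary access is used here only to keep the sketch self-contained.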

Competency 2

A user is looking to understand the details of some ML classification analysis of digital photo content that was performed.

Competency Question 2.1

What is the result of the ML classifier on the photo?

Result 2.1

Classified as containing money with 0.9 confidence
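A minimal Python sketch of how Result 2.1 could be read off an AnalyticResult node follows. It assumes the facet layout used in this proposal's JSON-LD examples; the `classifications` helper and the sample node (including its `kb:result-1` identifier) are hypothetical.

```python
def classifications(result_node):
    """Return (class, confidence) pairs found on an AnalyticResult node."""
    pairs = []
    for facet in result_node.get("uco-core:hasFacet", []):
        # Only classification facets carry the properties read below.
        if facet.get("@type") != "uco-analysis:ArtifactClassificationResultFacet":
            continue
        for item in facet.get("uco-analysis:classification", []):
            pairs.append((
                item["uco-analysis:class"],
                item["uco-analysis:classificationConfidence"],
            ))
    return pairs


# Hypothetical node mirroring Result 2.1.
result = {
    "@id": "kb:result-1",
    "@type": "uco-analysis:AnalyticResult",
    "uco-core:hasFacet": [
        {
            "@type": "uco-analysis:ArtifactClassificationResultFacet",
            "uco-analysis:classification": [
                {
                    "@type": "uco-analysis:ArtifactClassification",
                    "uco-analysis:class": "money",
                    "uco-analysis:classificationConfidence": 0.9,
                }
            ],
        }
    ],
}

for label, confidence in classifications(result):
    print(f"Classified as containing {label} with {confidence} confidence")
```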

Solution suggestion

Examples

First general scenario from Background above (in this case, manual software malware analysis within the Cyber Threat Intelligence application domain):

{
  "@graph": [
    {
      "@id": "kb:organization-fb6c05a0-b6be-4a10-ba62-0e7b1da4c0ec",
      "@type": "uco-identity:Organization",
      "uco-core:name": "hex-rays"
    },
    {
      "@id": "kb:AnalyticTool-0b635b9f-bdb8-4492-9b4e-dec6797b82db",
      "@type": "uco-tool:AnalyticTool",
      "uco-core:name": "IDA Pro",
      "uco-tool:toolType": "binary code analysis tool",
      "uco-tool:creator": {
        "@id": "kb:organization-fb6c05a0-b6be-4a10-ba62-0e7b1da4c0ec"
      },
      "uco-tool:version": "7.7"
    },
    {
      "@id": "kb:Analysis-f365add7-1326-426f-9266-406bdeed86a1",
      "@type": "uco-analysis:Analysis",
      "uco-core:name": "Reverse engineer software to determine malicious intent",
      "uco-core:startTime": {
        "@type": "xsd:dateTime",
        "@value": "2022-05-12T10:21:00.00Z"
      },
      "uco-core:endTime": {
        "@type": "xsd:dateTime",
        "@value": "2022-05-14T15:58:00.00Z"
      },
      "uco-action:isAutomated": false,
      "uco-action:location": {
        "@id": "kb:9b82c2bc-10f7-47b2-81a8-443a9f458440"
      },
      "uco-action:performer": {
        "@id": "kb:Analyst-c1d5f9cc-10cd-4fdb-9570-e9d00e6df6f7"
      },
      "uco-action:instrument": {
        "@id": "kb:AnalyticTool-0b635b9f-bdb8-4492-9b4e-dec6797b82db"
      },
      "uco-action:environment": {
        "@id": "kb:Computer-e640f827-1f5b-4e8a-bd89-7afdf2c85caa"
      },
      "uco-action:object": [
        {
          "@id": "kb:Software1-2ef1d3c7-eb2d-470d-89ea-291daed6549b"
        }
      ],
      "uco-action:result": [
        {
          "@id": "kb:ProvenanceRecord-aa90afe6-9069-49bb-8ad8-b05d3f4f143b"
        },
        {
          "@id": "kb:AnalyticResult-67fb2d95-dc94-4833-a270-582c37feb879"
        }
      ]
    },
    {
      "@id": "kb:AnalyticResult-67fb2d95-dc94-4833-a270-582c37feb879",
      "@type": "uco-analysis:AnalyticResult",
      "uco-analysis:originatingAnalysis": "kb:Analysis-f365add7-1326-426f-9266-406bdeed86a1",
      "uco-core:statement": "Software exhibits malicious intent"
    }
  ]
}
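In examples of this shape, an AnalyticResult and its Analysis point at each other: the Analysis lists the result under `uco-action:result`, and the result names the Analysis under `uco-analysis:originatingAnalysis`. The Python sketch below checks that the two directions agree. It is a hypothetical helper, not part of any UCO tooling, and it assumes each node carries a single `@type` string and that references appear either as a bare identifier string or as an `{"@id": ...}` object.

```python
def _ref(value):
    """Accept either an {"@id": ...} object or a bare identifier string."""
    return value["@id"] if isinstance(value, dict) else value


def inconsistent_results(document):
    """Return ids of AnalyticResults whose originatingAnalysis does not list them as a result."""
    nodes = {n["@id"]: n for n in document["@graph"] if "@id" in n}
    problems = []
    for node in nodes.values():
        # Assumption: @type is a single string such as "uco-analysis:AnalyticResult".
        if not str(node.get("@type", "")).endswith(":AnalyticResult"):
            continue
        origin = nodes.get(_ref(node.get("uco-analysis:originatingAnalysis")), {})
        listed = {_ref(r) for r in origin.get("uco-action:result", [])}
        if node["@id"] not in listed:
            problems.append(node["@id"])
    return problems


# Hypothetical graph: result-1 is linked consistently, result-2 points at a tool.
sample = {
    "@graph": [
        {"@id": "kb:tool-1", "@type": "uco-tool:AnalyticTool"},
        {
            "@id": "kb:analysis-1",
            "@type": "uco-analysis:Analysis",
            "uco-action:result": [{"@id": "kb:result-1"}, {"@id": "kb:result-2"}],
        },
        {
            "@id": "kb:result-1",
            "@type": "uco-analysis:AnalyticResult",
            "uco-analysis:originatingAnalysis": "kb:analysis-1",
        },
        {
            "@id": "kb:result-2",
            "@type": "uco-analysis:AnalyticResult",
            "uco-analysis:originatingAnalysis": "kb:tool-1",
        },
    ]
}

problems = inconsistent_results(sample)
```

A check like this is no substitute for SHACL validation, but it can catch a copy-paste slip in hand-written examples early.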

Second multimedia classification-specific scenario from Background above:

{
  "@graph": [
    {
      "@id": "kb:organization-2b3b98e2-aea2-4270-876a-7f9917623cb7",
      "@type": "uco-identity:Organization",
      "uco-core:name": "NFI"
    },
    {
      "@id": "kb:AnalyticTool-DAE5EE58-E5ED-4588-93BE-CDEC6FAA9C6A",
      "@type": "uco-tool:AnalyticTool",
      "uco-core:name": "Hansken",
      "uco-tool:toolType": "DFaaS",
      "uco-tool:creator": {
        "@id": "kb:organization-2b3b98e2-aea2-4270-876a-7f9917623cb7"
      },
      "uco-tool:version": "1.0",
      "uco-core:hasFacet": [
        {
          "@type": "uco-tool:ToolConfigurationTypeFacet",
          "uco-tool:configurationSettings": [
            {
              "@type": "uco-tool:ConfigurationSettingType",
              "uco-tool:itemName": "classifier",
              "uco-tool:itemValue": "nfi-forensic"
            },
            {
              "@type": "uco-tool:ConfigurationSettingType",
              "uco-tool:itemName": "TrainingSet",
              "uco-tool:itemValue": "0.0.7"
            }
          ]
        }
      ]
    },
    {
      "@id": "kb:Analysis-7cd51fa7-63ee-4f40-a482-9ce8333c7556",
      "@type": "uco-analysis:Analysis",
      "uco-core:name": "compute string similarity",
      "uco-core:startTime": {
        "@type": "xsd:dateTime",
        "@value": "2022-05-10T08:49:00.00Z"
      },
      "uco-core:endTime": {
        "@type": "xsd:dateTime",
        "@value": "2022-05-10T09:54:00.00Z"
      },
      "uco-action:isAutomated": true,
      "uco-action:location": {
        "@id": "kb:ESC-6FAC81EF-0966-4F05-94BB-2A5D572513CA"
      },
      "uco-action:performer": {
        "@id": "kb:Analyst-13A167EE-D3B5-4AA4-B8BA-83C25F8B8FF4"
      },
      "uco-action:instrument": {
        "@id": "kb:AnalyticTool-DAE5EE58-E5ED-4588-93BE-CDEC6FAA9C6A"
      },
      "uco-action:environment": {
        "@id": "kb:Computer-533FA61A-BE79-469E-A05F-1A341848B925"
      },
      "uco-action:object": [
        {
          "@id": "kb:RasterPicture1-b67308c0-c31b-41a6-805a-10ec526ec8bc"
        }
      ],
      "uco-action:result": [
        {
          "@id": "kb:ProvenanceRecord-d628b0f6-686d-4d22-a577-ec737e5947bc"
        },
        {
          "@id": "kb:AnalyticResult-3205CB19-0820-4009-B70B-646DBD19598B"
        }
      ]
    },
    {
      "@id": "kb:AnalyticResult-3205CB19-0820-4009-B70B-646DBD19598B",
      "@type": "uco-analysis:AnalyticResult",
      "uco-analysis:originatingAnalysis": "kb:Analysis-7cd51fa7-63ee-4f40-a482-9ce8333c7556",
      "uco-core:hasFacet": [
        {
          "@type": "uco-analysis:ArtifactClassificationResultFacet",
          "uco-analysis:classification": [
            {
              "@type": "uco-analysis:ArtifactClassification",
              "uco-analysis:class": "money",
              "uco-analysis:classificationConfidence": 0.997359037
            }
          ]
        }
      ]
    }
  ]
}

(Edit: Github presentation syntax and needed prefixes added by OC Chair.)

Coordination

ajnelson-nist commented 2 years ago

@cyberinvestigationexpress , there are a few questions the illustration JSON opens.

  1. Is this proposing a new namespace with prefix analysis:?
  2. Are the drafting:-prefixed concepts in-scope or out-of-scope of this proposal? For instance, the last I saw originatingAction, it was part of a proposal of mine that ended up being dropped on a new characteristic of Facets becoming understood.
  3. How should core:confidence play into this? I think the only place there is a demonstration of confidence is on Oresteia, and it's flagged as being known to be non-conformant.
  4. How does drafting:class differ from core:tag?

I also have questions from your solution suggestion.

  1. Your first suggestion is to add AnalysisAction to the Action namespace. Why not the Analysis namespace?
  2. Would there be a ClassificationResult class accompanying the ClassificationResultFacet class?

Last, I have a question on the potential scope of this class. With the way this is illustrated presently, this proposal can help us associate confidence with a result that is structured closely to a tag. The CASE Inference Proposal (which one of us needs to transcribe to GitHub from Confluence) associates a confidence with a result structure that is significantly more complex, able to be consumed as a graph. Can you please include an illustration of how this money-with-0.99-confidence AnalyticResult could be used? Is it possible to illustrate something with an AnalyticResult that doesn't look like a tag, or is this scoped to tag-like conclusions only?

I will schedule this for the OC call in a few Thursdays, but if there are no answers to the above questions, I will only give this 10 minutes of committee time. I estimate this having a low chance of passing a Requirements Review vote before we see some further explanation.

sbarnum commented 2 years ago

A few suggestions for refining the proposal based on conversations that @cyberinvestigationexpress and I have had on this topic:

This would meet the immediate need to support expressing classification analysis results but do so in a way that sets us up to support a wide range of use cases going forward. The basic Analysis and AnalyticResult setup here is needed for many cyber application subdomains.

plbt5 commented 2 years ago

@cyberinvestigationexpress , @sbarnum:

Thank you for creating this Change Proposal. Unfortunately, I don't understand it.

My comment on this CP is focused on the Why of it. The lack of a proper description of its background and of the Problem that currently exists leaves us with insufficient criteria for developing the proper artefacts and, more importantly, with no way to value the result of the work in terms of the extent to which it meets the user/business demands. There is a major risk in specifying requirements in terms of the perceived solution, because it overlooks the business value that one wants to achieve as well as other potential solutions that might be (more) appropriate in the context of use.

So I'd like to urge you to take the time to do the tedious work of fleshing out the Problem that is required to be solved, and the requirements that are to be met. In terms of an ontology, that translates to:

My criterion for a properly formulated CP is whether I myself, as an outsider to the domain, can explain the gist of the CP to an insider. Please support me in achieving that goal, because it results for the community in a much clearer CP with much more down-to-earth discussions, reaching conclusions and agreements much faster and providing significant clarity to the developing team about the scope, depth and demand of the ontological artefacts.

ajnelson-nist commented 2 years ago

The included example JSON-LD snippets appear to show some SHACL validation errors, and/or confusions of CASE and UCO concepts. Specifically, a non-CASE action is generating a CASE ProvenanceRecord. This isn't necessarily wrong (CASE's and UCO's SHACL doesn't disallow this).

If the uco-analysis:Analysis object is also classified as a CASE InvestigativeAction, this point is moot.

EDIT: The validation error is here: "uco-tool:creator": "NFI". creator is now an object property. http://www.wikidata.org/entity/Q2464882, with an extra annotation that wd:Q2464882 is a uco-identity:Organization could be a sufficient correction to the one validation issue I could catch by eyeball.

ajnelson-nist commented 2 years ago

@cyberinvestigationexpress , @sbarnum - is there a reason the cited ontology in "A MACHINE LEARNING ONTOLOGY" was not proposed for adoption? I am concerned with effort required in significant reinvention, but would appreciate understanding any risks with adopting another's solution.

ajnelson-nist commented 2 years ago

Remark in the meeting on "A MACHINE LEARNING ONTOLOGY" is that its scope is more focused than the general analysis action's needs.

ajnelson-nist commented 2 years ago

I've added this to the post-1.0.0 milestone, due to perceived backwards-compatibility with 1.0.0. Though, if somebody is interested in implementing and testing this proposal by end of day Thursday, we can hold a Solutions Approval vote at next Thursday's meeting.

ajnelson-nist commented 1 year ago

From the English definition of analysis:ArtifactClassification, analysis:class's value is meant to be a taxonomy member. Yet, analysis:class is implemented as an any-xsd:string-permitted property.

This is another instance of UCO staying just outside of using taxonomies, such as would be encoded by SKOS narrower-broader concept sets, or OWL class hierarchies.

I appreciate that it may be impractical to require a machine learning algorithm to provide a structured hierarchy of all of its potential classification results encoded in an RDF syntax. Strings are (by my understanding) a fairly guaranteed common-ground.

But, using strings for analysis:class imposes on the user that they provide their own mapping into a hierarchy, as an after-analysis step.

We should be aware that accepting this proposal with analysis:class introduces another usage of strings to represent taxons, and without such a mapping a user can't do things like determine whether there is any relationship between classification results (such as "Sedan" being a specialization of "Car").
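To illustrate the after-analysis mapping step described here, the Python sketch below pairs string classification results with a user-supplied broader-term table and walks it to test specialization. The `BROADER` table, its entries, and the `is_specialization` helper are hypothetical; a SKOS or OWL encoding would express the same relation declaratively.

```python
# Hypothetical user-supplied mapping from a class string to its broader term.
BROADER = {"Sedan": "Car", "Car": "Vehicle"}


def is_specialization(narrow, broad, broader=BROADER):
    """Return True when `narrow` reaches `broad` by following broader-term links."""
    seen = set()
    current = narrow
    # The `seen` set guards against cycles in a malformed mapping.
    while current in broader and current not in seen:
        seen.add(current)
        current = broader[current]
        if current == broad:
            return True
    return False


sedan_is_car = is_specialization("Sedan", "Car")
car_is_sedan = is_specialization("Car", "Sedan")
```

Without such a table, the strings "Sedan" and "Car" are unrelated as far as the data is concerned, which is the limitation raised above.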

I think this is something we would need to revise in the future, if not now.