ucoProject / UCO

This repository is for development of the Unified Cyber Ontology.
Apache License 2.0

Demonstration of a SKOS-backed semi-open vocabulary for MIME types #363

Open plbt5 opened 2 years ago

plbt5 commented 2 years ago

Background

In UCO, many types are identified by a string rather than by a thing, i.e., an IRI-backed node in a graph. The advantage of a string is that a new type-by-string is easy to create when the particular type is missing from the ontology. The disadvantages are:

  1. strings are not first-class citizens in the ontology world, as opposed to IRI-backed things;
  2. consequently, additional code is needed to check whether the applied string equals the predefined type-by-string;
  3. the approach is prone to errors from typos in the string.

The advantages of the latter, i.e., type-by-IRI, are the converse of the type-by-string disadvantages. At the same time, type-by-IRI has a disadvantage of its own: extending the available types with a new one requires knowledge of RDF(S), OWL, and/or the data model or ontology that defines the other individuals being supplemented.

The purpose of this issue is to lay the foundation that is necessary to gain data/experience with users adding a type-by-IRI in UCO.

Objective / Purpose

The purpose of this issue is to lay the foundation that is necessary to gain data/experience with issues that users might run into when adding a type-by-IRI in UCO.

Requirements

Requirement 1

UCO shall have access to a SKOS-vocabulary that specifies an individual for each and every MIME type defined by the IANA Media Types registry, so that these individuals can be used to specify the type of a medium registered in UCO.

Requirement 2

The resulting taxonomy shall align with the standard two-tier scheme as defined by the IANA Media Type Registry:

Requirement 3

The SKOS-vocabulary shall be serialised in Turtle.

Requirement 4

Loosely-coupled: Any modification to the SKOS-vocabulary shall not imply a change to the UCO-ontology.

Requirement 5

Manageability: Any modification to the IANA Media Type Registry shall effect an update to the SKOS-vocabulary, preferably mechanically.
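A mechanical update along the lines of Requirement 5 might start from a diff between two snapshots of the registry. The following is only a minimal sketch: it assumes the registry is available as CSV text with `Name`, `Template` and `Reference` columns, the sample rows are illustrative and not copied from the registry, and the function names `read_registry_rows` and `diff_registry` are hypothetical.

```python
import csv
import io

# Illustrative rows only (not copied from the registry); the column layout
# Name, Template, Reference is an assumption about the CSV export.
SAMPLE_CSV = """Name,Template,Reference
json,application/json,[illustrative]
plain,text/plain,[illustrative]
obsolete-entry,,[illustrative]
"""


def read_registry_rows(csv_text):
    """Return the sorted 'type/subtype' templates found in csv_text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    templates = set()
    for row in reader:
        template = (row.get("Template") or "").strip().lower()
        # Rows without a usable template are skipped rather than guessed at.
        if "/" in template:
            templates.add(template)
    return sorted(templates)


def diff_registry(old_templates, new_templates):
    """Report which templates appeared or disappeared between two snapshots."""
    old, new = set(old_templates), set(new_templates)
    return {"added": sorted(new - old), "removed": sorted(old - new)}


if __name__ == "__main__":
    current = read_registry_rows(SAMPLE_CSV)
    print(diff_registry(["application/json"], current))
```

A non-empty diff would then be the trigger for regenerating the vocabulary and, per Requirement 6, publishing it as a new version.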

Requirement 6

Continuity & maintainability: Any modification to the SKOS-vocabulary shall result in a new version.

Requirement 7

Provenance: Any Media (Content) Type or Subtype added to the SKOS-vocabulary that originates from the UCO or CASE community shall be categorised as such. This implies that the provenance of every existing Media (Content) Type / Media Subtype pair is maintained.

Risk / Benefit analysis

Benefits

The benefit of the stated objective is that data about, and experience from, users adding type-by-IRI to the vocabulary become available. It is then possible to investigate how to improve user acceptance and minimise the technical knowledge users need for adding a new type in this way.

The benefits of having a vocabulary about IANA Media Types available are:

  1. Using these concepts (individuals) as unequivocal types in UCO;
  2. Becoming interoperable with Dublin Core users, particularly those that employ http://purl.org/dc/terms/MediaType in their graph design.

Risks

Except in relation to the semi-openness of the vocabulary, the submitter is unaware of risks associated with this change.

Consequences

The intention of this CR is that the type-by-string design will be replaced by a type-by-IRI design. The foreseen consequences are (not necessarily comprehensively) as follows:

  1. A potential impact on the design of UCO: observable:mimeType would become an owl:ObjectProperty.
  2. A potential breaking change (i.e., not backwards compatible) between the current version and the version implementing this CR.
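To give a feel for what the breaking change could mean for existing data, the sketch below maps a free-text MIME string to a candidate concept IRI, as a migration from a literal value to an object reference might do. The namespace `http://example.org/uco-mime/` and the function name `mime_string_to_iri` are hypothetical; the issue does not fix either.

```python
from urllib.parse import quote

# Hypothetical namespace; the actual vocabulary IRI is not decided here.
MIME_NS = "http://example.org/uco-mime/"


def mime_string_to_iri(mime_string):
    """Map a free-text MIME string to a candidate concept IRI.

    Parameters such as '; charset=utf-8' are dropped, and the value is
    lower-cased. Raises ValueError unless the result is 'type/subtype'.
    """
    normalized = mime_string.split(";", 1)[0].strip().lower()
    media_type, sep, subtype = normalized.partition("/")
    if not sep or not media_type or not subtype or "/" in subtype:
        raise ValueError(f"not a type/subtype pair: {mime_string!r}")
    # Keep '+', '.' and '-' readable, since they are common in subtypes.
    return (
        MIME_NS
        + quote(media_type, safe="")
        + "/"
        + quote(subtype, safe="+.-")
    )


if __name__ == "__main__":
    print(mime_string_to_iri("Text/Plain; charset=utf-8"))
```

Strings that fail the check would have to be reviewed by hand, which is one place where the cost of the breaking change would show up.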

Competencies demonstrated

Competency 1

Competency 2

Competency 3

Competency 4

As a security service provider, I want to reference application/tar, and I don't care whether it is an IANA media type or not. I've always said application/tar, it's been coded like that in my product for a decade, and my customers know I mean 'tape archive' when I say that.

Competency 5

Competency 6

Solution suggestion

The taxonomy converts the IANA Media Types registry into SKOS under a UCO namespace, following a mostly two-tier skos:ConceptScheme:

Note that some extension media types not part of IANA are defined for various reasons, and may or may not be submitted in the future for standardization to IANA. These extensions follow the non-registration practice of [RFC 6838, Section 3.4], and all include the string [/x-uco-].
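As a rough illustration of the two-tier layout and of the /x-uco- naming for extensions, the sketch below emits Turtle for one top-level type and one subtype. It is not taken from mime.ttl: the namespace, the scheme IRI, the choice of `skos:prefLabel` and the helper names `concept_turtle` and `is_uco_extension` are all assumptions made for the example.

```python
# Hypothetical namespace and scheme IRI, for illustration only.
MIME_NS = "http://example.org/uco-mime/"
SCHEME_IRI = MIME_NS + "scheme"


def is_uco_extension(template):
    """True when the template carries the '/x-uco-' extension marker."""
    return "/x-uco-" in template.lower()


def concept_turtle(template):
    """Return Turtle for the top-level type and the subtype of template."""
    template = template.strip().lower()
    media_type, _, subtype = template.partition("/")
    if not media_type or not subtype:
        raise ValueError(f"not a type/subtype pair: {template!r}")
    scheme = f"<{SCHEME_IRI}>"
    top = f"<{MIME_NS}{media_type}>"
    leaf = f"<{MIME_NS}{media_type}/{subtype}>"
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"{scheme} a skos:ConceptScheme ;",
        f"    skos:hasTopConcept {top} .",
        "",
        # Tier 1: the top-level media type.
        f"{top} a skos:Concept ;",
        f"    skos:topConceptOf {scheme} ;",
        f'    skos:prefLabel "{media_type}" .',
        "",
        # Tier 2: the subtype, linked to its top-level type.
        f"{leaf} a skos:Concept ;",
        f"    skos:inScheme {scheme} ;",
        f"    skos:broader {top} ;",
        f'    skos:prefLabel "{template}" .',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(concept_turtle("application/x-uco-example"))
```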

This repository's primary product is a monolithic ontology and taxonomy file, serialized in Turtle, mime.ttl. (This repository is undergoing NIST review for release. If you are interested in providing early feedback, please contact @ajnelson-nist.)

UCO could subclass dcterms:MediaType with a new class uco-types:IANAMediaType, and a sibling uco-types:NonIANAMediaType in order to support Requirement 7 and Competency 4.
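One possible reading of this suggestion is that each media type individual is assigned to exactly one of the two sibling classes, depending on whether it appears in the IANA registry. The sketch below assumes the set of registered templates is supplied by the caller (for instance from a registry snapshot); the function name `media_type_class` and the one-element stand-in set are hypothetical.

```python
IANA_CLASS = "uco-types:IANAMediaType"
NON_IANA_CLASS = "uco-types:NonIANAMediaType"


def media_type_class(template, registered):
    """Pick the class for a media type, given the registered templates.

    `registered` is whatever snapshot of the IANA registry the caller
    trusts; this function does not know the registry itself.
    """
    known = {entry.strip().lower() for entry in registered}
    if template.strip().lower() in known:
        return IANA_CLASS
    return NON_IANA_CLASS


if __name__ == "__main__":
    # Stand-in snapshot with a single entry, for illustration only.
    snapshot = {"application/json"}
    for name in ("application/json", "application/tar"):
        print(name, "->", media_type_class(name, snapshot))
```

With the provenance carried by the class, the user in Competency 4 could keep saying application/tar without having to know which registry, if any, backs it.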

Coordination