oasis-tcs / osim

OASIS OSIM TC: Working directory for OSIM TC

FAQ on what information modeling languages will OSIM use #14

Open sparrell opened 4 weeks ago

sparrell commented 4 weeks ago

FAQ on what information modeling languages will OSIM use.

I propose we allow UML, ASN.1, and JADN, and potentially any other standard information modeling language TC Members propose. I propose we not "pick winners" but work with whatever TC Members want to contribute.

aj-stein-nist commented 3 weeks ago

Hi, first time caller, long-time listener. I am currently also an observer on the TC. If we are considering ASN.1 a modeling language in scope due to the related technologies in the charter, you may want to include CBOR and CDDL given their heavy usage in SCITT and other IETF technologies that are normative requirements in SCITT.

davaya commented 3 weeks ago

I whipped up a "modeling taxonomy" slide as a starting point for discussion at https://docs.google.com/presentation/d/1aVvlIHd8POlX6yVdo-X-60Y_Z0R089sAOyMrIWwTyMI/. It has four layers:

  1. Ontologies / Knowledge Graphs
  2. Information Models including Abstract Schemas
  3. Data Models / Concrete Schemas
  4. Data Values

I agree that the CBOR (Concise Binary Object Representation) data format and its CDDL (Concise Data Definition Language) data model are an essential part of the use cases to be considered by this TC because, unlike XML and JSON, CBOR is a binary data format that prioritizes conciseness over human readability of the data. (CBOR "annotated hex" notation looks like assembly language, with the transmitted bytes front and center and the human-readable meaning of those bytes generated as annotations.) CycloneDX SBOMs can be serialized in another binary format, Google Protobuf, which is important for the same reason.

(The CBOR example in https://www.w3.org/TR/did-cbor-representation/#example-2-did-document-encoded-as-cbor-diagnostic-notation is mislabeled: "Diagnostic Notation" looks similar to JSON, while the example shown is actually in "Annotated Hex" format.)
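To make the difference between the two notations concrete, here is a minimal sketch in Python. It assumes a hand-rolled encoder covering only a tiny CBOR subset (small unsigned integers, short text strings, small maps); the function names `encode`, `annotated_hex`, `diagnostic`, and `to_bytes` and the sample document are illustrative, not part of any library, and the flat annotation layout is a simplification of what real CBOR tools print.

```python
# Hand-rolled encoder for a tiny CBOR subset (RFC 8949): unsigned integers
# below 24, text strings shorter than 24 bytes, and maps with fewer than
# 24 entries. Illustration only, not a general-purpose implementation.

def encode(value):
    """Return a list of (bytes, annotation) pairs for one data item."""
    if isinstance(value, int) and not isinstance(value, bool) and 0 <= value < 24:
        # Major type 0: the value fits in the initial byte.
        return [(bytes([value]), f"unsigned({value})")]
    if isinstance(value, str):
        raw = value.encode("utf-8")
        if len(raw) >= 24:
            raise ValueError("sketch handles strings under 24 bytes only")
        # Major type 3 (0x60) with the byte length in the low 5 bits.
        return [(bytes([0x60 | len(raw)]), f"text({len(raw)})"),
                (raw, f'"{value}"')]
    if isinstance(value, dict) and len(value) < 24:
        # Major type 5 (0xa0) with the number of pairs in the low 5 bits.
        parts = [(bytes([0xA0 | len(value)]), f"map({len(value)})")]
        for key, item in value.items():
            parts += encode(key) + encode(item)
        return parts
    raise TypeError(f"unsupported value: {value!r}")


def to_bytes(value):
    """The bytes that would actually be transmitted."""
    return b"".join(raw for raw, _ in encode(value))


def annotated_hex(value):
    """Bytes first, with the human-readable meaning as a trailing comment."""
    return "\n".join(f"{raw.hex():<12}# {note}" for raw, note in encode(value))


def diagnostic(value):
    """JSON-like rendering of the same data item."""
    if isinstance(value, dict):
        inner = ", ".join(f"{diagnostic(k)}: {diagnostic(v)}"
                          for k, v in value.items())
        return "{" + inner + "}"
    if isinstance(value, str):
        return '"' + value + '"'
    return str(value)


doc = {"id": 7}  # hypothetical sample document
print(to_bytes(doc).hex())   # a162696407
print(annotated_hex(doc))
print(diagnostic(doc))       # {"id": 7}
```

The same five bytes appear in both views; annotated hex leads with the bytes and explains them, while diagnostic notation hides the bytes and reads like JSON.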

The taxonomy shows JADN as an information model language that can in principle both generate CDDL schemas and directly validate CBOR data. Metaschema and ASN.1 also fall into the information modeling layer.

aj-stein-nist commented 3 weeks ago

> I whipped up a "modeling taxonomy" slide as a starting point for discussion at https://docs.google.com/presentation/d/1aVvlIHd8POlX6yVdo-X-60Y_Z0R089sAOyMrIWwTyMI/.

In this slide, are the ontology technologies, and forming the ontologies/taxonomies, in scope for the OSIM TC? Or is that where other work begins, picking up the lower-level inputs from the work that is in scope for this TC?

Thanks for the slide, that is helpful and constructive.

davaya commented 3 weeks ago

The charter says "The OASIS Open Supplychain Information Modeling (OSIM) TC aims to standardize and promote information models about all aspects of supply chains."

Information modeling is a design approach for data and systems. So standardizing the data used in information modeling is the scope of the TC, but the evaluation criteria include the ease with which IM integrates with and enhances existing data and design approaches. There's no reason to pursue IM if it doesn't make other work easier, better, or both.

aj-stein-nist commented 3 weeks ago

> Information modeling is a design approach for data and systems. So standardizing the data used in information modeling is the scope of the TC, but the evaluation criteria include the ease with which IM integrates with and enhances existing data and design approaches. There's no reason to pursue IM if it doesn't make other work easier, better, or both.

Makes sense. Just so I understand: evaluating integration with existing ontologies or taxonomies is in scope, but creating them likely is not, because that's beyond the scope of information modeling?

Either way, the TC may want to evaluate the Cyber Domain Ontology as an integration point. I have reviewed it but not used it or directly contributed yet. I have not seen many other RDF ontologies for cyber information of its breadth and age. (Full disclosure: one of my colleagues is a maintainer.)

davaya commented 3 weeks ago

Yes. An information model defines data (documents, messages, datatypes) used as resources (subjects and objects) in an ontology. The ontology defines relationships (predicates) between subjects and objects. In RDF terms, an IM defines a lexical-to-value mapping, except that the lexical space is not restricted to strings.

Consider resources like an IP packet or an image (if gif, jpg, and png hadn't already been invented). An IM would define RGBA pixels, pixel rows, and images consisting of metadata, palettes, and rows. An ontology would define the relationships between images and other resources.
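The image example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `Pixel` and `Image`, the validation rules, and the triples at the end (including identifiers such as `image:logo` and predicates such as `depicts`) are hypothetical and not drawn from any OSIM or JADN artifact.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Pixel:
    """Information model: an RGBA pixel with 8-bit channels."""
    red: int
    green: int
    blue: int
    alpha: int = 255

    def __post_init__(self):
        for name in ("red", "green", "blue", "alpha"):
            channel = getattr(self, name)
            if not 0 <= channel <= 255:
                raise ValueError(f"{name} out of range: {channel}")


@dataclass(frozen=True)
class Image:
    """Information model: metadata, a palette, and rows of pixels."""
    metadata: Dict[str, str]
    palette: Tuple[Pixel, ...]
    rows: Tuple[Tuple[Pixel, ...], ...]

    def __post_init__(self):
        # Every row must have the same width.
        if len({len(row) for row in self.rows}) > 1:
            raise ValueError("all rows must have the same width")
        # Every pixel must come from the palette.
        for row in self.rows:
            for pixel in row:
                if pixel not in self.palette:
                    raise ValueError(f"pixel not in palette: {pixel}")


BLACK = Pixel(0, 0, 0)
WHITE = Pixel(255, 255, 255)
logo = Image(
    metadata={"title": "logo"},
    palette=(BLACK, WHITE),
    rows=((BLACK, WHITE), (WHITE, BLACK)),
)

# Ontology side: relationships between the image and other resources,
# written as (subject, predicate, object). All names are placeholders.
triples = [
    ("image:logo", "depicts", "org:example"),
    ("image:logo", "createdBy", "person:alice"),
]
```

The classes say what a valid image is, independent of how it is serialized; the triples say nothing about pixels and only relate the image, as a whole resource, to other things.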

aj-stein-nist commented 3 weeks ago

> Yes. An information model defines data (documents, messages, datatypes) used as resources (subjects and objects) in an ontology. The ontology defines relationships (predicates) between subjects and objects. In RDF terms, an IM defines a lexical-to-value mapping, except that the lexical space is not restricted to strings.

OK, so this comment helps here in GitHub, but how one or more information models, and perhaps the derived ontology, fit together is unclear in the charter, which I believe has been voted on and approved. As an observer (considering upgrading to member), it would help to understand whether I should dedicate some or significant effort if our goals align. (I only point this out because I am not asking questions just to be curious or difficult, if I consider that the spectrum. I am interested in rolling up my sleeves.)

> Consider resources like an IP packet or an image (if gif, jpg, and png hadn't already been invented). An IM would define RGBA pixels, pixel rows, and images consisting of metadata, palettes, and rows. An ontology would define the relationships between images and other resources.

This example is helpful, thank you. Again, if this example is representative of a small or significant part of the TC's scope in future work, I would reiterate that you should look into CDO.

And I understand these comments are going beyond relevance to just this FAQ issue. If I should discuss these questions, and some others I have, somewhere other than GitHub issues, let me know how I can pose them. I know the charter isn't published here (yet?), so I want to meet the group where they are.

davaya commented 2 weeks ago

SHACL constraints define data structure whether in the context of data schemas (XSD), information models, or ontologies, so the structures defined in CDO certainly have analogs as information types. I'm speaking from the IM technology perspective, but I assume that CDO is also relevant and in-scope from the Supplychain use case perspective.

IMs are data format agnostic, so some of the peculiarities of XML are abstracted away:

But here are RDF triples for an example CDO type "Action": the fact that they define a set of properties means there is an analogous Action type in an IM.

[image: RDF triples for the CDO "Action" type]
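As a rough sketch of why a set of property-defining triples implies an analogous record type, the Python below groups triples by subject into a field list. Everything here is an assumption for illustration: the `ex:` names are placeholders, not actual CDO or SHACL terms, and `to_record_types` is a hypothetical helper, not an existing tool.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object). The "ex:" terms are
# placeholders and do not reproduce the CDO definition of Action.
TRIPLES = [
    ("ex:Action", "ex:hasProperty", "ex:startTime"),
    ("ex:Action", "ex:hasProperty", "ex:endTime"),
    ("ex:Action", "ex:hasProperty", "ex:performer"),
    ("ex:startTime", "ex:datatype", "xsd:dateTime"),
    ("ex:endTime", "ex:datatype", "xsd:dateTime"),
    ("ex:performer", "ex:datatype", "xsd:anyURI"),
]


def to_record_types(triples):
    """Group property triples into {type name: {field name: datatype}}."""
    datatypes = {s: o for s, p, o in triples if p == "ex:datatype"}
    records = defaultdict(dict)
    for subject, predicate, obj in triples:
        if predicate == "ex:hasProperty":
            records[subject][obj] = datatypes.get(obj, "unknown")
    return dict(records)


for type_name, fields in to_record_types(TRIPLES).items():
    print(type_name)
    for field_name, datatype in fields.items():
        print(f"  {field_name}: {datatype}")
```

The resulting mapping of field names to datatypes is the kind of structure an information type would carry, with no reference to any particular serialization.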

aj-stein-nist commented 2 weeks ago

Thanks for the additional context. So with that in mind, the TC intends to take existing IMs and potentially build a superset IM for OSIM over them?

I ask here because of how it pertains to the scope and charter and how that is expressed in the FAQ. I'm not saying I think CDO should or must be used, but your replies here have told me more about the scope than the other documents and information in this repo.