oasis-tcs / openc2-jadn

OASIS OpenC2 TC: Specifying a vocabulary to describe the meaning of structured data, to provide hints for user interfaces working with structured data, and to make assertions about what a valid instance must look like. https://github.com/oasis-tcs/openc2-jadn

Public Review Comment: Is JADN needed? #62

Open davaya opened 3 years ago

davaya commented 3 years ago

Source: https://lists.oasis-open.org/archives/openc2-comment/202106/msg00002.html

Andrii Berezovskyi: "Have you considered the following specifications from W3C: RDF, RDFS, JSON-LD, SHACL? RDF, RDFS (and potentially OWL and BFO) should take care of your information modelling needs, JSON-LD provides a JSON serialisation, SHACL provides extensive validation capabilities. I would be interested to see the analysis why these technologies were eliminated after your consideration."

davaya commented 3 years ago

An ontology represents knowledge; RDF is one of several syntaxes used to serialize ontology content. Information models define the "format-independent syntax" of any type of content. XSD is closest in purpose to JADN but applies only to XML data. SHACL expresses UML constraints such as cardinality ranges but does not address syntax. JADN defines a small set of UML-based datatypes to which serialization rules for many data formats can be applied.

In short, it does not appear that any combination of W3C specifications fully supports the definition of information models, but JADN's datatype structure facilitates linkage between information graphs and OWL knowledge graphs.

Further rationale is provided in https://docs.google.com/document/d/1jWfpyP7Ws8htun3qiCnSgAZJkbFyRPXdg3eglloVyNQ and https://docs.google.com/document/d/1gY8ZaQJmJTpx8468Conchc2XVzTKE8x0WFSQT1qtB8o.

davaya commented 1 year ago

RDF is an ontology language while JADN is an information modeling language. But RDF provides a solid foundation for defining their intersection:

Information models define datatypes. An RDF datatype and its Lexical-to-Value (L2V) Mapping correspond exactly to information types and their serializations. But RDF lexical values are currently constrained to be character strings, so RDF cannot support non-textual serialization formats. With the addition of support for binary lexical values and L2V mappings, RDF Datatypes could support information modeling.

But in practice, RDF is rarely if ever used to model datatypes. The only examples currently provided are based on XSD simple strings, which invert the roles of "logical" and "lexical". For example, XSD defines "decimal" as a fundamental logical type rather than defining a real number logical type with a decimal string lexical representation, as is universally done with programming language variables and formatted I/O. And although RDF has placeholders for XML, HTML, and JSON literals, there is no definition or discussion of how to coherently map lexical values in those formats to logical datatypes. Information modeling can supply that missing capability, using Collection datatypes as the logical core and serializing them into multiple text and binary lexical formats.
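
To make the logical/lexical distinction concrete, here is a minimal Python sketch of one logical Collection value with two lexical forms, one text and one binary. The `Endpoint` type, its fields, and the byte layout are hypothetical choices made only for illustration; they are not taken from the JADN specification or its serialization rules.

```python
import json
import struct
from dataclasses import dataclass


# Hypothetical logical type: a small record, independent of any data format.
@dataclass(frozen=True)
class Endpoint:
    host_id: int
    port: int


# Text lexical form (JSON) and its lexical-to-value mapping.
def to_json(value: Endpoint) -> str:
    return json.dumps({"host_id": value.host_id, "port": value.port})


def from_json(lexical: str) -> Endpoint:
    fields = json.loads(lexical)
    return Endpoint(host_id=fields["host_id"], port=fields["port"])


# Binary lexical form (assumed layout: two big-endian unsigned 16-bit
# integers) and its lexical-to-value mapping.
def to_binary(value: Endpoint) -> bytes:
    return struct.pack(">HH", value.host_id, value.port)


def from_binary(lexical: bytes) -> Endpoint:
    host_id, port = struct.unpack(">HH", lexical)
    return Endpoint(host_id=host_id, port=port)


endpoint = Endpoint(host_id=7, port=8080)
text_form = to_json(endpoint)
binary_form = to_binary(endpoint)

# Two different lexical values map to the same logical value.
assert from_json(text_form) == from_binary(binary_form) == endpoint
```

The point of the sketch is only that the logical value is defined once and each format supplies its own mapping; a string-only lexical space could not express the binary form.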

dlemire60 commented 5 months ago

This issue was addressed in the JADN-IM-CN FAQ.

Recommend closing.

dlemire60 commented 5 months ago

Statement in the JADN-IM-CD, Section D.2:

Information defines loss. Lossless transformations across data formats preserve information; after a round trip significant data is unchanged and insignificant data can be ignored. A lossy round trip is lossy not because it alters data, but because it alters significant data.

Should that be restated along the lines of "information defines the threshold of loss"? Since loss here means "loss of information" (i.e., significant data), it seems that by defining what is significant, the IM defines the point beyond which a round trip becomes lossy.
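
One way to picture that threshold is the Python sketch below. It assumes, purely for illustration, an information model in which JSON whitespace and key order are insignificant while field values are significant; the sample document and the `logical_value` helper are hypothetical.

```python
import json


def logical_value(text: str) -> dict:
    """Map a JSON lexical value to the logical value it represents."""
    return json.loads(text)


original = '{ "port" : 8080,   "name": "web" }'

# Round trip that changes only insignificant data (whitespace, key order).
round_tripped = json.dumps(
    json.loads(original), sort_keys=True, separators=(",", ":")
)
data_changed = round_tripped != original
lossless = logical_value(round_tripped) == logical_value(original)

# Round trip that changes significant data (a field value is truncated).
decoded = json.loads(original)
decoded["name"] = decoded["name"][:2]
truncated = json.dumps(decoded)
lossy = logical_value(truncated) != logical_value(original)

assert data_changed and lossless  # bytes differ, information preserved
assert lossy                      # information itself was altered
```

Both round trips alter the data; only the second crosses the line that the assumed model draws around significant data.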