The information modeling approach is based on a small number of fundamental types that can:
1) support all of the defined types from the logical model, and
2) specify a canonical translation to all data syntaxes of interest.
The advantage is that once the canonicalization rules for fundamental types are done, new defined types in the logical model require no new serialization work.
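A minimal sketch of that idea, assuming each defined type in the logical model simply names the fundamental type it is built on. The type names (PackageName, FileCount, IsDeprecated) and the rule table are hypothetical and only three fundamental types are shown:

```python
import json

# One JSON rule per fundamental type (only three shown here).
# These rules are illustrative assumptions, not the actual canonicalization rules.
JSON_RULES = {
    "Integer": lambda v: int(v),
    "String": lambda v: str(v),
    "Boolean": lambda v: bool(v),
}

# Defined types from a logical model, each mapped to its fundamental type.
# The names are hypothetical.
DEFINED_TYPES = {
    "PackageName": "String",
    "FileCount": "Integer",
}


def to_json(type_name, value):
    """Serialize a value by looking up the rule for its fundamental type."""
    base = DEFINED_TYPES.get(type_name, type_name)
    return json.dumps(JSON_RULES[base](value))


# Adding a defined type is a one-line model change; no serializer code is touched.
DEFINED_TYPES["IsDeprecated"] = "Boolean"
```

Supporting another data syntax would mean adding one more rule table keyed by the same fundamental type names, while the defined types stay untouched.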
The fundamental types I mentioned earlier are derived from CBOR, whose data model is a superset of JSON's:
Primitive     Compound
Boolean       Array
Binary        ArrayOf
Integer       Map
Number        MapOf
String        Record
Enumerated    Choice
For example, time is fundamentally an Integer in the information model. In our JSON serialization it can be written as digits with no leading zeroes and no quotes; in another serialization it could be a quoted RFC 3339 formatted string; and in concise serializations such as CBOR, Protobuf, and Avro it would be bytes.
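As a rough illustration of one Integer rendered three ways, the sketch below assumes the time value counts whole seconds since the Unix epoch in UTC, which the model itself may define differently. The function names are hypothetical, and the byte form is a plain big-endian integer, not the actual CBOR, Protobuf, or Avro wire encoding:

```python
import json
from datetime import datetime, timezone


def time_to_json_number(t: int) -> str:
    """JSON number form: bare digits, no quotes, no leading zeroes."""
    return json.dumps(t)


def time_to_rfc3339(t: int) -> str:
    """Quoted-string form: an RFC 3339 timestamp in UTC (assumes t is epoch seconds)."""
    stamp = datetime.fromtimestamp(t, tz=timezone.utc)
    return json.dumps(stamp.strftime("%Y-%m-%dT%H:%M:%SZ"))


def time_to_bytes(t: int) -> bytes:
    """Concise form: shortest big-endian byte string for a non-negative integer.

    This stands in for a binary serialization; real formats add their own framing.
    """
    if t < 0:
        raise ValueError("this sketch only handles non-negative times")
    length = max(1, (t.bit_length() + 7) // 8)
    return t.to_bytes(length, "big")
```

All three functions take the same Integer value, so a round trip through any serialization should recover the same number in the information model.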
Enumerated values always have both an integer and a string. Human-readable serializations would use the string, and concise ones would use the integer.
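A small sketch of that pairing, using a hypothetical enumeration whose ids and names are invented for illustration and are not taken from the SPDX model:

```python
from typing import Union

# Hypothetical Enumerated type: each item has an integer id and a string name.
HASH_ALGORITHM = {1: "sha1", 2: "sha256", 3: "sha512"}
_ID_BY_NAME = {name: item_id for item_id, name in HASH_ALGORITHM.items()}


def serialize_enum(item_id: int, human_readable: bool) -> Union[int, str]:
    """Return the string name for readable formats, the integer id for concise ones."""
    if item_id not in HASH_ALGORITHM:
        raise ValueError(f"unknown enumerated id: {item_id}")
    return HASH_ALGORITHM[item_id] if human_readable else item_id


def deserialize_enum(value: Union[int, str]) -> int:
    """Map either serialized form back to the same integer id."""
    if isinstance(value, str):
        if value not in _ID_BY_NAME:
            raise ValueError(f"unknown enumerated name: {value}")
        return _ID_BY_NAME[value]
    if isinstance(value, bool) or value not in HASH_ALGORITHM:
        raise ValueError(f"unknown enumerated id: {value}")
    return value
```

Both serialized forms decode to the same item, so the choice between them belongs to the data format and not to the information model.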
As an example, an SPDX v3 information model is available at https://github.com/davaya/spdx-3-elements/blob/main/Schemas/spdx-v3.jidl. The current implementation doesn't have an ABNF or PEG grammar for translating things like license expression trees and URL components to and from strings, so if we develop canonical rules for doing so we'll need a grammar to define them and code to test them.
We decided (correctly) that null is not a type, but it is a value. To canonicalize compound types like Map or Array we need to specify rules for how absent/null values are serialized, e.g., by omitting a Map property, keeping the property with an empty string, or keeping the property with a JSON null value.
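The three options for a Map can be compared side by side. This is only a sketch: the policy names are hypothetical, and the sorted keys and compact separators are assumptions made to keep the output deterministic, not decided canonicalization rules:

```python
import json


def serialize_map(value: dict, null_policy: str) -> str:
    """Serialize a Map to JSON under one of three null-handling policies."""
    if null_policy == "omit":
        # Drop properties whose value is absent/null.
        out = {k: v for k, v in value.items() if v is not None}
    elif null_policy == "empty_string":
        # Keep the property, replacing null with an empty string.
        out = {k: ("" if v is None else v) for k, v in value.items()}
    elif null_policy == "null":
        # Keep the property with a JSON null value.
        out = dict(value)
    else:
        raise ValueError(f"unknown null policy: {null_policy}")
    # Sorted keys and compact separators are assumptions of this sketch.
    return json.dumps(out, sort_keys=True, separators=(",", ":"))


example = {"name": "pkg", "comment": None}
for policy in ("omit", "empty_string", "null"):
    print(policy, serialize_map(example, policy))
```

The same Map yields three different byte sequences, which is why a canonical form has to pick exactly one policy. Note that the empty-string option cannot distinguish a null value from a genuinely empty string.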