spdx / canonical-serialisation

SPDX Canonicalisation repo
https://spdx.github.io/canonical-serialisation/

Fundamental data types #7

Open davaya opened 2 years ago

davaya commented 2 years ago

The information modeling approach is based on a small number of fundamental types that: 1) support all of the defined types from the logical model, and 2) each have a canonical translation to every data syntax of interest.

The advantage is that once the canonicalization rules for fundamental types are done, new defined types in the logical model require no new serialization work.
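
A minimal sketch of that idea, not the actual implementation: serialization rules are written once per fundamental type, and each defined type only names the fundamental type it is built on. The rule table, the defined type names (`CreatedTime`, `PackageName`, `FilesAnalyzed`) and the `serialize` helper are all hypothetical, chosen only for illustration.

```python
import json

# Hypothetical JSON rules, written once per fundamental type.
JSON_RULES = {
    "Boolean": lambda v: "true" if v else "false",
    "Integer": lambda v: str(int(v)),
    "String": lambda v: json.dumps(v),
}

# Hypothetical defined types from a logical model; each one only
# names the fundamental type it is based on.
DEFINED_TYPES = {
    "CreatedTime": "Integer",
    "PackageName": "String",
}


def serialize(type_name, value):
    """Serialize a value using the rule of its underlying fundamental type."""
    base = DEFINED_TYPES.get(type_name, type_name)
    return JSON_RULES[base](value)


# Adding a defined type is one table entry; no new serialization code.
DEFINED_TYPES["FilesAnalyzed"] = "Boolean"
```

With this layout, `serialize("FilesAnalyzed", True)` works as soon as the table entry exists, because the Boolean rule was already defined.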

The types I mentioned earlier are derived from CBOR, whose data model is a superset of JSON's:

| Primitive | Compound |
|---|---|
| Boolean | Array |
| Binary | ArrayOf |
| Integer | Map |
| Number | MapOf |
| String | Record |
| Enumerated | Choice |

For example, time is fundamentally an Integer in the information model. In our JSON serialization it can be digits with no leading zeroes and no quotes; in another serialization it could be a quoted RFC 3339 formatted string; and in concise serializations like CBOR, Protobuf and Avro it would be bytes.
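
To make the time example concrete, here is a rough sketch of one Integer value rendered three ways. It assumes the integer counts seconds since the Unix epoch in UTC, which is not stated above, and the fixed 8-byte big-endian form is only a stand-in for a concise encoding; it is not the actual CBOR, Protobuf or Avro wire format. The function names are hypothetical.

```python
import struct
from datetime import datetime, timezone


def time_as_json_number(seconds: int) -> str:
    """Bare digits: no quotes, no leading zeroes."""
    return str(seconds)


def time_as_rfc3339(seconds: int) -> str:
    """Quoted RFC 3339 string in UTC (assumes Unix epoch seconds)."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return '"' + moment.strftime("%Y-%m-%dT%H:%M:%SZ") + '"'


def time_as_bytes(seconds: int) -> bytes:
    """Stand-in for a concise binary form: unsigned 64-bit, big-endian."""
    return struct.pack(">Q", seconds)
```

All three outputs carry the same Integer from the information model; only the serialization rule differs.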

Each Enumerated value always has both an integer and a string. Human-readable serializations would use the string, and concise ones would use the integer.
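
A small sketch of how an Enumerated type might switch between the two forms. The `HASH_ALGORITHM` table and its integer assignments are made up for illustration and are not taken from the SPDX model; `encode_enum` and `decode_enum` are hypothetical helpers.

```python
# Hypothetical Enumerated type: each item pairs an integer with a string.
HASH_ALGORITHM = {1: "sha1", 2: "sha256", 3: "sha512"}


def encode_enum(items, name, concise):
    """Return the integer for concise formats, the string otherwise."""
    ids = {label: number for number, label in items.items()}
    if name not in ids:
        raise ValueError(f"unknown enumerated value: {name!r}")
    return ids[name] if concise else name


def decode_enum(items, wire):
    """Accept either wire form and return the string label."""
    if isinstance(wire, int):
        return items[wire]
    if wire in items.values():
        return wire
    raise ValueError(f"unknown enumerated value: {wire!r}")
```

Because both forms map to the same item, a value can round-trip between a human-readable and a concise serialization without loss.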

As an example, an SPDX v3 information model is available at https://github.com/davaya/spdx-3-elements/blob/main/Schemas/spdx-v3.jidl. The current implementation doesn't have an ABNF or PEG grammar for translating things like license expression trees and URL components to/from strings, so if we develop canonical rules for doing so, we'll need a grammar to define them and code to test them.


We decided (correctly) that null is not a type, but it is a value. To canonicalize compound types like Map or Array we need to specify rules for how absent/null values are serialized, e.g., by omitting a Map property, keeping the property with an empty string, or keeping the property with a JSON null value.
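
The three options for a Map can be compared side by side with a sketch like the one below. The policy names, the `canonical_map` helper and the sample property names are hypothetical, and the sorted keys and compact separators are assumptions added so the output is deterministic; none of this is an agreed canonical rule.

```python
import json

POLICIES = ("omit", "empty_string", "null")


def canonical_map(fields, policy="omit"):
    """Serialize a Map to JSON, treating None values per the chosen policy."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    out = {}
    for key, value in fields.items():
        if value is None:
            if policy == "omit":
                continue  # drop the property entirely
            if policy == "empty_string":
                value = ""  # keep the property with an empty string
            # policy == "null": keep None, which json.dumps emits as null
        out[key] = value
    # Assumption: sorted keys and no whitespace for a stable byte sequence.
    return json.dumps(out, sort_keys=True, separators=(",", ":"))
```

For the input `{"name": "zlib", "comment": None}`, the three policies give `{"name":"zlib"}`, `{"comment":"","name":"zlib"}` and `{"comment":null,"name":"zlib"}` respectively, so whichever rule is chosen has to be fixed for the serialization to be canonical.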