Open sbarnum opened 1 year ago
To view and edit the examples in your own editor:
@sbarnum : I think there was a decision against "embedded serialization". Maybe it would simplify the discussion if you would drop examples 3 to 6 until we have reopened that topic again. (I was strongly in favor of "embedded serialization")
createdUsing? If the parent collection has a creation info with that and the element does not have it, it is not clear whether that needs to be added to the element creation info or not. It might have been removed because of compactification, or it might just not have been there in the first place.
I would propose simplifying the algorithm by not handling individual property differences - on deserialization, an element contained within a collection would either have no creationInfo, in which case it would use the containing collection's creationInfo, or it would have a complete creationInfo. Overriding individual properties introduces complexity, and I suspect it would not save much space.
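The all-or-nothing rule proposed above can be sketched as follows. This is only an illustration under assumptions: elements and collections are modeled as plain dicts, `resolve_creation_info` is a hypothetical helper name, and the identifiers such as `ex:alice` are invented sample values, not part of any SPDX tooling.

```python
def resolve_creation_info(element: dict, collection: dict) -> dict:
    """Return the effective creationInfo for an element inside a collection.

    All-or-nothing rule: an element with no creationInfo inherits the
    collection's creationInfo as a whole; an element that carries its own
    creationInfo is treated as complete and nothing is merged into it.
    """
    own = element.get("creationInfo")
    if own is None:
        # Copy so later changes to the element do not affect the collection.
        return dict(collection["creationInfo"])
    return dict(own)


# Hypothetical sample data; the property names come from the discussion above.
collection = {
    "creationInfo": {
        "profile": ["core"],
        "createdBy": ["ex:alice"],
        "createdUsing": ["ex:tool"],
    }
}

inherited = resolve_creation_info({"name": "pkg-a"}, collection)

explicit = resolve_creation_info(
    {"name": "pkg-b",
     "creationInfo": {"profile": ["core"], "createdBy": ["ex:bob"]}},
    collection,
)
```

Under this rule `explicit` has no `createdUsing`, because a creationInfo present on the element is taken as complete. That removes the ambiguity about whether a missing property was compacted away or was never there.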
We agreed in the last tech call and the serialization meeting that using references to the same creationInfo was sufficient for JSON-LD. Leaving this open as a potential solution for other serialization formats - but moving to 3.0 milestone.
We agreed to use JSON-LD for SPDX 3.0 - moving this to the 3.1 milestone in case we want to use it for other serializations.
There is a need/desire for maximizing the conciseness of SPDX 3+ serialization while maintaining full conformance with the specification (model) in support of targeted use cases.
One area with the potential for reducing the verbosity of SPDX serialization is how CreationInfo details are serialized for Elements. To be fully conformant with the SPDX specification, all deserialized Elements MUST at a minimum contain CreationInfo with 'profile' and 'createdBy' details, and must be able to support consistent expression of the other CreationInfo details, with their integrity preserved, where relevant and present. As a result, the natural simple serialization of the SPDX graph expresses full CreationInfo details on each Element, even where there is a high degree of repetition.
Any approach to reduce repetition will be a tradeoff between brevity of serialization and complexity of the algorithms necessary to achieve the brevity. The one hard requirement is that any approach MUST maintain full conformance with the specification (model) in support of targeted use cases. Any approach to brevity that falls short of this bar cannot be considered a valid approach.
This issue proposes an approach to slightly modify the natural simple serialization of the SPDX graph to significantly reduce repetition of CreationInfo details while still maintaining full integrity of content and conformance with the SPDX 3+ specification.
In a nutshell, the approach is to take content defined or referenced within an SPDX ElementCollection (which commonly has a significant level of repetitive CreationInfo overlap) and omit from its serialization any CreationInfo properties that repeat those of the "parent" ElementCollection itself. Upon deserialization, these properties would implicitly be reapplied to the defined or referenced content. Any CreationInfo properties on defined or referenced content that differ from the same properties on the "parent" ElementCollection would remain on the defined or referenced content and would override the reapplication of those properties during deserialization.
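As a rough illustration of the per-property idea described above, here is a minimal sketch. It assumes CreationInfo can be modeled as a plain dict of property name to value; `compact` and `expand` are hypothetical helper names and the `ex:` identifiers are invented sample values.

```python
def compact(element_info: dict, parent_info: dict) -> dict:
    """Drop every property whose value repeats the parent's value."""
    return {
        key: value
        for key, value in element_info.items()
        if key not in parent_info or parent_info[key] != value
    }


def expand(compacted_info: dict, parent_info: dict) -> dict:
    """Reapply parent properties; properties kept on the element override."""
    return {**parent_info, **compacted_info}


# Hypothetical sample data for a "parent" ElementCollection and two elements.
parent = {
    "profile": ["core"],
    "createdBy": ["ex:alice"],
    "createdUsing": ["ex:tool"],
}
element = {
    "profile": ["core"],
    "createdBy": ["ex:bob"],  # differs from the parent, so it is kept
    "createdUsing": ["ex:tool"],
}

compacted = compact(element, parent)
restored = expand(compacted, parent)

# An element that never had 'createdUsing' compacts to an empty dict.
sparse = {"profile": ["core"], "createdBy": ["ex:alice"]}
sparse_restored = expand(compact(sparse, parent), parent)
```

In this sketch the round trip is lossless only when the element carries every property the parent has. `sparse_restored` gains `createdUsing` from the parent, because a property that was never present on the element cannot be told apart from one removed by compaction.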
The proposed serialization compaction rules can be described with the following outline:
The proposed deserialization compaction rules can be described with the following outline:
This approach balances relatively simple algorithmic logic against a very significant (though not total) reduction in CreationInfo repetition, while still maintaining full conformance with the specification (model) in support of targeted use cases.
Below are several serialized examples in JSON-LD.
Example #1 (fully expanded serialization of CreationInfo of basic SPDXDocument using natural simple flat graph serialization):
Example #2 (compacted serialization of CreationInfo of basic SPDXDocument using natural simple flat graph serialization) (36% reduction in number of serialized lines):
Example #3 (fully expanded serialization of CreationInfo of basic SPDXDocument using embedded serialization, where the first level of Elements defined or referenced within an ElementCollection is serialized "inside" the ElementCollection rather than as a simple flat graph; I believe this idea has been discussed but not decided, so I included it here):
Example #4 (compacted serialization of CreationInfo of basic SPDXDocument using embedded serialization):
Example #5 (fully expanded serialization of CreationInfo of basic SPDXDocument using embedded serialization and with repetition exceptions in the "embedded" content):
Example #6 (compacted serialization of CreationInfo of basic SPDXDocument using embedded serialization and with repetition exceptions in the "embedded" content):