Open goneall opened 3 weeks ago
For 1. and 2. above, suggest creating an SpdxDocument in memory "on the fly" with all of the Element(s) represented as root elements.
For 3., should we assume the single SpdxDocument represents the serialization information? Is there any validation we could do to confirm this? If we assume it represents the serialization information, then we can augment the serialized SpdxDocument with the information from the file itself to complete the in-memory representation.
Scenario 4. is the most challenging. It's quite likely one of the SpdxDocuments represents the serialization itself - but which one? We would need some way of determining which one is the SpdxDocument - or we treat it the same as not having any SpdxDocument.
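A minimal sketch of the handling proposed above, keyed only on how many SpdxDocument objects the deserialized graph contains. It assumes elements are plain dicts with "type", "spdxId" and "rootElement" keys; the function name, the exception and the generated IRI are hypothetical and not part of any existing library.

```python
class AmbiguousDocumentError(ValueError):
    """Raised when more than one SpdxDocument is present."""


def resolve_document(elements, generated_id="urn:example:generated-document"):
    """Return the SpdxDocument to use for a list of deserialized elements.

    Assumption: each element is a dict with "type" and "spdxId" keys.
    """
    docs = [e for e in elements if e.get("type") == "SpdxDocument"]

    if not docs:
        # No SpdxDocument in the file: create one on the fly and list
        # every element as a root element.
        return {
            "type": "SpdxDocument",
            "spdxId": generated_id,  # hypothetical placeholder IRI
            "rootElement": [e["spdxId"] for e in elements],
        }

    if len(docs) == 1:
        # Single SpdxDocument: assume it describes the serialization.
        return docs[0]

    # Several SpdxDocuments: nothing in the file says which one
    # describes the serialization, so refuse to guess.
    raise AmbiguousDocumentError(
        f"{len(docs)} SpdxDocument objects found; cannot pick one"
    )
```

The alternative mentioned above for the ambiguous case would be to catch the error and fall back to the "no SpdxDocument" branch instead.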
I can tell you how the shacl2code bindings deal with this. First of all, since they are not SPDX specific, there is no requirement that an SpdxDocument is present. The bindings have a separate concept of a SHACLObjectSet, which is the container that represents a set of objects to be serialized, or the destination for deserialization. It also does some indexing book-keeping (e.g. so you can look up an object by its ID quickly), and performs "linking", where an object property that references another object by a string IRI is replaced with a reference to the actual object with that IRI, if it exists in the SHACLObjectSet. In this case, SpdxDocument is actually just a slightly special element handled at higher layers (e.g. the Yocto SPDX code tracks the SpdxDocument separately, makes sure there is only one per SHACLObjectSet, etc.).
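To make the indexing and "linking" idea concrete, here is a rough sketch of such a container. It is not the shacl2code SHACLObjectSet API: the class name, method names and the dict-based object representation are all assumptions made for illustration.

```python
class ObjectSet:
    """Hypothetical container for a set of objects to (de)serialize.

    Illustration only; this does not mirror the real SHACLObjectSet.
    Objects are assumed to be dicts with an "id" key holding an IRI.
    """

    def __init__(self):
        self._by_id = {}  # index: IRI -> object

    def add(self, obj):
        self._by_id[obj["id"]] = obj

    def find_by_id(self, iri):
        """Quick lookup by IRI; returns None if the object is not in the set."""
        return self._by_id.get(iri)

    def link(self, ref_props):
        """Replace string IRIs in the given properties with object references.

        IRIs that do not resolve inside this set are left as strings.
        """
        for obj in self._by_id.values():
            for prop in ref_props:
                value = obj.get(prop)
                if isinstance(value, str):
                    obj[prop] = self._by_id.get(value, value)
                elif isinstance(value, list):
                    obj[prop] = [
                        self._by_id.get(v, v) if isinstance(v, str) else v
                        for v in value
                    ]


objects = ObjectSet()
package = {"id": "urn:example:pkg", "name": "example"}
relationship = {
    "id": "urn:example:rel",
    "from": "urn:example:pkg",
    "to": ["urn:example:pkg", "urn:example:external"],
}
objects.add(package)
objects.add(relationship)
objects.link(["from", "to"])
```

After `link()`, `relationship["from"]` is the `package` dict itself, while the unresolved `urn:example:external` stays a plain string. Nothing in the container knows about SpdxDocument; a higher layer could add that check on top.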
I really believe that this approach is the right way to go. Don't encumber users with the semantics of SpdxDocuments if they don't want them. It's frustrating for users if they need to (de)serialize 1 or 2 in your examples, but can't because the bindings have intertwined the concept of an SpdxDocument with "a set of things to (de)serialize". Code at a higher level can make it easier to deal with SpdxDocument, since that is the common case, but it's a "layer" on top, not the core functionality. The core bindings should avoid enforcing "policy" on users about how they do things and focus on the "mechanism" that enables them to do what they need. The "policy" is the responsibility of a higher level of abstraction that makes life easier for the common cases. If you force policy on the core bindings, your bindings are not going to be very flexible and you can end up with a lot of weird edge cases needing to be encoded because you made choices for the users they didn't like :)
IOW, with the shacl2code Python bindings (and the C++ bindings I'm working on), none of these 4 is a problem at all, since SpdxDocument is not special.
@JPEWdev - I think your approach for the lower level language bindings is fine. The libraries I'm writing have to deal with the higher level semantics, hence the need to solve the issue.
The SpdxDocument represents metadata about the serialization itself, and in some scenarios it can be quite important. One example is verifying references to SPDX elements in external files. The information to verify is stored in the SpdxDocument. If we don't know which SpdxDocument contains the metadata, we can't verify the external document.
I'm starting to form the opinion that we need to fix this in the serialization schema - either add an optional property at the root level, or require that only one SpdxDocument can be present in the @graph, such that the SpdxDocument data is unambiguous. The former would be a non-breaking change. Code which doesn't need the metadata can just ignore it.
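As a sketch of how the first option might look to a consumer, the snippet below reads a hypothetical optional root-level property that names the SpdxDocument describing the serialization. The property name `spdxDocumentId` is invented for illustration and is not part of the current schema; the "type" and "spdxId" keys and the omission of "@context" are likewise simplifying assumptions.

```python
ROOT_PROPERTY = "spdxDocumentId"  # hypothetical root-level property name


def find_serialization_document(data):
    """Pick the SpdxDocument that describes this serialization, if any.

    `data` is the parsed JSON-LD file as a dict with an "@graph" list.
    Returns None when the choice is missing or ambiguous.
    """
    docs = {
        obj["spdxId"]: obj
        for obj in data.get("@graph", [])
        if obj.get("type") == "SpdxDocument"
    }

    declared = data.get(ROOT_PROPERTY)
    if declared is not None:
        # The file says explicitly which SpdxDocument is "the" one.
        return docs.get(declared)

    # Property absent: only unambiguous when exactly one document exists.
    if len(docs) == 1:
        return next(iter(docs.values()))
    return None


example = {
    ROOT_PROPERTY: "urn:example:doc-b",
    "@graph": [
        {"type": "SpdxDocument", "spdxId": "urn:example:doc-a"},
        {"type": "SpdxDocument", "spdxId": "urn:example:doc-b"},
        {"type": "software_Package", "spdxId": "urn:example:pkg"},
    ],
}
chosen = find_serialization_document(example)
```

Here `chosen` is the `urn:example:doc-b` document. A reader that does not care about the metadata never looks at the extra key, which is what would make the change non-breaking.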
The serialization documentation has fairly detailed descriptions of how to serialize, but not as much on deserialization approaches and scenarios.
Specifically, it would be good to (decide and) document how the resultant model would be represented in the following scenarios:
How do we handle creating the in-memory SPDX documents in each of these scenarios?