Closed maxhbr closed 6 months ago
ping @zvr and @seabass-labrax , as you were in the discussion where this came up.
Just to ask (and I know I haven't been involved in the serialization / canonicalization discussions):
Is it preferable, or even feasible, for the license fields to be serialized in JSON format in standard expression form, e.g. "licenseConcluded": "GPL-3.0-or-later AND (LicenseRef-custom-license-1 OR MIT)"? That's the way it's done today in 2.3 for JSON as well as tag-value.
I'm assuming (without really knowing how the tech team folks are feeling about it) that at least for tag-value for 3.0, this will still be preserved. So tooling that handles the different document formats will still need to convert between license expression strings and the model structure. Given that, would it make more sense for JSON (and other non-RDF formats) to use the string-based formatting for the serialization, even if the data structure in the backend is the more complex model format?
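To make the interchange between expression strings and a model structure concrete, here is a minimal sketch of a parser that turns an expression string into a nested tree. It is only an illustration under assumptions: the dict keys ("type", "id", "members") are hypothetical and not the SPDX 3.0 schema, and it handles only AND, OR and parentheses (no WITH exceptions), with AND binding tighter than OR.

```python
import re

# One token is a parenthesis or a run of identifier characters.
TOKEN = re.compile(r"\s*(\(|\)|[A-Za-z0-9.+:\-]+)")


def tokenize(expr):
    """Split a license expression into parentheses, operators and ids."""
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(expr):
    """Parse an expression string into a nested dict tree (hypothetical shape)."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def parse_atom():
        tok = advance()
        if tok == "(":
            node = parse_or()
            if advance() != ")":
                raise ValueError("missing closing parenthesis")
            return node
        if tok is None or tok in (")", "AND", "OR"):
            raise ValueError(f"expected a license id, got {tok!r}")
        return {"type": "License", "id": tok}

    def parse_set(parse_member, operator, type_name):
        members = [parse_member()]
        while peek() == operator:
            advance()
            members.append(parse_member())
        if len(members) == 1:
            return members[0]
        return {"type": type_name, "members": members}

    def parse_and():
        return parse_set(parse_atom, "AND", "ConjunctiveLicenseSet")

    def parse_or():
        return parse_set(parse_and, "OR", "DisjunctiveLicenseSet")

    tree = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return tree


tree = parse("GPL-3.0-or-later AND (LicenseRef-custom-license-1 OR MIT)")
print(tree["type"], len(tree["members"]))
```

A tool that reads tag-value or string-based JSON would need something along these lines on input, and the inverse (tree to string) on output.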
Aside from the question of license expressions, the more general question is how JSON-LD files are handled by offline applications. JSON files are standalone and valid without any context information.
1) Can all JSON-LD files be processed without context files?
2) Can some JSON-LD files be processed without context files (i.e. the required context contains only maps from short property names to graph URIs, and the graph URIs aren't essential to understand the meaning of the SBOM)?
3) If 2, can the SPDX model be constrained to use only non-required context?
This is analogous to linked data on the web - it is possible to read an academic paper with references offline without being able to retrieve any of its links, and if the paper includes images, PDF format includes image data but not the content of all links. If this isn't possible with JSON-LD, I'd say a standalone JSON format is still required.
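As a rough illustration of the offline case, the sketch below reads a JSON-LD document with an ordinary JSON parser and never dereferences the context. The context URL and the property names are hypothetical, not taken from the SPDX 3.0 schema. It only shows that the syntax remains usable; the mapping from short names to URIs is what an offline reader gives up.

```python
import json

# Hypothetical JSON-LD document; the context URL and property names are
# illustrative only and are not taken from the SPDX 3.0 schema.
DOC_TEXT = """
{
  "@context": "https://example.org/hypothetical-context.jsonld",
  "@graph": [
    {"type": "Package", "name": "example-package", "licenseConcluded": "MIT"}
  ]
}
"""


def load_without_context(text):
    """Parse a JSON-LD document as plain JSON, never fetching its context."""
    doc = json.loads(text)
    # The "@context" value is kept as an opaque value; nothing is resolved.
    return doc.get("@context"), doc.get("@graph", [])


context_ref, nodes = load_without_context(DOC_TEXT)
packages = [node for node in nodes if node.get("type") == "Package"]
print(packages[0]["name"], packages[0]["licenseConcluded"])
```

This works only as long as the application already knows what the short property names mean, which is the situation described in question 2.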
Very interesting discussion on this issue. Based on feedback for the SPDX 2.X serializations, I believe we will need to support string license expressions in at least one of the 3.0 supported formats. In SPDX 2.3, license expression strings are used in Tag/Value, JSON, XML, and YAML.
All RDF formats use the full license model (JSON-LD, RDF/XML, RDF Turtle, RDFa). Supporting the full license model is extremely useful when doing license based reasoning (e.g. if package A has a dynamic relationship to package B which has GPL-2.0-or-later in a conjunctive license set for the concluded license, flag an issue).
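The kind of license-based reasoning described here could look roughly like the following sketch. Everything in it is an assumption made for illustration: the dict shape of the license tree, the package names, and the "dynamicLink" relationship label are hypothetical and not SPDX-defined.

```python
def in_conjunctive_set(node, license_id):
    """Return True if license_id is a direct member of any AND set in the tree."""
    if node["type"] == "License":
        return False
    if node["type"] == "ConjunctiveLicenseSet" and any(
        member["type"] == "License" and member["id"] == license_id
        for member in node["members"]
    ):
        return True
    # Otherwise keep looking inside nested sets.
    return any(in_conjunctive_set(member, license_id) for member in node["members"])


# Hypothetical concluded licenses, keyed by package name.
CONCLUDED = {
    "package-b": {
        "type": "ConjunctiveLicenseSet",
        "members": [
            {"type": "License", "id": "GPL-2.0-or-later"},
            {"type": "License", "id": "MIT"},
        ],
    },
    "package-c": {"type": "License", "id": "MIT"},
}

# Hypothetical (source, relationship, target) triples.
RELATIONSHIPS = [
    ("package-a", "dynamicLink", "package-b"),
    ("package-a", "dynamicLink", "package-c"),
]


def flag_issues(relationships, concluded, license_id="GPL-2.0-or-later"):
    """List (source, target) pairs whose dynamically linked target has
    license_id in a conjunctive set of its concluded license."""
    return [
        (source, target)
        for source, kind, target in relationships
        if kind == "dynamicLink" and in_conjunctive_set(concluded[target], license_id)
    ]


print(flag_issues(RELATIONSHIPS, CONCLUDED))
```

With a plain expression string, the same check would first require parsing the string, which is why the structured model is convenient for this use case.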
@maxhbr brings up a very good point that using license expression strings breaks the external context file solution for supporting JSON-LD with normal JSON.
@davaya brings up a very good point that using license expression strings breaks the external context file solution for supporting JSON-LD with normal JSON.
yes, that is what started the discussion that created this issue.
On the question "Can all JSON-LD files be processed without context files?":
It would depend on the design of the JSON schema. I personally think it would be a good goal to have. Of course, you would lose the links needed for semantic reasoning, but the syntax would still be valid.
In terms of the stated problem on license expressions, I can think of a few approaches:
[…] AnyLicenseInfo class in the 2.X model) and an xsd:string which would contain a valid license expression.
[…] AnyLicenseInfo which is a string representing a license expression.
Maybe there are more solutions?
In preparation for the discussion on Tuesday I created a gist with an example for json-ld serialization: https://gist.github.com/meretp/561e4ea963122f5811a147ea056e4d84 The file is generated from the SPDXJSONExample-v2.3 in the spec, which is converted to SPDX3 according to the current migration guide. The process and known limitations are described in the markdown file in the gist.
ping @zvr
As Licenses are not expected to be derived from Element, the above example does not work. The CustomLicense is not allowed to live on the same level as Package. I do not know where one would put it.
The PR https://github.com/spdx/spdx-3-model/pull/376 contains promising examples, which suggest that this is possible and might result in acceptable JSON files.
also related: https://github.com/spdx/spdx-3-model/issues/392
With PR #376 I'm suggesting we're close enough - especially for 3.0. Since we have other issues with proposals for simple JSON, I'm going to close this issue as resolved.
Let's look at the license expression "GPL-3.0-or-later AND (LicenseRef-custom-license-1 OR MIT)" and, to have some context, let it be the concludedLicense of some package.
JSON-LD:
Based on the current licensing model https://github.com/spdx/spdx-3-model/commit/cc8c51f500b6fb275f8cb26df1e83f530cc64807, this would serialize as JSON-LD to something like:
(This already replaces { "@id": $ID } with just $ID. Some of the "@id" fields might not be sufficient and need a prefix.)
Implications for other serializations:
Since we are trying to base the JSON serialization on JSON-LD, this would also mean that basically every serialization would have this tree-like structure instead of the simple string representation of the expression.
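To give a feel for the difference between the two representations, the sketch below holds the expression as a nested tree and renders it back to the one-line string. The tree shape (keys "type", "id", "members") is a hypothetical simplification for illustration and is not the JSON-LD that the licensing model actually produces.

```python
# Hypothetical, simplified tree for the expression discussed in this issue.
TREE = {
    "type": "ConjunctiveLicenseSet",
    "members": [
        {"type": "License", "id": "GPL-3.0-or-later"},
        {
            "type": "DisjunctiveLicenseSet",
            "members": [
                {"type": "License", "id": "LicenseRef-custom-license-1"},
                {"type": "License", "id": "MIT"},
            ],
        },
    ],
}

OPERATORS = {
    "ConjunctiveLicenseSet": " AND ",
    "DisjunctiveLicenseSet": " OR ",
}


def to_expression(node, nested=False):
    """Render a license tree as an expression string.

    Nested sets are always parenthesized, which is more than strictly
    necessary but keeps the output unambiguous.
    """
    if node["type"] == "License":
        return node["id"]
    operator = OPERATORS[node["type"]]
    text = operator.join(to_expression(m, nested=True) for m in node["members"])
    return f"({text})" if nested else text


print(to_expression(TREE))
```

Even in this reduced form the tree takes a dozen lines where the string takes one, which is the readability concern for people coming from the JSON side.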
Question:
So, the arising question is: is the above serialization acceptable for someone coming from the JSON side?
Or: does anyone have a solution to render something readable as valid JSON-LD?