spdx / spdx-3-model

The model for the information captured in SPDX version 3 standard.
https://spdx.dev/use/specifications/

Can JSON-LD be aligned with JSON, or does it introduce too much overhead and complexity? #138

Closed maxhbr closed 6 months ago

maxhbr commented 1 year ago

Let's look at the license expression "GPL-3.0-or-later AND (licenseRef-custom-license-1 OR MIT)" and, to have some context, let it be the concludedLicense of some package.

JSON-LD:

Based on the current licensing model https://github.com/spdx/spdx-3-model/commit/cc8c51f500b6fb275f8cb26df1e83f530cc64807, this would serialize as JSON-LD to something like:

{
    "@type": "Package",
    "@id": "myprefix/my-package",
    "packageName": "my-Package",
    "concludedLicense": {
        "@type": "ConjunctiveLicenseSet",
        "child": [
            {
                "@type": "DisjunctiveLicenseSet",
                "child": [
                    "myprefix/4a10eaa0-ca51-11ed-afa1-0242ac120002",
                    "spdx.org/licenses/MIT"
                ]
            },
            "spdx.org/licenses/GPL-3.0-or-later"
        ]
    }
},
{
    "@type": "CustomLicense",
    "@id": "myprefix/4a10eaa0-ca51-11ed-afa1-0242ac120002",
    "licenseId": "licenseRef-custom-license-1",
    "name": "Some super custom License",
    "licenseText": "My Software is free to us"
}

(This already replaces { "@id": $ID } with just $ID. Some of the "@id" fields might not be sufficient and need a prefix.)

Implications for other serializations:

Since we are trying to base the JSON serialization on JSON-LD, this would also mean that basically every serialization would have this tree-like structure instead of the simple string representation of the expression.
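To make the gap between the two representations concrete, here is a minimal sketch of how a tool might turn the expression string into the nested structure shown above. It is an illustration under stated assumptions, not SPDX tooling: the function name `parse_expression` is hypothetical, only uppercase `AND`, `OR` and parentheses are handled (`WITH` is rejected), and the leaves stay plain license identifiers rather than being resolved to element IRIs.

```python
import re

# One token is a parenthesis or a run of identifier characters.
TOKEN = re.compile(r"\s*(\(|\)|[A-Za-z0-9.+:-]+)")


def tokenize(expr):
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if match is None:
            raise ValueError(f"unexpected character {expr[pos]!r} at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_expression(expr):
    """Parse a license expression; AND binds tighter than OR."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def atom():
        token = take()
        if token == "(":
            node = disjunction()
            if take() != ")":
                raise ValueError("expected ')'")
            return node
        if token in (")", "AND", "OR"):
            raise ValueError(f"unexpected token {token!r}")
        return token  # a plain license identifier (assumption: not an IRI)

    def license_set(kind, operator, operand):
        children = [operand()]
        while peek() == operator:
            take()
            children.append(operand())
        if len(children) == 1:
            return children[0]
        return {"@type": kind, "child": children}

    def conjunction():
        return license_set("ConjunctiveLicenseSet", "AND", atom)

    def disjunction():
        return license_set("DisjunctiveLicenseSet", "OR", conjunction)

    tree = disjunction()
    if peek() is not None:
        raise ValueError(f"unexpected trailing token {peek()!r}")
    return tree


tree = parse_expression(
    "GPL-3.0-or-later AND (licenseRef-custom-license-1 OR MIT)"
)
```

The resulting `tree` has the same shape as the `concludedLicense` value above, apart from child order and the unresolved identifiers.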

Question:

So the question that arises is: is the above serialization acceptable for someone coming from the JSON side?

Or: does anyone have a solution to render something readable as valid JSON-LD?

maxhbr commented 1 year ago

ping @zvr and @seabass-labrax , as you were in the discussion where this came up.

swinslow commented 1 year ago

Just to ask (and I know I haven't been involved in the serialization / canonicalization discussions):

Is it preferable, or even feasible, for the license fields to be serialized in JSON format in standard expression form, e.g. "licenseConcluded": "GPL-3.0-or-later AND (LicenseRef-custom-license-1 OR MIT)"? That's the way it's done today in 2.3 for JSON as well as tag-value.

I'm assuming (without really knowing how the tech team folks are feeling about it) that at least for tag-value for 3.0, this will still be preserved. So it will still be necessary for tooling that handles the different document formats to convert between license expression strings and the model structure. Given that, would it make more sense to have JSON (and other non-RDF formats) use the string-based formatting for the serialization, even if the data structure in the backend is the more complex model format?
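As a rough illustration of the interchange mentioned here, the sketch below renders a model-style tree back into an expression string. The function name `to_expression` and the `license_ids` lookup table are hypothetical; the sketch assumes leaves are IRIs whose last path segment is the license identifier unless the table says otherwise, and it parenthesizes every nested set rather than only where operator precedence requires it.

```python
OPERATORS = {
    "ConjunctiveLicenseSet": " AND ",
    "DisjunctiveLicenseSet": " OR ",
}


def to_expression(node, license_ids, nested=False):
    """Render a license set tree as a license expression string."""
    if isinstance(node, str):
        # Assumption: fall back to the last IRI segment as the identifier.
        return license_ids.get(node, node.rsplit("/", 1)[-1])
    operator = OPERATORS[node["@type"]]
    text = operator.join(
        to_expression(child, license_ids, nested=True)
        for child in node["child"]
    )
    return f"({text})" if nested else text


# The tree from the JSON-LD example at the top of this issue.
concluded = {
    "@type": "ConjunctiveLicenseSet",
    "child": [
        {
            "@type": "DisjunctiveLicenseSet",
            "child": [
                "myprefix/4a10eaa0-ca51-11ed-afa1-0242ac120002",
                "spdx.org/licenses/MIT",
            ],
        },
        "spdx.org/licenses/GPL-3.0-or-later",
    ],
}
custom = {
    "myprefix/4a10eaa0-ca51-11ed-afa1-0242ac120002":
        "licenseRef-custom-license-1",
}
expression = to_expression(concluded, custom)
```

Because the custom license is referenced by an opaque IRI, the string form can only be produced if the tool also has the `CustomLicense` element at hand, which is why the lookup table is needed.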

davaya commented 1 year ago

Aside from the question of license expressions, the more general question is how JSON-LD files are handled by offline applications. JSON files are standalone and valid without any context information.

1) Can all JSON-LD files be processed without context files?
2) Can some JSON-LD files be processed without context files (i.e. the required context contains only maps from short property names to graph URIs, and the graph URIs aren't essential to understand the meaning of the SBOM)?
3) If 2, can the SPDX model be constrained to use only non-required context?

This is analogous to linked data on the web: it is possible to read an academic paper with references offline without being able to retrieve any of its links, and if the paper includes images, the PDF contains the image data but not the content of the links. If this isn't possible with JSON-LD, I'd say a standalone JSON format is still required.
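A minimal sketch of what case 2 could look like for an offline consumer, assuming the context only maps short property names to IRIs: the file is read with a plain JSON parser and the `@context` entry is dropped without ever being fetched. The context URL, the `@graph` wrapper and the function name `read_offline` are assumptions made for the illustration, not part of any SPDX schema.

```python
import json

DOCUMENT = """
{
  "@context": "https://example.org/hypothetical-spdx-context.jsonld",
  "@graph": [
    {
      "@type": "Package",
      "@id": "myprefix/my-package",
      "packageName": "my-Package"
    }
  ]
}
"""


def read_offline(text):
    """Return the elements of a JSON-LD document using only a JSON parser."""
    data = json.loads(text)
    data.pop("@context", None)  # the context is never dereferenced
    # A document without "@graph" is treated as a single element.
    return data.get("@graph", [data])


elements = read_offline(DOCUMENT)
package_names = [
    element["packageName"]
    for element in elements
    if element.get("@type") == "Package"
]
```

This only works when the short names alone carry the meaning; any value whose interpretation depends on the context (for example a compact IRI with a context-defined prefix) would still need the context file.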

goneall commented 1 year ago

Very interesting discussion on this issue. Based on feedback for the SPDX 2.X serializations, I believe we will need to support string license expressions in at least one of the 3.0 supported formats. In SPDX 2.3, license expression strings are used in Tag/Value, JSON, XML, and YAML.

All RDF formats use the full license model (JSON-LD, RDF/XML, RDF Turtle, RDFa). Supporting the full license model is extremely useful when doing license-based reasoning (e.g. if package A has a dynamic relationship to package B, which has GPL-2.0-or-later in a conjunctive license set for the concluded license, flag an issue).
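The kind of reasoning described here can be sketched over the tree form in a few lines. This is an illustration only: the relationship type name `dynamicLink`, the tuple layout of `relationships`, the package IDs and both function names are hypothetical and not taken from the SPDX model.

```python
def in_conjunctive_set(node, license_iri):
    """True if license_iri is a direct member of any conjunctive set."""
    if not isinstance(node, dict):
        return False  # a bare license reference or a missing value
    children = node["child"]
    if node["@type"] == "ConjunctiveLicenseSet" and license_iri in children:
        return True
    return any(in_conjunctive_set(child, license_iri) for child in children)


def flag_issues(relationships, concluded, license_iri):
    """List (from, to) pairs where a dynamic link targets such a package."""
    return [
        (source, target)
        for source, kind, target in relationships
        if kind == "dynamicLink"
        and in_conjunctive_set(concluded.get(target), license_iri)
    ]


GPL2_LATER = "spdx.org/licenses/GPL-2.0-or-later"
concluded = {
    "pkg-b": {
        "@type": "ConjunctiveLicenseSet",
        "child": [GPL2_LATER, "spdx.org/licenses/MIT"],
    },
    "pkg-c": {
        "@type": "DisjunctiveLicenseSet",
        "child": [GPL2_LATER, "spdx.org/licenses/MIT"],
    },
}
relationships = [
    ("pkg-a", "dynamicLink", "pkg-b"),
    ("pkg-a", "dynamicLink", "pkg-c"),
    ("pkg-a", "contains", "pkg-b"),
]
issues = flag_issues(relationships, concluded, GPL2_LATER)
```

With a string-only representation, the same check would first have to parse every expression, which is the trade-off being weighed in this thread.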

@maxhbr brings up a very good point that using license expression strings breaks the external context file solution for supporting JSON-LD with normal JSON.

maxhbr commented 1 year ago

> @davaya brings up a very good point that using license expression strings breaks the external context file solution for supporting JSON LD with normal JSON.

Yes, that is what started the discussion that led to this issue.

goneall commented 1 year ago

> Can all JSON-LD files be processed without context files?

It would depend on the design of the JSON schema. I personally think it would be a good goal to have. Of course, you would lose the links needed for semantic reasoning, but the syntax would still be valid.

goneall commented 1 year ago

In terms of the stated problem on license expressions, I can think of a few approaches:

Maybe there are more solutions?

meretp commented 1 year ago

In preparation for the discussion on Tuesday, I created a gist with an example of JSON-LD serialization: https://gist.github.com/meretp/561e4ea963122f5811a147ea056e4d84 The file is generated from the SPDXJSONExample-v2.3 in the spec, converted to SPDX3 according to the current migration guide. The process and known limitations are described in the markdown file in the gist.

ping @zvr

maxhbr commented 1 year ago

As Licenses are not expected to be derived from Element, the above example does not work. The CustomLicense is not allowed to live on the same level as Package. I do not know where one would put it.

maxhbr commented 1 year ago

The PR https://github.com/spdx/spdx-3-model/pull/376 contains promising examples, which suggest that this is possible and might result in acceptable JSON files.

maxhbr commented 1 year ago

Also related: https://github.com/spdx/spdx-3-model/issues/392

goneall commented 6 months ago

With PR #376 I'm suggesting we're close enough - especially for 3.0. Since we have other issues with proposals for simple JSON, I'm going to close this issue as resolved.