spdx / spdx-spec

The SPDX specification in MarkDown and HTML formats.
https://spdx.github.io/spdx-spec/

Add “NOASSERTION” to the license expression syntax #50

Open wking opened 6 years ago

wking commented 6 years ago

Like #49, but for NOASSERTION instead of NONE. The semantics would be:

NOASSERTION means: (i) the SPDX License Expression author has attempted to but cannot reach a reasonable objective determination; (ii) the SPDX License Expression author has made no attempt to determine this field; or (iii) the SPDX License Expression author has intentionally provided no information (no meaning should be implied by doing so).

That matches our existing usage except for PackageLicenseInfoFromFiles and similar, where we currently drop (i). I don't think those consumers would suffer from the additional case, because I don't see an actionable distinction between those cases. When would you care about the distinction between “tried but gave up”, “did not try”, and “won't tell you”? If folks did care about those distinctions (which I think unlikely), we'd want to be using different tokens for each case.

Other divergent NOASSERTION consumers are:

jeffmcaffer commented 5 years ago

Just stumbled upon these issues (#49 and #50). In ClearlyDefined we are using SPDX identifiers like crazy (that's literally all we support) and have found there are four distinct cases:

1) There is a valid SPDX identifier. Great. Use it.
2) There is some sort of license-ish thing but we can't figure out what it is or it does not have an SPDX id. == NOASSERTION
3) We looked as best we could and could not find anything license related. == NONE
4) Did not look. == Leave the value blank, null, whatever

In the overall compliance workflow this corresponds to:

1) Great. I can analyze the license terms and carry on
2) Sigh, I need to track down that license info, break out the lawyers and figure things out
3) Dang! Really? It's 2019 and there is no license?! Better go contact the project team and see what's up
4) Hmmm, I have basic work to do
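A minimal sketch of the four cases as a lookup, in Python. The names `LicenseState` and `classify` are hypothetical and not taken from ClearlyDefined, and the sketch assumes any non-blank value other than NONE or NOASSERTION is a valid SPDX identifier (it does not check the SPDX License List).

```python
from enum import Enum
from typing import Optional


class LicenseState(Enum):
    """The four cases, each paired with its follow-up action (illustrative names)."""
    IDENTIFIED = "analyze the license terms and carry on"
    UNIDENTIFIED = "track down the license info and get legal review"
    NOT_FOUND = "contact the project team"
    NOT_EXAMINED = "basic discovery work still to do"


def classify(value: Optional[str]) -> LicenseState:
    """Map a stored license value to one of the four cases (hypothetical helper)."""
    if value is None or not value.strip():
        # Blank or null: nobody has looked yet.
        return LicenseState.NOT_EXAMINED
    token = value.strip()
    if token == "NOASSERTION":
        # Something license-like exists but could not be identified.
        return LicenseState.UNIDENTIFIED
    if token == "NONE":
        # A search was done and nothing license-related was found.
        return LicenseState.NOT_FOUND
    # Assumption: anything else is a valid SPDX identifier.
    return LicenseState.IDENTIFIED
```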

These distinctions also help in the "curation" process to let subsequent users of the data understand the work that has (or has not) been done.

As for including these in expressions, we found the need to allow for these in expressions to support expression aggregation, simplification and comparison.

For example, we find some repo that has files with a mess of different licenses. We'd want to aggregate these by AND'ing them all together. At the same time you'd want to simplify to remove redundant expression clauses. Since any given file could have a NOASSERTION or NONE license value, that should be carried through to get a possible final expression like MIT AND NOASSERTION AND NONE. The end user can then understand what work they have to do (see above).
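The aggregation described here could be sketched roughly as follows. This is not the ClearlyDefined implementation; `aggregate_with_and` is a hypothetical helper, and it assumes each file carries a single token (an identifier, NONE, or NOASSERTION) rather than a compound expression. Blank values ("did not look") are skipped by assumption.

```python
from typing import Iterable, List, Optional

# Assumed ordering: real identifiers first, then NOASSERTION, then NONE.
_SPECIAL_RANK = {"NOASSERTION": 1, "NONE": 2}


def aggregate_with_and(file_licenses: Iterable[Optional[str]]) -> Optional[str]:
    """AND together per-file license values, dropping duplicate clauses."""
    seen: List[str] = []
    for value in file_licenses:
        if value is None or not value.strip():
            continue  # blank means the file was not examined; skipped here
        token = value.strip()
        if token not in seen:
            seen.append(token)  # simplification: keep each clause once
    if not seen:
        return None
    # sorted() is stable, so identifiers keep their first-seen order.
    ordered = sorted(seen, key=lambda t: _SPECIAL_RANK.get(t, 0))
    return " AND ".join(ordered)


print(aggregate_with_and(["MIT", "NOASSERTION", "MIT", "NONE", None]))
# prints: MIT AND NOASSERTION AND NONE
```

Carrying NOASSERTION and NONE through keeps the discovered identifiers visible while still flagging the files that need follow-up.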

@dabutvin did some work in ClearlyDefined expression parsing and satisfaction down this path.

kestewart commented 5 years ago

Use case from Gary: you may know part of the information, but not all, so it is useful to understand what still needs to be done. I.e., you may know that 20 different licenses apply, but there is a part that is un-analyzed. So this is a good way to represent that. There are other ways to represent it.

kestewart commented 5 years ago

Steve is OK with this if there are use cases in practice. The key point is that we don't want to mandate it, such that every package has "AND NOASSERTION". We don't want to see this abused. The empty-file use case was discussed, and it should not require "AND NOASSERTION" at the package level. It is OK to put it on the file.

Note: the use of "OR NOASSERTION" is not clear. What use case would this represent?

jeffmcaffer commented 5 years ago

OR NOASSERTION, at the very least, comes up through typos and the like. For example, `GPL-3.0 OR MTI`. Here the MTI should have been MIT but was mistyped. If we are to mechanically reason about this it would be great to convert it to `GPL-3.0 OR NOASSERTION`. That way it just flows through the system like any other expression and in the end the user sees the NOASSERTION and deals with it as described above.
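A rough sketch of that mechanical conversion, in Python. `replace_unknown` and `KNOWN_IDS` are hypothetical names; `KNOWN_IDS` is a tiny stand-in for the SPDX License List. The sketch assumes expressions use only AND, OR and parentheses (no WITH clauses or `+` suffixes) and compares identifiers by exact match.

```python
import re
from typing import AbstractSet

# Tiny stand-in for the SPDX License List (assumption for illustration).
KNOWN_IDS = frozenset({"MIT", "GPL-3.0", "Apache-2.0"})
# Tokens that are never treated as license identifiers in this sketch.
KEYWORDS = frozenset({"AND", "OR", "NONE", "NOASSERTION"})


def replace_unknown(expression: str, known_ids: AbstractSet[str] = KNOWN_IDS) -> str:
    """Replace every unrecognized identifier with NOASSERTION."""

    def fix(match: "re.Match[str]") -> str:
        token = match.group(0)
        if token in KEYWORDS or token in known_ids:
            return token
        return "NOASSERTION"

    # Match runs of characters that are neither whitespace nor parentheses,
    # so the original spacing and grouping are preserved.
    return re.sub(r"[^\s()]+", fix, expression)


print(replace_unknown("GPL-3.0 OR MTI"))
# prints: GPL-3.0 OR NOASSERTION
```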

kestewart commented 4 years ago

Moving this to be discussed for 3.0 as it may result in incompatibilities, and assigning to Steve to weigh in on.

swinslow commented 4 years ago

Just commented on similar topic in https://github.com/spdx/spdx-spec/issues/49#issuecomment-652745365:

Reviewing these older issues, I don't see a benefit to adding NOASSERTION into the license expression syntax. It is already defined with a particular meaning in e.g. the Declared License / Concluded License fields.

goneall commented 4 years ago

Just commented on a similar topic in #49:

From a tooling perspective, a license expression for a declared license is commonly constructed through machine analysis of discovered licenses and inserting the appropriate AND and OR operators. If during that analysis a license is identified as NONE or NOASSERTION, it would be useful to include that information in the resultant license expression.

If we did not support NONE and NOASSERTION in the license expression, what would be the proposed result in the above scenario? If you declare the entire package or file as NONE or NOASSERTION, you lose a lot of valuable information on the discovered licenses. If you don't include the NONE or NOASSERTION, then you may miss an important flag that some manual review of the analysis is required.

swinslow commented 4 years ago

Thanks @goneall -- I just commented in the other thread (here) but just to close the loop here too, I'm convinced that the use case you described is worth supporting. I withdraw my earlier objection :)

My only comment in this case (as compared to NONE) is similar to one that I think I voiced when we discussed this on a tech team call last year (see above). For NOASSERTION in particular, I'd encourage that we should make it clear in the drafting that it is not mandatory to use NOASSERTION in license expressions.

Mostly I want to avoid establishing a practice where e.g. any time an SPDX document creator isn't sure that they are accounting for every byte in a file, that they believe it's required to add AND NOASSERTION to every expression. I think that could result in unnecessarily complicating common license expressions with minimal additional benefit. Put another way, I'd generally expect that using AND NOASSERTION or OR NOASSERTION should typically be the exception rather than the rule, and it might be worth explicitly stating that in the spec.

goneall commented 4 years ago

Mostly I want to avoid establishing a practice where e.g. any time an SPDX document creator isn't sure that they are accounting for every byte in a file, that they believe it's required to add AND NOASSERTION to every expression.

Completely agree - if it became a common practice it would lose meaning. If it is used in an expression, it would be a really good practice to include a license comment to describe why.

goneall commented 5 months ago

It looks like the SPDX 3.0 license expression still does not include NOASSERTION. Since this would not be a breaking change, I'm moving this to the 3.1 milestone.