spdx / LicenseListPublisher

Tool that generates license data found in the license-list-data repository from the license-list-XML source
Apache License 2.0

Record the licenses to which an exception may apply #132

Closed waynebeaton closed 2 years ago

waynebeaton commented 2 years ago

I'm approaching this from the perspective of trying to make a best-guess determination of the SPDX License Identifier from the results of scancode-toolkit. In the absence of an explicit tag, scancode does a fantastic job of identifying the various licenses, but doesn't necessarily know how they combine.

In the case where a license and an exception are identified by the scanner, it's relatively easy to stitch the two together using WITH. It's less easy to guess whether two licenses identified by the scanner are combined via conjunction or disjunction, but we can at least automate creating a consistent SPDX identifier.

In the case where content is multiply licensed and an exception is involved, we can guess to which license the exception belongs, but the guess could be made better if we have a means of identifying to which license an exception likely belongs.

That is, if scancode detects Apache-2.0, GPL-2.0-only, and Classpath-exception-2.0 in content that does not include an SPDX-License-Identifier tag that might otherwise tell us how these are combined, we could reasonably guess that the license is Apache-2.0 AND GPL-2.0-only WITH Classpath-exception-2.0, and not Apache-2.0 WITH Classpath-exception-2.0 AND GPL-2.0-only. Whether or not AND is right is a toss-up, but we can reasonably assume that the Classpath exception applies to the GPL.

This specific example feels pretty obvious, but I don't know enough about most of the exceptions to understand whether or not this is something that can be reasonably done consistently by adding a bit of extra metadata to the JSON files for the exceptions that lists the licenses to which the exception might apply.
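To make the idea concrete, here is a minimal sketch of how a tool might use such metadata. The LIKELY_LICENSES mapping and the guess_expression function are hypothetical names invented for illustration; nothing like this mapping is published in the SPDX JSON files today, and the choice of AND as the default operator is just the guess described above.

```python
# Hypothetical metadata: the licenses each exception is likely to modify.
# This mapping is an illustrative assumption, not data published by SPDX.
LIKELY_LICENSES = {
    "Classpath-exception-2.0": {"GPL-2.0-only", "GPL-2.0-or-later"},
}


def guess_expression(licenses, exceptions, operator="AND"):
    """Join detected licenses, attaching each exception to the first
    detected license that the metadata says it likely applies to."""
    attached = {}
    for exception in exceptions:
        candidates = LIKELY_LICENSES.get(exception, set())
        for license_id in licenses:
            if license_id in candidates and license_id not in attached:
                attached[license_id] = exception
                break
    # Exceptions with no matching license are left out of the guess.
    terms = [
        f"{lic} WITH {attached[lic]}" if lic in attached else lic
        for lic in licenses
    ]
    return f" {operator} ".join(terms)


print(guess_expression(["Apache-2.0", "GPL-2.0-only"], ["Classpath-exception-2.0"]))
```

Because WITH binds more tightly than AND in an SPDX license expression, the result needs no parentheses.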

This is part of a class of problem that I assume others are thinking about, e.g., how to combine and collapse licenses (GPL-2.0-only and GPL-2.0-or-later can collapse to GPL-2.0-only, or GPL-2.0-only+ is basically the same as GPL-2.0-or-later). Is that discussion happening somewhere?
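As a rough sketch of what I mean by collapsing, assuming the licenses are joined by conjunction and treating the -only/-or-later suffix naming as a reliable convention (both are assumptions; normalize and collapse_conjunction are made-up helper names):

```python
def normalize(license_id):
    """Rewrite the 'X-only+' spelling as 'X-or-later'."""
    suffix = "-only+"
    if license_id.endswith(suffix):
        return license_id[: -len(suffix)] + "-or-later"
    return license_id


def collapse_conjunction(license_ids):
    """Drop 'X-or-later' when 'X-only' is also present, and drop duplicates.

    Assumes the identifiers are combined with AND, so the narrower
    '-only' term is the one that survives.
    """
    normalized = [normalize(lic) for lic in license_ids]
    result = []
    for lic in normalized:
        if lic.endswith("-or-later"):
            only = lic[: -len("-or-later")] + "-only"
            if only in normalized:
                continue
        if lic not in result:
            result.append(lic)
    return result


print(collapse_conjunction(["GPL-2.0-only", "GPL-2.0-or-later"]))
print(collapse_conjunction(["GPL-2.0-only+"]))
```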

goneall commented 2 years ago

@waynebeaton Interesting issue and discussion. The challenge would be capturing the input for the metadata. The source for all of the license list metadata is the license-list-XML repo.

If the license-list-XML captured this information, we could easily update the LicenseListPublisher to carry that information forward to the JSON and other formats.

There are 2 challenges with this, however:

If you're interested in pursuing this, I would suggest adding an issue to the license-list-XML repo requesting an attribute in the XML to support this use case.

cc'ing @swinslow @zvr - you may be interested in this as well.

waynebeaton commented 2 years ago

Moved to https://github.com/spdx/license-list-XML/issues/1457