Closed by waynebeaton 2 years ago.
@waynebeaton Interesting issue and discussion. The challenge would be capturing the input for the metadata. The source for all of the license list metadata is the license-list-XML repo.
If the license-list-XML captured this information, we could easily update the LicenseListPublisher to carry that information forward to the JSON and other formats.
There are 2 challenges with this, however:
If you're interested in pursuing this, I would suggest opening an issue in the license-list-XML repo requesting attributes in the XML to support this use case.
cc'ing @swinslow @zvr - you may be interested in this as well.
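To make the suggestion concrete, here is a minimal Python sketch of how an exception entry in the XML could carry such metadata and how a publisher step might carry it forward to JSON. The `<appliesTo>` element, its `<license>` children, the `appliesTo` JSON field, and the simplified XML layout (no namespace, no exception text) are all hypothetical assumptions for illustration; they are not part of the current license-list-XML schema or the LicenseListPublisher's output.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, simplified exception entry. The <appliesTo> element does not
# exist in license-list-XML today; it only illustrates the proposed metadata.
EXCEPTION_XML = """
<exception licenseId="Classpath-exception-2.0" name="Classpath exception 2.0">
  <appliesTo>
    <license>GPL-2.0-only</license>
    <license>GPL-2.0-or-later</license>
  </appliesTo>
</exception>
"""


def exception_to_json(xml_text: str) -> str:
    """Convert one (hypothetical) exception element into a JSON document."""
    root = ET.fromstring(xml_text.strip())
    # Collect the licenses listed under <appliesTo>, in document order.
    applies_to = [el.text.strip() for el in root.iter("license") if el.text]
    record = {
        "licenseExceptionId": root.get("licenseId"),
        "name": root.get("name"),
        # Hypothetical field carrying the new metadata forward to JSON.
        "appliesTo": applies_to,
    }
    return json.dumps(record, indent=2)


if __name__ == "__main__":
    print(exception_to_json(EXCEPTION_XML))
```

Keeping the list in the XML source means every generated format (JSON, HTML, and so on) would pick it up from the same place, which is the point made above.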
I'm approaching this from the perspective of trying to make a best-guess determination of the SPDX License Identifier from the results of scancode-toolkit. In the absence of an explicit tag, scancode does a fantastic job of identifying the various licenses, but doesn't necessarily know how they combine.
In the case where the scanner identifies a license and an exception, it's relatively easy to stitch the two together using `with`. It's less easy to guess whether two licenses identified by the scanner are combined via conjunction or disjunction, but we can at least automate creating a consistent SPDX identifier.
In the case where content is multiply licensed and an exception is involved, we can guess to which license the exception belongs, but the guess would be better if we had a way to identify which license an exception most likely belongs to.
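The "easy" cases above can be sketched as a small helper. This is a hypothetical heuristic, not scancode or SPDX behaviour: it sorts and deduplicates the detected identifiers so the same findings always produce the same string, joins multiple licenses with `AND` (only a guess, since the scanner cannot distinguish conjunction from disjunction), and attaches an exception with `WITH` only when there is exactly one license to attach it to. It uses the conventional uppercase SPDX operators; the function name is made up for illustration.

```python
from typing import List, Optional


def stitch_expression(licenses: List[str], exception: Optional[str] = None) -> str:
    """Build a deterministic SPDX-style expression from scanner findings.

    Hypothetical heuristic:
    - multiple licenses are joined with AND, which is only a guess;
    - an exception is attached with WITH only when there is a single license.
    """
    # Deduplicate and sort so identical findings always yield the same string.
    ids = sorted(set(licenses))
    if not ids:
        raise ValueError("at least one license is required")
    if exception is not None:
        if len(ids) != 1:
            # This is the multi-license case the thread is asking about.
            raise ValueError("ambiguous: which license does the exception modify?")
        return f"{ids[0]} WITH {exception}"
    return " AND ".join(ids)
```

For example, `stitch_expression(["GPL-2.0-only"], "Classpath-exception-2.0")` gives `GPL-2.0-only WITH Classpath-exception-2.0`, and `stitch_expression(["MIT", "Apache-2.0", "MIT"])` gives `Apache-2.0 AND MIT`. The helper deliberately refuses to guess when an exception appears alongside several licenses.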
That is, if scancode detects `Apache-2.0`, `GPL-2.0-only`, and `Classpath-exception-2.0` in content that does not include an SPDX-License-Identifier tag that might otherwise tell us how they are combined, we could reasonably guess that the license is `Apache-2.0 and GPL-2.0-only with Classpath-exception-2.0` and not `Apache-2.0 with Classpath-exception-2.0 and GPL-2.0-only` (whether or not `and` is right is a toss-up, but we can reasonably assume that the Classpath exception applies to the GPL). This specific example feels pretty obvious, but I don't know enough about most of the exceptions to say whether this could be done consistently by adding a bit of extra metadata to each exception's JSON file listing the licenses to which that exception might apply.
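Here is a minimal sketch of how the proposed metadata could drive that guess, assuming a lookup table from each exception to the licenses it is commonly used with. The `EXCEPTION_APPLIES_TO` table and the `guess_expression` function are hypothetical; the table's contents are illustrative only, and joining terms with `AND` is the same toss-up described above.

```python
from typing import Dict, List, Set

# Hypothetical metadata of the kind proposed in this thread: for each
# exception, the licenses it might apply to. Illustrative contents only.
EXCEPTION_APPLIES_TO: Dict[str, Set[str]] = {
    "Classpath-exception-2.0": {"GPL-2.0-only", "GPL-2.0-or-later"},
}


def guess_expression(licenses: List[str], exceptions: List[str]) -> str:
    """Attach each exception to its one compatible license, then join with AND.

    AND is an assumption; the scanner cannot tell conjunction from disjunction.
    """
    ids = sorted(set(licenses))
    attached: Dict[str, str] = {}
    for exc in sorted(set(exceptions)):
        allowed = EXCEPTION_APPLIES_TO.get(exc, set())
        candidates = [lic for lic in ids if lic in allowed and lic not in attached]
        if len(candidates) != 1:
            # No match, or several plausible matches: refuse to guess.
            raise ValueError(f"cannot place {exc!r}: candidates {candidates}")
        attached[candidates[0]] = exc
    terms = [f"{lic} WITH {attached[lic]}" if lic in attached else lic for lic in ids]
    return " AND ".join(terms)
```

With the example from this comment, `guess_expression(["GPL-2.0-only", "Apache-2.0"], ["Classpath-exception-2.0"])` yields `Apache-2.0 AND GPL-2.0-only WITH Classpath-exception-2.0`. Because `WITH` binds more tightly than `AND` in SPDX expressions, the exception stays attached to the GPL.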
This is part of a class of problems that I assume others are thinking about, e.g., how to combine and collapse licenses (`GPL-2.0-only and GPL-2.0-or-later` can collapse to `GPL-2.0-only`, or `GPL-2.0-only+` is basically the same as `GPL-2.0-or-later`). Is that discussion happening somewhere?
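The two collapsing rules mentioned above could be encoded roughly like this. It is a hypothetical sketch, not an existing tool: it only handles the GNU `-only`/`-or-later` families (GPL, LGPL, AGPL), treats a trailing `+` (including the deprecated `GPL-2.0+` form) as `-or-later`, and applies the collapse only to terms that are combined with `AND`. A real implementation would need a full SPDX expression parser.

```python
import re
from typing import List

# Hypothetical normalization limited to GNU license families; other
# identifiers with "+" are left untouched.
_PLUS_SUFFIX = re.compile(r"^(?P<base>(?:A|L)?GPL-\d\.\d)(?:-only)?\+$")


def normalize(license_id: str) -> str:
    """Rewrite 'GPL-2.0-only+' (or 'GPL-2.0+') as 'GPL-2.0-or-later'."""
    match = _PLUS_SUFFIX.match(license_id)
    return f"{match['base']}-or-later" if match else license_id


def collapse_and(terms: List[str]) -> List[str]:
    """For terms joined by AND, drop X-or-later when X-only is also present."""
    ids = {normalize(t) for t in terms}
    for lic in list(ids):
        if lic.endswith("-only"):
            ids.discard(lic[: -len("-only")] + "-or-later")
    return sorted(ids)
```

For example, `collapse_and(["GPL-2.0-only", "GPL-2.0-or-later"])` returns `["GPL-2.0-only"]`, and `normalize("GPL-2.0-only+")` returns `"GPL-2.0-or-later"`.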