spdx / spdx-maven-plugin

Plugin for supporting SPDX in a Maven build.
Apache License 2.0

Incorrect "licenseDeclared" as "NOASSERTION" Despite License Visibility on Maven Central #165

Open jaudriga opened 5 months ago

jaudriga commented 5 months ago

Some dependencies are marked with "licenseDeclared": "NOASSERTION" in the SPDX output, even though their licenses are clearly specified on Maven Central. It looks like the plugin tries to use the project POM as a fallback in case no license was found. However, that also does not seem to work.

Here is an example:

For the dependency jakarta.json/jakarta.json-api v2.1.1:

SPDX Excerpt:

...
{
  "SPDXID" : "SPDXRef-gnrtd55",
  "copyrightText" : "UNSPECIFIED",
  "description" : "Jakarta JSON Processing defines a Java(R) based framework for parsing, generating, transforming, and querying JSON documents.",
  "downloadLocation" : "NOASSERTION",
  "externalRefs" : [ {
    "referenceCategory" : "PACKAGE-MANAGER",
    "referenceLocator" : "pkg:maven/jakarta.json/jakarta.json-api@2.1.1",
    "referenceType" : "purl"
  } ],
  "filesAnalyzed" : false,
  "homepage" : "https://github.com/eclipse-ee4j/jsonp",
  "licenseConcluded" : "NOASSERTION",
  "licenseDeclared" : "NOASSERTION",
  "name" : "Jakarta JSON Processing API",
  "originator" : "Organization:Eclipse Foundation",
  "summary" : "Jakarta JSON Processing defines a Java(R) based framework for parsing, generating, transforming, and querying JSON documents.",
  "versionInfo" : "2.1.1"
},
...

The output of the Maven createSPDX goal shows a number of warnings that are likely unrelated:

[INFO] --- spdx:0.7.3:createSPDX (build-spdx) @ auth ---
[INFO] spdx file type = .json
[INFO] Creating SPDX File /home/user/repos/scim/kmbw-scim/auth/target/site/de.dataport.scim_auth-1.3.2.spdx.json
[WARNING] The following errors were found in the SPDX file:
 Relationship error: Relationship error: GPL-2.0-with-classpath-exception is deprecated. in jersey-media-json-binding in auth in auth in auth
 Relationship error: Relationship error: Relationship error: Relationship error: GPL-2.0-with-classpath-exception is deprecated. in jersey-core-client in jersey-core-server in jersey-core-server in jersey-container-servlet-core in jersey-container-servlet-core in auth in auth in auth
 Relationship error: Relationship error: Relationship error: GPL-2.0-with-classpath-exception is deprecated. in jersey-core-server in jersey-container-servlet-core in jersey-container-servlet-core in auth in auth in auth
 Relationship error: Relationship error: Relationship error: Relationship error: GPL-2.0-with-classpath-exception is deprecated. in Jakarta Annotations API in jersey-core-common in jersey-core-common in jersey-container-servlet-core in jersey-container-servlet-core in auth in auth in auth
 Relationship error: Relationship error: Relationship error: Relationship error: GPL-2.0-with-classpath-exception is deprecated. in OSGi resource locator in jersey-core-common in jersey-core-common in jersey-container-servlet-core in jersey-container-servlet-core in auth in auth in auth
 Relationship error: Relationship error: Relationship error: GPL-2.0-with-classpath-exception is deprecated. in jersey-core-common in jersey-container-servlet-core in jersey-container-servlet-core in auth in auth in auth
 Relationship error: Relationship error: GPL-2.0-with-classpath-exception is deprecated. in jersey-container-servlet-core in auth in auth in auth
 Relationship error: Relationship error: GPL-2.0-with-classpath-exception is deprecated. in Jakarta Servlet in auth in auth in auth
 Relationship error: Relationship error: GPL-2.0-with-classpath-exception is deprecated. in Jakarta RESTful WS API in auth in auth in auth
 Relationship error: Relationship error: Relationship error: Relationship error: GPL-2.0-with-classpath-exception is deprecated. in aopalliance version 1.0 repackaged as a module in ServiceLocator Default Implementation in ServiceLocator Default Implementation in jersey-inject-hk2 in jersey-inject-hk2 in auth in auth in auth
 Relationship error: Relationship error: Relationship error: Relationship error: GPL-2.0-with-classpath-exception is deprecated. in HK2 API module in ServiceLocator Default Implementation in ServiceLocator Default Implementation in jersey-inject-hk2 in jersey-inject-hk2 in auth in auth in auth
 Relationship error: Relationship error: Relationship error: Relationship error: GPL-2.0-with-classpath-exception is deprecated. in HK2 Implementation Utilities in ServiceLocator Default Implementation in ServiceLocator Default Implementation in jersey-inject-hk2 in jersey-inject-hk2 in auth in auth in auth
 Relationship error: Relationship error: Relationship error: GPL-2.0-with-classpath-exception is deprecated. in ServiceLocator Default Implementation in jersey-inject-hk2 in jersey-inject-hk2 in auth in auth in auth
 Relationship error: Relationship error: GPL-2.0-with-classpath-exception is deprecated. in jersey-inject-hk2 in auth in auth in auth
 License list version does not match the pattern M.N

I was also unable to find a workaround to manually state the licenses for dependencies for which the license is listed as NOASSERTION.

goneall commented 5 months ago

Thanks @jaudriga for reporting this.

I did some analysis, and the reason the two licenses don't show up as declared is that they do not match a listed SPDX license under the current algorithm.

The algorithm uses the URLs for the license. If they are found in the seeAlso property of the listed license (a.k.a. "other web pages for the license"), it will match.

In looking at the above example, the URL for EPL 2.0 in the POM was not among the URLs on the SPDX EPL-2.0 license.

It does not match based on the name since the names are highly variable and there are several cases where two distinct licenses use the same name.
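The URL-based matching described above can be sketched roughly as follows. This is only an illustration of the idea, not the plugin's actual code: the class `SeeAlsoMatcher`, its method names, and any seeAlso URLs fed into it are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: maps a POM license URL to a listed license ID via seeAlso URLs.
final class SeeAlsoMatcher {
    // Listed license ID -> its seeAlso URLs (caller-supplied; not the real SPDX license list).
    private final Map<String, List<String>> seeAlsoById = new LinkedHashMap<>();

    void addListedLicense(String licenseId, List<String> seeAlso) {
        seeAlsoById.put(licenseId, seeAlso);
    }

    // Returns the first listed license whose seeAlso contains the exact POM URL, if any.
    Optional<String> matchByUrl(String pomLicenseUrl) {
        for (Map.Entry<String, List<String>> entry : seeAlsoById.entrySet()) {
            if (entry.getValue().contains(pomLicenseUrl)) {
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty();
    }

    // Value for licenseDeclared: the matched ID, or NOASSERTION when no URL matches.
    String declaredLicense(String pomLicenseUrl) {
        return matchByUrl(pomLicenseUrl).orElse("NOASSERTION");
    }
}
```

With exact string comparison like this, a POM URL such as https://projects.eclipse.org/license/epl-2.0 yields NOASSERTION unless that exact URL is listed in the license's seeAlso.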

One possible partial solution would be to create a non-listed license with the name from the POM file license element and the text pointing to the URL. This would be better than the current NOASSERTION.
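A minimal sketch of that fallback, assuming the POM license name and URL are already available as strings. The class `PomLicenseFallback` and the way the LicenseRef ID is derived from the name are hypothetical; the plugin may choose a different ID scheme.

```java
// Hypothetical value object for a non-listed license built from a POM <license> element.
final class PomLicenseFallback {
    final String licenseId;
    final String name;
    final String extractedText;
    final String seeAlso;

    private PomLicenseFallback(String licenseId, String name, String extractedText, String seeAlso) {
        this.licenseId = licenseId;
        this.name = name;
        this.extractedText = extractedText;
        this.seeAlso = seeAlso;
    }

    // Builds a LicenseRef from the POM license name and URL.
    static PomLicenseFallback fromPom(String name, String url) {
        // SPDX LicenseRef IDs may contain only letters, digits, "." and "-",
        // so every other run of characters is collapsed to a single "-".
        String slug = name.trim()
                .replaceAll("[^A-Za-z0-9.]+", "-")
                .replaceAll("^-+|-+$", "");
        return new PomLicenseFallback(
                "LicenseRef-" + slug,
                name,
                "The text for the license can be found at " + url,
                url);
    }
}
```

For the name "Eclipse Public License 2.0" this produces the ID LicenseRef-Eclipse-Public-License-2.0, with the URL carried in both the extracted text and the seeAlso field.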

jaudriga commented 5 months ago

Thanks for the quick reply! I just checked for my own understanding.

The POM file I linked contains:

<url>https://projects.eclipse.org/license/epl-2.0</url>

While the current algorithm expects:

The two URLs do not match the URL defined in the POM (same for GPL-2.0-with-classpath-exception).

I would like to try the workaround suggestion, but could you be more specific on how to try this? I tried the following configuration, but it does not seem to work:

<configuration>
...
    <nonStandardLicenses>
        <nonStandardLicense>
            <licenseId>LicenseRef-epl-workaround</licenseId>
            <name>Eclipse Public License 2.0</name>
            <extractedText>https://projects.eclipse.org/license/epl-2.0</extractedText>
            <crossReference>
                <crossReference>https://projects.eclipse.org/license/epl-2.0</crossReference>
            </crossReference>
            <comment>Work around issue https://github.com/spdx/spdx-maven-plugin/issues/165</comment>
        </nonStandardLicense>
    </nonStandardLicenses>
</configuration>

goneall commented 5 months ago

Thanks for the quick reply! I just checked for my own understanding.

The POM file I linked contains:

<url>https://projects.eclipse.org/license/epl-2.0</url>

While the current algorithm expects:

The two URLs do not match the URL defined in the POM (same for GPL-2.0-with-classpath-exception).

Correct

I would like to try the workaround suggestion, but could you be more specific on how to try this? I tried the following configuration, but it does not seem to work:


<configuration>
...
    <nonStandardLicenses>
        <nonStandardLicense>
            <licenseId>LicenseRef-epl-workaround</licenseId>
            <name>Eclipse Public License 2.0</name>
            <extractedText>https://projects.eclipse.org/license/epl-2.0</extractedText>

Since text is expected, I would suggest adding a short description like:

<extractedText>The text for the license can be found at https://projects.eclipse.org/license/epl-2.0</extractedText>

        <crossReference>
            <crossReference>https://projects.eclipse.org/license/epl-2.0</crossReference>

Not 100% sure if the crossReference field is supported in the POM file - if so, this should be fine

        </crossReference>
        <comment>Work around issue https://github.com/spdx/spdx-maven-plugin/issues/165</comment>
    </nonStandardLicense>
</nonStandardLicenses>

Other than the above comments, this looks fine.

One other solution is to create a pull request for the license XML for the Eclipse Public License to add the additional URL reference. It would take a couple of months before it gets published, but it would be a good long term solution.

jaudriga commented 5 months ago

if so, this should be fine

The problem is that I tried it with and without the crossReference, and both times it still shows "NOASSERTION" for jakarta.json/jakarta.json-api.

Do you see a workaround that would fix this? I assume this means that crossReference is not supported in the POM file?


This is the result in the SPDX when I use the workaround:

  "hasExtractedLicensingInfos" : [ {
    "licenseId" : "LicenseRef-epl-workaround",
    "comment" : "Work around issue https://github.com/spdx/spdx-maven-plugin/issues/165",
    "extractedText" : "The text for the license can be found at https://projects.eclipse.org/license/epl-2.0",
    "name" : "Eclipse Public License 2.0",
    "seeAlsos" : [ "https://projects.eclipse.org/license/epl-2.0" ]
  } ],

goneall commented 5 months ago

I think the issue is the crossReference for the hasExtractedLicensingInfos isn't being checked - only the licenses from the listed licenses.

You could add a licenseDeclared configuration parameter for the package that references LicenseRef-epl-workaround

jaudriga commented 5 months ago

You could add a licenseDeclared configuration parameter for the package that references LicenseRef-epl-workaround

Sadly, I am unable to find out how to do that. There does not seem to be a config option for packages, only for external references.

goneall commented 5 months ago

@jaudriga my apologies, I referred to the wrong configuration parameter.

The licenseConcluded configuration parameter should allow you to define a license expression where you should be able to use the licenseRef.

goneall commented 5 months ago

Here's a pointer to the code that implements the parameter - let me know if this works for you.

jaudriga commented 5 months ago

The licenseConcluded configuration parameter should allow you to define a license expression where you should be able to use the licenseRef.

Mhh.. I do not understand how to add any configuration parameter for a dependency/package. All the examples that I found either use pathsWithSpecificSpdxInfo to override the defaults for certain source files or directories or change the parameter at the project-level. So sadly this still does not work for me.

Just to reiterate: I would like the resulting SBOM to either list the correct license of jakarta.json-api or list the workaround license with the ID "LicenseRef-epl-workaround".

goneall commented 5 months ago

Mhh.. I do not understand how to add any configuration parameter for a dependency/package.

Ahh - now I understand the issue - you are trying to define the license for a dependency in a POM file which declares the dependency. Without the ability to change the POM file for the dependency itself, my proposed solution will not work.

I can think of two solutions:

We could add configuration to allow for this. Feel free to comment on any ideas how such a configuration would work.

jaudriga commented 5 months ago

Thanks! That clarifies things.

create a pull request for the license XML for the Eclipse Public License to add the additional URL reference. It would take a couple of months before it gets published, but it would be a good long term solution.

One last point regarding this. By now I have looked a bit deeper into all the dependencies that have "NOASSERTION" as "licenseDeclared". I see the following cases for which merely updating the license XML would not be enough:

Both cases can be observed for quite popular libraries. In Maven Central the license is correctly displayed though.

I am doubtful that merely changing the license XML is a good long term solution.

goneall commented 5 months ago

@jaudriga - thanks for the additional analysis.

For the additional cases you mentioned above, do you see any reliable means of determining the license from the POM files? If so, we can enhance the plugin to use that approach. I'm not sure how Maven Central determines the license - if their mechanism is reliable and open, we could adopt that approach.

With the additional analysis, it doesn't sound like adding configuration parameters is a good approach either - it would be ideal if the POM files contained reliably parseable license information we could use.

Other package managers (e.g. NPM) have adopted SPDX license identifiers in their metadata which results in a reliable machine readable format. If this was adopted in Maven, life would be much simpler for those of us wanting to create SBOMs with accurate license information (IMHO).

goneall commented 5 months ago

@hboutemy - let us know if there is anything we can do as a workaround or enhancement to the plugin to get similar results as Maven Central on the dependency licenses.

jaudriga commented 4 months ago

For the additional cases you mentioned above, do you see any reliable means of determining the license from the POM files?

Maybe:

However, I would say that even though this was observed in popular libraries, those are uncommon ways of specifying the license in a POM. It is unclear to me if the implementation should have built-in support for that. Adding a new configuration parameter to allow specifying licenses only for certain packages may be the better solution here. There will most likely be other edge cases out there for which the Maven Plugin will not be able to determine a license. So it looks to me like having a configuration option will be beneficial anyway.
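As one possible shape for such an option, the sketch below keeps a per-dependency override table keyed by groupId:artifactId and consults it before falling back to whatever the plugin detected. Nothing like this exists in the plugin today as far as this thread shows; `DependencyLicenseOverrides` and its methods are purely hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-dependency license overrides, keyed by "groupId:artifactId".
final class DependencyLicenseOverrides {
    private final Map<String, String> byCoordinate = new HashMap<>();

    // Registers a license expression (listed ID or LicenseRef) for one dependency.
    void put(String groupId, String artifactId, String licenseExpression) {
        byCoordinate.put(groupId + ":" + artifactId, licenseExpression);
    }

    // Returns the configured override if present, otherwise the detected value unchanged.
    String resolve(String groupId, String artifactId, String detected) {
        String override = byCoordinate.get(groupId + ":" + artifactId);
        return override != null ? override : detected;
    }
}
```

In this sketch an override always wins over the detected value; an alternative design would apply it only when detection yields NOASSERTION.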

I'm not sure how Maven Central determines the license - if their mechanism is reliable and open, we could adopt that approach.

For each project that I looked at, the license appears to be correctly listed in Maven Central. One option would be to retrieve the information from there instead of from the POM file.