google / deps.dev

Resources for the deps.dev API
https://deps.dev
Apache License 2.0

Details for non-standard licenses #62

Closed: sgustafsson closed this issue 4 months ago.

sgustafsson commented 9 months ago

The FAQ page says:

We identify licenses as SPDX expressions. When there is no associated SPDX identifier, we indicate the license is non-standard. When we are unable to obtain license information, we indicate the license is unknown.

Is there a way, via the API, to see which identified license values caused the tool to report non-standard?

Or could the API be enhanced so that, instead of:

  "licenses": [
    "non-standard"
  ],

it would return:

  "licenses": {
      "spdx" : [],
      "non-standard" : ["Apache License 2"]
  }  

For an identified SPDX ID, it would return:

  "licenses": {
      "spdx" : ["Apache-2.0"],
      "non-standard" : []
  }
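A minimal Python sketch of how a client could assemble the proposed structure from raw license names. The `KNOWN_SPDX` table and the `split_licenses` function are hypothetical illustrations, not part of the deps.dev API; a real mapping would need far more entries.

```python
# Hypothetical lookup from lower-cased raw license names to SPDX IDs.
# deps.dev's real mapping table is not published in this thread.
KNOWN_SPDX = {
    "apache-2.0": "Apache-2.0",
    "apache license, version 2.0": "Apache-2.0",
}


def split_licenses(raw_names):
    """Build the proposed {"spdx": [...], "non-standard": [...]} object."""
    result = {"spdx": [], "non-standard": []}
    for name in raw_names:
        spdx_id = KNOWN_SPDX.get(name.strip().lower())
        if spdx_id is None:
            # Unmapped: keep the author's original text so it stays visible.
            result["non-standard"].append(name)
        elif spdx_id not in result["spdx"]:
            # Several raw spellings can map to one SPDX ID; list it once.
            result["spdx"].append(spdx_id)
    return result
```

With this table, `split_licenses(["Apache License 2"])` yields the first shape above and `split_licenses(["Apache-2.0"])` yields the second.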

sgustafsson commented 8 months ago

I just came across this Maven artifact:

<dependency>
    <groupId>com.planbase.pdf</groupId>
    <artifactId>PdfLayoutMgr2</artifactId>
    <version>2.4.9</version>
</dependency>

which, I think, does not have an associated SPDX identifier. The POM file says:

<licenses>
  <license>
    <name>GNU Affero General Public License (AGPL) version 3.0</name>
    <url>https://www.gnu.org/licenses/agpl-3.0.en.html</url>
  </license>
</licenses> 

The license name "GNU Affero General Public License (AGPL) version 3.0" is not the official SPDX name, which is "GNU Affero General Public License v3.0 only".

I would have expected that this would result in a non-standard license, but you actually list the AGPL-3.0 license at https://deps.dev/maven/com.planbase.pdf%3APdfLayoutMgr2 and via the API:

{
  "versionKey": {
    "system": "MAVEN",
    "name": "com.planbase.pdf:PdfLayoutMgr2",
    "version": "2.4.9"
  },
  "purl": "",
  "publishedAt": "2020-10-06T17:41:02Z",
  "isDefault": true,
  "licenses": [
    "AGPL-3.0"
  ],

Can you comment on this case, and explain how you identified AGPL-3.0?
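For anyone reproducing this, a minimal Python sketch of fetching the same version record. It assumes the v3alpha GetVersion endpoint has the form `https://api.deps.dev/v3alpha/systems/{system}/packages/{name}/versions/{version}`, with the package name percent-encoded as a single path segment (so `:` becomes `%3A`, as in the web URL above). `version_url` and `fetch_licenses` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.deps.dev/v3alpha"  # assumed base URL


def version_url(system: str, name: str, version: str) -> str:
    """Build a GetVersion URL; each path segment is fully percent-encoded."""
    parts = (system.lower(), name, version)
    s, n, v = (urllib.parse.quote(p, safe="") for p in parts)
    return f"{API_BASE}/systems/{s}/packages/{n}/versions/{v}"


def fetch_licenses(system: str, name: str, version: str) -> list:
    """Return the "licenses" array of the response (requires network access)."""
    with urllib.request.urlopen(version_url(system, name, version)) as resp:
        return json.load(resp).get("licenses", [])
```

If the endpoint assumption holds, `fetch_licenses("MAVEN", "com.planbase.pdf:PdfLayoutMgr2", "2.4.9")` should return the same `licenses` array as the excerpt above.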

sarnesjo commented 8 months ago

Hi @sgustafsson! We do map licenses to SPDX expressions when the mapping is completely unambiguous. For example, these are some of the licenses that get mapped (ignoring case) to AGPL-3.0:

Our mappings consist of a few hundred entries, which we populate by looking at the most common licenses across our entire corpus and then manually trimming the list down to just the unambiguous ones. We don't want to get licensing wrong, so our policy is to err on the side of caution. Most of the licenses that we don't map, and that end up presented as non-standard, are ambiguous ones like "BSD" (does it mean BSD-2-Clause, BSD-3-Clause, or one of the other variants?) and "GPL" (GPL-2.0? GPL-3.0? etc.).
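The policy described above (a case-insensitive lookup in a curated table of unambiguous names, with everything else reported as non-standard) can be sketched in Python as follows. The table entries are hypothetical examples; the actual deps.dev mapping is not listed in this thread.

```python
# Hypothetical excerpt of a curated table of unambiguous names.
# Keys are case-folded so the lookup ignores case.
UNAMBIGUOUS = {
    "gnu affero general public license (agpl) version 3.0": "AGPL-3.0",
    "gnu affero general public license v3.0": "AGPL-3.0",
    "apache license, version 2.0": "Apache-2.0",
}


def to_spdx(raw_name: str) -> str:
    """Map a raw license name to SPDX, ignoring case; else 'non-standard'.

    Ambiguous names such as "BSD" or "GPL" are deliberately absent from the
    table, so they fall through to 'non-standard' instead of being guessed.
    """
    return UNAMBIGUOUS.get(raw_name.strip().casefold(), "non-standard")
```

Leaving ambiguous names out of the table, rather than picking a "most likely" variant, is what makes the lookup err on the side of caution.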

With that said, presenting the original licenses would be useful in a lot of cases. I'll add it to the API backlog!

sgustafsson commented 8 months ago

Thanks! Since I'm also challenged by ambiguous and unambiguous license names and expressions, I sometimes use the __impliedNames from https://github.com/maxhbr/LDBcollector. The "generated" branch contains useful metadata about licenses, for example:

https://github.com/maxhbr/LDBcollector/blob/generated/json/AGPL-3.0-only.pretty.json

{
    "__impliedNames": [
        "AGPL-3.0-only",
        "AGPL-3.0",
        "AGPL3.0",
        "AGPL3",
        "AGPL (v3)",
        "Affero General Public License 3.0",
        "GNU AFFERO GENERAL PUBLIC LICENSE Version 3",
        "GNU Affero General Public License (AGPL-3.0) (v. 3.0)",
        "AGPL-3.0-or-later",
        "AGPL-3.0+",
        "AGPL3.0+",
        "AGPL3+",
        "AGPL (v3 or later)",
        "Affero General Public License 3.0 or later",
        "GNU Affero General Public License v3.0 only",
        "GNU Affero General Public License v3.0 or later",
        "agpl-3.0",
        "GNU AGPLv3",
        "GNU Affero General Public License 3.0 (or any later version)",
        "GNU Affero General Public License 3.0",
        "GNU Affero General Public License v3",
        "agpl-v3",
        "GNU AFFERO GENERAL PUBLIC LICENSE, Version 3 (AGPL-3.0)",
        "License :: OSI Approved :: GNU Affero General Public License v3",
        "scancode://agpl-3.0-plus",
        "AGPL 3.0 or later",
        "scancode://agpl-3.0",
        "AGPL 3.0"
    ],
    "__impliedId": "AGPL-3.0-only",
    "__isFsfFree": true,
    "__impliedAmbiguousNames": [
        "Affero General Public License",
        "GNU AFFERO GENERAL PUBLIC LICENSE (AGPL-3)",
        "AGPLv3",
        "AGPLv3+",
        "GNU Affero General Public License, Version 3.0+",
        "GNU AFFERO GENERAL PUBLIC LICENSE Version 3+",
        "GNU AFFERO GENERAL PUBLIC LICENSE v3+",
        "GNU AFFERO GENERAL PUBLIC LICENSE, version 3.0+",
        "GNU Affero General Public License (AGPL) v3+",
        "GNU Affero General Public License (AGPL) version 3.0+",
        "GNU Affero General Public License 3.0+",
        "GNU Affero General Public License Version 3+",
        "GNU Affero General Public License v3+",
        "GNU Affero General Public License v3 or later",
        "GNU Affero General Public License v3.0+",
        "GNU Affero General Public License v3.0 or later",
        "GNU Affero General Public License version 3+",
        "GNU Affero General Public License version 3 or later",
        "GNU Affero General Public License version 3.0+",
        "GNU Affero General Public License version 3.0 or later",
        "GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version",
        "GNU AGPL version 3 or any later version",
        "Affero GPLv3+",
        "GNO Affero GPL v3.0+",
        "AGPL v3+",
        "AGPL-3.0+",
        "AGPL-V3+",
        "AGPL-v3+",
        "AGPL3+",
        "Affero GPL3+",
        "General Affero Public License version 3+",
        "Affero GNU Public License Version 3+",
        "version 2 of the Affero General Public License",
        "scancode:agpl-3.0-plus",
        "GNU AFFERO GENERAL PUBLIC LICENSE Version 3",
        "GNU AFFERO GENERAL PUBLIC LICENSE v3",
        "GNU AFFERO GENERAL PUBLIC LICENSE, version 3",
        "GNU AFFERO PUBLIC LICENSE, Version 3",
        "GNU Affero General Public License (AGPL) v3",
        "GNU Affero General Public License (AGPL) version 3.0",
        "GNU Affero General Public License 3.0",
        "GNU Affero General Public License Version 3",
        "GNU Affero General Public License v3",
        "GNU Affero General Public License, version 3",
        "GNU Afferp General Public License (AGPL), Version 3.0",
        "Affero GPLv3",
        "GNO Affero GPL v3.0",
        "AGPL v3",
        "AGPL 3.0",
        "AGPL-V3",
        "AGPL-v3",
        "AGPL3",
        "Affero GPL3",
        "General Affero Public License version 3",
        "Affero GNU Public License Version 3",
        "AGPL-3",
        "scancode:agpl-3.0",
        "osi:AGPL-3.0"
    ],
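One way to turn such a record into a lookup table is sketched below in Python. It assumes a single "generated" record shaped like the excerpt above and, as a conservative choice, drops any name that also appears in __impliedAmbiguousNames (for example, "GNU AFFERO GENERAL PUBLIC LICENSE Version 3" is in both lists). `build_lookup` is a hypothetical helper, not part of LDBcollector.

```python
import json


def build_lookup(ldb_json: str) -> dict:
    """Map case-folded names to __impliedId, skipping names marked ambiguous.

    Assumes one LDBcollector record per call, shaped like the
    AGPL-3.0-only.pretty.json excerpt.
    """
    record = json.loads(ldb_json)
    ambiguous = {n.casefold() for n in record.get("__impliedAmbiguousNames", [])}
    lookup = {}
    for name in record.get("__impliedNames", []):
        key = name.casefold()
        if key not in ambiguous:  # a name listed as ambiguous is never trusted
            lookup[key] = record["__impliedId"]
    return lookup
```

Note that __impliedNames above also contains or-later spellings such as "AGPL-3.0+" and "AGPL (v3 or later)", so a table built this way would still need manual review before it is used to assign exact SPDX IDs.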

kenwark commented 5 months ago

I've seen a fair number of non-standard license expressions such as "CC-BY-4.0 AND MIT AND OFL-1.1", so I suggest that the "non-standard" field in the proposed JSON be a string rather than an array. For example:

  "licenses": {
      "spdx" : [],
      "non-standard" : "CC-BY-4.0 AND MIT AND OFL-1.1"
  }

or

  "licenses": {
      "spdx" : [],
      "non-standard" : "Something-I-Made-Up OR OFL-1.1"
  }

My reasoning is that you can't anticipate the weird variations of non-standard licenses and there's no point in creating a structure that represents arbitrary boolean expressions.

sarnesjo commented 5 months ago

Hi @kenwark!

License expressions like those are, in fact, "standard". Specifically, they follow the standard described in SPDX 2.1, Appendix IV. (Later versions of the spec also describe it, but 2.1 is the specific one our parser is based on.)
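To illustrate what validating and normalizing such an expression involves, here is a simplified Python sketch of a recursive-descent parser for the Appendix IV grammar, with WITH binding tighter than AND, and AND tighter than OR. It is not deps.dev's parser: it accepts only upper-case operators, does not check IDs against the SPDX license list, and normalizes by re-printing the parse tree with single spaces and only the parentheses that precedence requires.

```python
import re

# Simplified tokens: "(", ")", or a run of ID characters (letters, digits,
# ".", "+", ":", "-"). Whitespace between tokens is skipped.
TOKEN = re.compile(r"\s*(\(|\)|[A-Za-z0-9.+:-]+)")
OPERATORS = {"OR": 1, "AND": 2, "WITH": 3}  # higher binds tighter


def tokenize(expr):
    expr, pos, tokens = expr.strip(), 0, []
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if m is None:
            raise ValueError(f"unexpected text at {pos}: {expr[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected or 'a token'}, got {tok!r}")
        self.i += 1
        return tok

    def parse_or(self):  # or-expr := and-expr ("OR" and-expr)*
        node = self.parse_and()
        while self.peek() == "OR":
            self.take("OR")
            node = ("OR", node, self.parse_and())
        return node

    def parse_and(self):  # and-expr := with-expr ("AND" with-expr)*
        node = self.parse_with()
        while self.peek() == "AND":
            self.take("AND")
            node = ("AND", node, self.parse_with())
        return node

    def parse_with(self):  # with-expr := atom ["WITH" exception-id]
        node = self.parse_atom()
        if self.peek() == "WITH":
            if not isinstance(node, str):
                raise ValueError("WITH must follow a single license ID")
            self.take("WITH")
            exception_id = self.take()
            if exception_id in OPERATORS or exception_id in ("(", ")"):
                raise ValueError(f"bad exception ID {exception_id!r}")
            node = ("WITH", node, exception_id)
        return node

    def parse_atom(self):  # atom := license-id | "(" or-expr ")"
        if self.peek() == "(":
            self.take("(")
            node = self.parse_or()
            self.take(")")
            return node
        tok = self.take()
        if tok in OPERATORS or tok == ")":
            raise ValueError(f"unexpected {tok!r}")
        return tok


def render(node, parent=0):
    """Print a parse tree, adding parentheses only where precedence needs them."""
    if isinstance(node, str):
        return node
    op, left, right = node
    text = f"{render(left, OPERATORS[op])} {op} {render(right, OPERATORS[op])}"
    return f"({text})" if OPERATORS[op] < parent else text


def normalize(expr):
    """Validate an expression (raising ValueError) and return a canonical form."""
    parser = Parser(tokenize(expr))
    tree = parser.parse_or()
    if parser.peek() is not None:
        raise ValueError(f"unexpected trailing {parser.peek()!r}")
    return render(tree)
```

For example, `normalize("((MIT OR Apache-2.0))  AND BSD-3-Clause")` keeps only the parentheses that change the meaning, while `normalize("MIT AND")` raises `ValueError`.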

Also, to be clear, deps.dev does not create those more elaborate expressions (containing AND or OR); it only validates and normalizes their syntax. If you see one of them, it's because the package author chose that license. Example:

kenwark commented 5 months ago

@sarnesjo thanks for your reply. Just to verify: the existing list of SPDX license IDs is an implicit disjunction (an implicit 'OR'), and there is no way in the current spec to represent license expressions that use 'AND' or 'WITH', even if all the license IDs are 'standard'?

To be honest, I think license expressions that use 'AND' should be banned! It's bad enough dealing with the proliferation of open source licenses; arbitrary expressions add complexity and, even when they conform to the SPDX spec, should be strongly discouraged. Sorry for my little rant.

sarnesjo commented 5 months ago

Hi @kenwark! There's a chance I'm misunderstanding your question, so to avoid getting the behavior of the deps.dev API and the rules of the SPDX spec mixed up, let me try to answer with a few examples:

kenwark commented 5 months ago

Okay, I finally understand: the contents of the 'licenses' array can be arbitrary SPDX expressions, in which license IDs that you cannot match against known SPDX IDs are replaced with 'non-standard'.

By the way, my experience with Maven artifacts is that there is an implicit 'OR' for multi-license projects.

For example, this link for a Maven component produces "licenses": ["EPL-2.0", "GPL-2.0", "LGPL-2.1"]. The original license file for the package states:

Eclipse Public License version 2.0 OR GNU General Public License version 2 OR GNU Lesser General Public License version 2.1
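Under the implicit-OR reading described above (an assumption about Maven metadata, not documented deps.dev semantics), a licenses array can be folded into a single SPDX expression. A minimal Python sketch; `implicit_or` is a hypothetical helper, and treating any entry that contains a space as compound is a deliberate simplification.

```python
def implicit_or(licenses):
    """Join a licenses array into one SPDX expression, assuming implicit OR."""
    parts = []
    for lic in licenses:
        lic = lic.strip()
        # A compound entry (anything containing a space, e.g. "A AND B") is
        # wrapped in parentheses so the added OR cannot change its grouping.
        parts.append(f"({lic})" if " " in lic and len(licenses) > 1 else lic)
    return " OR ".join(parts)
```

Applied to the array above, this produces the same "EPL-2.0 OR GPL-2.0 OR LGPL-2.1" structure as the original license file.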

In general, my understanding from having discussed the issue of multiple licenses with a lawyer many moons ago is that it is problematic to combine arbitrary licenses using 'AND', so my personal belief is that 'license expressions' need to go away.

sarnesjo-google commented 4 months ago

Hi @sgustafsson and @kenwark! The v3alpha API now has an additional field, licenseDetails, which reports licenses both as specified by the package author in the package metadata (or, for Go, as determined by the licensecheck package) and mapped to SPDX.

As an example, let's say the metadata for some package version contains these three licenses:

The deps.dev v3alpha API now reports:

// same as before
"licenses": [
  "Apache-2.0",
  "non-standard"
],

// new
"licenseDetails": [
  {
    "license": "Apache License, Version 2.0",
    "spdx": "Apache-2.0"
  },
  {
    "license": "Apache License Version 2.0, January 2004",
    "spdx": "Apache-2.0"
  },
  {
    "license": "FooBar Proprietary License",
    "spdx": "non-standard"
  }
],

We've also updated the package_lock_licenses example to show one way of working with this data (preferring SPDX, additionally using the original license if needed).
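A minimal Python sketch in the same spirit (it is not the package_lock_licenses code itself): for each licenseDetails entry, use the SPDX value when it is a real expression, and fall back to the author's original text when it is "non-standard". `display_licenses` is a hypothetical name.

```python
def display_licenses(license_details):
    """Prefer the SPDX expression; fall back to the author's original text.

    `license_details` is the parsed "licenseDetails" array of a version.
    Duplicates are dropped while keeping first-seen order.
    """
    seen, result = set(), []
    for detail in license_details:
        spdx = detail.get("spdx", "")
        if spdx and spdx != "non-standard":
            label = spdx
        else:
            # No usable SPDX mapping: show what the author actually wrote.
            label = detail.get("license") or "non-standard"
        if label not in seen:
            seen.add(label)
            result.append(label)
    return result
```

For the licenseDetails example above, this yields ["Apache-2.0", "FooBar Proprietary License"]: the two Apache spellings collapse into one SPDX ID, and the unmapped license keeps its original name.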

Please take a look and let us know if you have any comments or find any issues.

sgustafsson commented 4 months ago

Works as expected for me.

I now get more information (Eclipse Public License in this example) where previously I only got "non-standard":

{
  "versionKey": {
    "system": "MAVEN",
    "name": "org.everrest:everrest-core",
    "version": "1.15.0"
  },
  "purl": "pkg:maven/org.everrest/everrest-core@1.15.0",
  "publishedAt": "2021-08-19T10:58:22Z",
  "isDefault": true,
  "isDeprecated": false,
  "licenses": [
    "non-standard"
  ],
  "licenseDetails": [
    {
      "license": "Eclipse Public License",
      "spdx": "non-standard"
    }
  ]

sarnesjo-google commented 4 months ago

Great! And, to use this as an example: the reason deps.dev doesn't map "Eclipse Public License" to an SPDX expression is that it's ambiguous whether it means EPL-1.0 or EPL-2.0.