Closed sgustafsson closed 4 months ago
I just came across this maven artifact
<dependency>
<groupId>com.planbase.pdf</groupId>
<artifactId>PdfLayoutMgr2</artifactId>
<version>2.4.9</version>
</dependency>
which does not have an associated SPDX identifier - I think. The pom file says:
<licenses>
<license>
<name>GNU Affero General Public License (AGPL) version 3.0</name>
<url>https://www.gnu.org/licenses/agpl-3.0.en.html</url>
</license>
</licenses>
The license name "GNU Affero General Public License (AGPL) version 3.0" is not the official SPDX name, which is "GNU Affero General Public License v3.0 only".
I would have expected that this would result in a non-standard license, but you actually list the AGPL-3.0 license at https://deps.dev/maven/com.planbase.pdf%3APdfLayoutMgr2 and via the API:
{
"versionKey": {
"system": "MAVEN",
"name": "com.planbase.pdf:PdfLayoutMgr2",
"version": "2.4.9"
},
"purl": "",
"publishedAt": "2020-10-06T17:41:02Z",
"isDefault": true,
"licenses": [
"AGPL-3.0"
],
Can you comment on this case, and on how you identified AGPL-3.0?
Hi @sgustafsson! We do map licenses to SPDX expressions when they are completely unambiguous. For example, these are some of the licenses that get mapped (ignoring case) to AGPL-3.0:
gnu affero general public licence, version 3.0
gnu affero general public license (agpl) version 3.0
gnu affero general public license (agpl), version 3
Our mappings consist of a few hundred entries that we populate by looking at the most common licenses across our entire corpus and manually trimming them down to just the ones that are unambiguous. We don't want to get licensing wrong, so our policy is to err on the side of caution. The majority of the licenses that we don't map, and that end up presented as non-standard, are ambiguous ones like "BSD" (does it mean BSD-2-Clause or BSD-3-Clause or one of the other variants?) and "GPL" (GPL-2.0? GPL-3.0? etc).
With that said, presenting the original licenses would be useful in a lot of cases. I'll add it to the API backlog!
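For readers who want to picture the mapping described above, here is a minimal sketch in Python. It is not the deps.dev implementation: the table name `UNAMBIGUOUS`, the function `map_license` and the whitespace stripping are assumptions made for illustration. Only the three AGPL entries and the "ignoring case" behavior come from the comment.

```python
# Hypothetical lookup table. The three entries are the ones quoted in the
# comment above; the real table reportedly has a few hundred entries.
UNAMBIGUOUS = {
    "gnu affero general public licence, version 3.0": "AGPL-3.0",
    "gnu affero general public license (agpl) version 3.0": "AGPL-3.0",
    "gnu affero general public license (agpl), version 3": "AGPL-3.0",
}

NON_STANDARD = "non-standard"


def map_license(name: str) -> str:
    """Return the SPDX identifier for a known unambiguous name.

    Anything not in the table, including ambiguous names such as "BSD"
    or "GPL", is reported as "non-standard".
    """
    key = name.strip().casefold()  # ignore case (stripping is an assumption)
    return UNAMBIGUOUS.get(key, NON_STANDARD)
```

With this sketch, the name from the pom file in the original question resolves to AGPL-3.0, while "BSD" and "GPL" fall through to non-standard because they are deliberately left out of the table.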
Thanks! Since I struggle with ambiguous and unambiguous license names and expressions myself, I sometimes use the impliedNames from https://github.com/maxhbr/LDBcollector. The "generated" branch contains useful metadata about licenses, for example:
https://github.com/maxhbr/LDBcollector/blob/generated/json/AGPL-3.0-only.pretty.json
{
"__impliedNames": [
"AGPL-3.0-only",
"AGPL-3.0",
"AGPL3.0",
"AGPL3",
"AGPL (v3)",
"Affero General Public License 3.0",
"GNU AFFERO GENERAL PUBLIC LICENSE Version 3",
"GNU Affero General Public License (AGPL-3.0) (v. 3.0)",
"AGPL-3.0-or-later",
"AGPL-3.0+",
"AGPL3.0+",
"AGPL3+",
"AGPL (v3 or later)",
"Affero General Public License 3.0 or later",
"GNU Affero General Public License v3.0 only",
"GNU Affero General Public License v3.0 or later",
"agpl-3.0",
"GNU AGPLv3",
"GNU Affero General Public License 3.0 (or any later version)",
"GNU Affero General Public License 3.0",
"GNU Affero General Public License v3",
"agpl-v3",
"GNU AFFERO GENERAL PUBLIC LICENSE, Version 3 (AGPL-3.0)",
"License :: OSI Approved :: GNU Affero General Public License v3",
"scancode://agpl-3.0-plus",
"AGPL 3.0 or later",
"scancode://agpl-3.0",
"AGPL 3.0"
],
"__impliedId": "AGPL-3.0-only",
"__isFsfFree": true,
"__impliedAmbiguousNames": [
"Affero General Public License",
"GNU AFFERO GENERAL PUBLIC LICENSE (AGPL-3)",
"AGPLv3",
"AGPLv3+",
"GNU Affero General Public License, Version 3.0+",
"GNU AFFERO GENERAL PUBLIC LICENSE Version 3+",
"GNU AFFERO GENERAL PUBLIC LICENSE v3+",
"GNU AFFERO GENERAL PUBLIC LICENSE, version 3.0+",
"GNU Affero General Public License (AGPL) v3+",
"GNU Affero General Public License (AGPL) version 3.0+",
"GNU Affero General Public License 3.0+",
"GNU Affero General Public License Version 3+",
"GNU Affero General Public License v3+",
"GNU Affero General Public License v3 or later",
"GNU Affero General Public License v3.0+",
"GNU Affero General Public License v3.0 or later",
"GNU Affero General Public License version 3+",
"GNU Affero General Public License version 3 or later",
"GNU Affero General Public License version 3.0+",
"GNU Affero General Public License version 3.0 or later",
"GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version",
"GNU AGPL version 3 or any later version",
"Affero GPLv3+",
"GNO Affero GPL v3.0+",
"AGPL v3+",
"AGPL-3.0+",
"AGPL-V3+",
"AGPL-v3+",
"AGPL3+",
"Affero GPL3+",
"General Affero Public License version 3+",
"Affero GNU Public License Version 3+",
"version 2 of the Affero General Public License",
"scancode:agpl-3.0-plus",
"GNU AFFERO GENERAL PUBLIC LICENSE Version 3",
"GNU AFFERO GENERAL PUBLIC LICENSE v3",
"GNU AFFERO GENERAL PUBLIC LICENSE, version 3",
"GNU AFFERO PUBLIC LICENSE, Version 3",
"GNU Affero General Public License (AGPL) v3",
"GNU Affero General Public License (AGPL) version 3.0",
"GNU Affero General Public License 3.0",
"GNU Affero General Public License Version 3",
"GNU Affero General Public License v3",
"GNU Affero General Public License, version 3",
"GNU Afferp General Public License (AGPL), Version 3.0",
"Affero GPLv3",
"GNO Affero GPL v3.0",
"AGPL v3",
"AGPL 3.0",
"AGPL-V3",
"AGPL-v3",
"AGPL3",
"Affero GPL3",
"General Affero Public License version 3",
"Affero GNU Public License Version 3",
"AGPL-3",
"scancode:agpl-3.0",
"osi:AGPL-3.0"
],
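As a rough sketch of how such metadata could be used, the Python below builds a reverse index from implied names to the implied SPDX id. The function name `build_name_index` is hypothetical, and skipping every name that also appears under `__impliedAmbiguousNames` is my own cautious assumption, not documented LDBcollector behavior.

```python
def build_name_index(entries):
    """Map each implied name (casefolded) to its implied SPDX id.

    `entries` is a list of dicts shaped like the JSON quoted above.
    Names that also appear under __impliedAmbiguousNames are skipped
    (an assumption of this sketch, to err on the side of caution).
    """
    index = {}
    for entry in entries:
        ambiguous = {n.casefold() for n in entry.get("__impliedAmbiguousNames", [])}
        for name in entry.get("__impliedNames", []):
            key = name.casefold()
            if key not in ambiguous:
                index[key] = entry["__impliedId"]
    return index


# Abbreviated copy of the AGPL-3.0-only entry quoted above.
agpl = {
    "__impliedNames": ["AGPL-3.0-only", "AGPL-3.0", "GNU AGPLv3", "AGPL 3.0"],
    "__impliedId": "AGPL-3.0-only",
    "__impliedAmbiguousNames": ["AGPLv3", "AGPL 3.0", "AGPL-3"],
}
index = build_name_index([agpl])
```

Note that the quoted entry lists "or later" names such as "AGPL-3.0+" under the AGPL-3.0-only id, so an index built this way still needs manual review before it is trusted.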
I've seen a fair number of non-standard license expressions such as "CC-BY-4.0 AND MIT AND OFL-1.1". So I suggest that the "non-standard" field in the JSON be a string and not an array. For example:
"licenses": {
"spdx" : [],
"non-standard" : "CC-BY-4.0 AND MIT AND OFL-1.1"
}
or
"licenses": {
"spdx" : [],
"non-standard" : "Something-I-Made-Up OR OFL-1.1"
}
My reasoning is that you can't anticipate the weird variations of non-standard licenses and there's no point in creating a structure that represents arbitrary boolean expressions.
Hi @kenwark!
License expressions like those are, in fact, "standard". Specifically, they follow the standard described by SPDX 2.1, appendix IV. (And later versions of the spec, but that's the specific one our parser is based on.)
Also, to be clear, deps.dev does not create those more elaborate expressions (containing AND or OR); it only validates and normalizes their syntax. If you see one of them, it's because the package author chose that license. Example:
(BSD-3-Clause AND Apache-2.0)
Apache-2.0 AND BSD-3-Clause
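To illustrate what normalizing such an expression might involve, here is a small Python sketch. It is not the deps.dev parser, which follows the SPDX expression grammar; the function name `normalize_flat_expression` is hypothetical and the sketch handles only flat expressions with a single AND or OR operator. It does not validate license ids or correct their capitalization.

```python
def normalize_flat_expression(expr: str) -> str:
    """Normalize a flat SPDX-style expression that uses one operator.

    Strips a single pair of outer parentheses, uppercases the operator
    and sorts the license ids. Nested or mixed expressions are returned
    with only their whitespace collapsed.
    """
    text = expr.strip()
    if text.startswith("(") and text.endswith(")") and text.count("(") == 1:
        text = text[1:-1].strip()
    tokens = text.split()
    ids = tokens[0::2]   # every other token, starting with the first id
    ops = tokens[1::2]   # the operators between the ids
    flat = len(tokens) % 2 == 1 and ops and "(" not in text
    for op in ("AND", "OR"):
        if flat and all(o.upper() == op for o in ops):
            return f" {op} ".join(sorted(ids))
    return " ".join(tokens)
```

Applied to the first line of the example above, this returns the second line. Anything more complex, such as "(A AND B) OR C", is left as it is.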
@sarnesjo thanks for your reply. Just to verify, the existing list of SPDX license IDs is an implicit disjunction (implicit 'or'), and there is no ability in the current spec to represent license expressions that use 'AND' or 'WITH' even if all the license ids are 'standard'?
To be honest I think license expressions that use 'AND' should be banned! It's bad enough dealing with the proliferation of open source licenses. Adding the complexity of arbitrary expressions even if they conform to the SPDX spec should be strongly discouraged. Sorry for my little rant.
Hi @kenwark! There's a chance I'm misunderstanding your question, so to avoid getting the behavior of the deps.dev API and the rules of the SPDX spec mixed up, let me try to answer with a few examples:
If you see "Apache-2.0 AND BSD-3-Clause" or "Apache-2.0 OR BSD-3-Clause" in a response from deps.dev, that's because the original package author put that (or at least an equivalent expression, possibly with different capitalization and ordering) in the package metadata. deps.dev does not construct license expressions containing AND or OR.
If you see ["Apache-2.0", "BSD-3-Clause"] in a response from deps.dev, that means that deps.dev detected multiple licenses when scanning the package. This can currently only happen for Go and Maven, although that could change in the future. Please note that in this case, there is neither an implicit AND nor an implicit OR; as mentioned in the API docs, "when more than one license is listed, their relationship is unspecified".
Okay, I finally understand that the contents of the 'licenses' array can be arbitrary SPDX expressions where license IDs that you are not able to match against known SPDX IDs are replaced with 'non-standard'.
By the way, my experience with Maven artifacts is that there is an implicit 'OR' for multi-license projects.
For example, this link for a Maven component produces "licenses":["EPL-2.0", "GPL-2.0", "LGPL-2.1"]
The original license file for the package states:
Eclipse Public License version 2.0 OR GNU General Public License version 2 OR GNU Lesser General Public License version 2.1
In general, my understanding from having discussed the issue of multiple licenses with a lawyer many moons ago is that it is problematic to combine arbitrary licenses using 'AND', so my personal belief is that 'license expressions' need to go away.
Hi @sgustafsson and @kenwark! The v3alpha API now has an additional field, licenseDetails, which reports licenses both as specified by the package author in the package metadata (or, for Go, as determined by the licensecheck package) and mapped to SPDX.
As an example, let's say the metadata for some package version contains these three licenses:
"Apache License, Version 2.0"
"Apache License Version 2.0, January 2004"
"FooBar Proprietary License"
The deps.dev v3alpha API now reports:
// same as before
"licenses": [
"Apache-2.0",
"non-standard"
],
// new
"licenseDetails": [
{
"license": "Apache License, Version 2.0",
"spdx": "Apache-2.0"
},
{
"license": "Apache License Version 2.0, January 2004",
"spdx": "Apache-2.0"
},
{
"license": "FooBar Proprietary License",
"spdx": "non-standard"
}
],
We've also updated the package_lock_licenses example to show one way of working with this data (preferring SPDX, additionally using the original license if needed).
Please take a look and let us know if you have any comments or find any issues.
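For anyone reading along, the "prefer SPDX, fall back to the original license" idea can be sketched in a few lines of Python. This is not the package_lock_licenses example itself; the function name `effective_licenses` is hypothetical, and the input is simply the response fragment shown above.

```python
def effective_licenses(version: dict) -> list[str]:
    """Prefer the SPDX mapping; fall back to the author's original string."""
    result = []
    for detail in version.get("licenseDetails", []):
        spdx = detail.get("spdx", "")
        if spdx and spdx != "non-standard":
            label = spdx
        else:
            # No usable SPDX id: report what the package author wrote.
            label = detail.get("license", "non-standard")
        if label not in result:  # keep first-seen order, drop duplicates
            result.append(label)
    return result


# The licenseDetails fragment from the example response above.
response = {
    "licenses": ["Apache-2.0", "non-standard"],
    "licenseDetails": [
        {"license": "Apache License, Version 2.0", "spdx": "Apache-2.0"},
        {"license": "Apache License Version 2.0, January 2004", "spdx": "Apache-2.0"},
        {"license": "FooBar Proprietary License", "spdx": "non-standard"},
    ],
}
print(effective_licenses(response))
```

The two Apache entries collapse into a single Apache-2.0, and the proprietary license appears under its original name instead of as non-standard.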
Works as expected for me.
I now get more information (Eclipse Public License in this example) where I previously only got non-standard:
{
"versionKey": {
"system": "MAVEN",
"name": "org.everrest:everrest-core",
"version": "1.15.0"
},
"purl": "pkg:maven/org.everrest/everrest-core@1.15.0",
"publishedAt": "2021-08-19T10:58:22Z",
"isDefault": true,
"isDeprecated": false,
"licenses": [
"non-standard"
],
"licenseDetails": [
{
"license": "Eclipse Public License",
"spdx": "non-standard"
}
]
The FAQ page says:
Is there a way, via the APIs, to see which identified license values were the reason for the tool to indicate non-standard?
Or could the API be enhanced, instead of:
it would return
In case of an identified SPDX id it would return