pypa / trove-classifiers

Canonical source for classifiers on PyPI.
https://pypi.org/p/trove-classifiers/
Apache License 2.0

Request: more precise license classifiers #17

Open Steap opened 6 years ago

Steap commented 6 years ago

Hello,

Going through the list of available classifiers at https://pypi.python.org/pypi?%3Aaction=list_classifiers , I feel like some of the license classifiers are not precise enough. For instance, "License :: OSI Approved :: BSD License" could refer to multiple licenses: BSD-2-Clause, BSD-2-Clause-Patent, BSD-3-Clause. To determine the actual license used by a project that only specifies "License :: OSI Approved :: BSD License", one has to look at the LICENSE file distributed with the source code.

This is an issue for downstream package maintainers for two reasons:

I think the following licenses should be added (if possible to both pypi-legacy and warehouse):

In parentheses are the SPDX identifiers (see https://spdx.org/licenses/), except for LGPL*, where I used identifiers similar to those currently used for the various versions of the GPL.
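Because this convention puts the identifier in parentheses at the end of the classifier, tools can pull it out mechanically. A minimal sketch, assuming that convention; `spdx_id_from_classifier` and its `known_ids` parameter are hypothetical. Existing classifiers sometimes put non-SPDX abbreviations in parentheses (e.g. "(GPLv2)"), and the LGPL identifiers described here are not SPDX either, so the sketch can optionally check the candidate against a set of known SPDX identifiers.

```python
import re
from typing import Optional, Set

# A "(...)" group at the very end of the last classifier component.
_TRAILING_PAREN = re.compile(r"\(([^()]+)\)\s*$")


def spdx_id_from_classifier(
    classifier: str, known_ids: Optional[Set[str]] = None
) -> Optional[str]:
    """Return the parenthesized identifier ending a classifier, or None.

    If known_ids is given, only identifiers in that set are returned, since
    some classifiers use non-SPDX abbreviations such as "(GPLv2)".
    """
    leaf = classifier.split(" :: ")[-1]  # last component of the path
    match = _TRAILING_PAREN.search(leaf)
    if match is None:
        return None  # e.g. the ambiguous "BSD License"
    candidate = match.group(1)
    if known_ids is not None and candidate not in known_ids:
        return None
    return candidate
```

For example, `spdx_id_from_classifier('License :: OSI Approved :: Artistic License 2.0 (Artistic-2.0)')` returns `'Artistic-2.0'`, while the bare "License :: OSI Approved :: BSD License" yields `None`.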

Regarding the LGPL classifiers, we may also state that v2 and v2+ (currently in the list of valid classifiers) refer to v2.0 and v2.0+ and not to v2.1 and v2.1+, which would remove the need for the LGPLv2 and LGPLv2.0+ classifiers.

I decided not to include less used variants of the BSD licences - they may be added in the future if need be.

What do you think about this?

di commented 6 years ago

Thanks for the report, @Steap. We're aware that the licenses are not as fine-grained as they could be.

~Right now the existing classifiers are shared between pypi-legacy and Warehouse, and our current priority is to achieve feature parity between the two so we can shut down legacy. As such, I've added this to a post-launch milestone.~ (this is done)

~We need to ship a method to deprecate existing license classifiers (see https://github.com/pypa/pypi-legacy/issues/91) and possibly pypa/warehouse#2649 as well before we can tackle adding more fine-grained and accurate licenses, as well.~ (this is done)

di commented 6 years ago

~Blocked on pypa/warehouse#3628.~ done

di commented 6 years ago

Per https://github.com/pypa/pypi-legacy/issues/91 we should also add:

License :: OSI Approved :: Apache License, Version 2.0 (Apache-2.0)
License :: Apache License, Version 1.1 (Apache-1.1)
License :: Apache License, Version 1.0 (Apache-1.0)

And deprecate:

License :: OSI Approved :: Apache Software License

di commented 6 years ago

This issue is now unblocked. After compiling all the differences between the following sources:

I think this is what needs to be done (lines marked "-" will be deprecated, lines marked "+" will be added):

Academic

- License :: OSI Approved :: Academic Free License (AFL)
+ License :: OSI Approved :: Academic Free License 1.1 (AFL-1.1)
+ License :: OSI Approved :: Academic Free License 1.2 (AFL-1.2)
+ License :: OSI Approved :: Academic Free License 2.0 (AFL-2.0)
+ License :: OSI Approved :: Academic Free License 2.1 (AFL-2.1)
+ License :: OSI Approved :: Academic Free License 3.0 (AFL-3.0)

Apache

- License :: OSI Approved :: Apache Software License
+ License :: Apache License, Version 1.1 (Apache-1.1)
+ License :: Apache License, Version 1.0 (Apache-1.0)
+ License :: OSI Approved :: Apache Software License 2.0 (Apache-2.0)

Apple

- License :: OSI Approved :: Apple Public Source License
+ License :: OSI Approved :: Apple Public Source License 1.0 (APSL-1.0)
+ License :: OSI Approved :: Apple Public Source License 1.1 (APSL-1.1)
+ License :: OSI Approved :: Apple Public Source License 1.2 (APSL-1.2)
+ License :: OSI Approved :: Apple Public Source License 2.0 (APSL-2.0)

Artistic

- License :: OSI Approved :: Artistic License
+ License :: OSI Approved :: Artistic License 1.0 (Artistic-1.0)
+ License :: OSI Approved :: Artistic License 2.0 (Artistic-2.0)

BSD

- License :: OSI Approved :: BSD License
+ License :: OSI Approved :: BSD 2-Clause Plus Patent License (BSD-2-Clause-Patent)
+ License :: OSI Approved :: BSD 2-Clause "Simplified" License (BSD-2-Clause)
+ License :: OSI Approved :: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause)

GNU

- License :: OSI Approved :: GNU Affero General Public License v3
+ License :: OSI Approved :: GNU Affero General Public License v3.0 only (AGPL-3.0-only)
- License :: OSI Approved :: GNU Affero General Public License v3 or later (AGPLv3+)  
+ License :: OSI Approved :: GNU Affero General Public License v3.0 or later (AGPL-3.0-or-later)
- License :: OSI Approved :: GNU Free Documentation License (FDL)
+ License :: OSI Approved :: GNU Free Documentation License v1.1 only (GFDL-1.1-only)
+ License :: OSI Approved :: GNU Free Documentation License v1.1 or later (GFDL-1.1-or-later)
+ License :: OSI Approved :: GNU Free Documentation License v1.2 only (GFDL-1.2-only)
+ License :: OSI Approved :: GNU Free Documentation License v1.2 or later (GFDL-1.2-or-later)
+ License :: OSI Approved :: GNU Free Documentation License v1.3 only (GFDL-1.3-only)
+ License :: OSI Approved :: GNU Free Documentation License v1.3 or later (GFDL-1.3-or-later)
- License :: OSI Approved :: GNU General Public License (GPL)
+ License :: GNU General Public License v1.0 only (GPL-1.0-only)
+ License :: GNU General Public License v1.0 or later (GPL-1.0-or-later)
- License :: OSI Approved :: GNU General Public License v2 (GPLv2)
+ License :: OSI Approved :: GNU General Public License v2.0 only (GPL-2.0-only)
- License :: OSI Approved :: GNU General Public License v2 or later (GPLv2+)
+ License :: OSI Approved :: GNU General Public License v2.0 or later (GPL-2.0-or-later)
- License :: OSI Approved :: GNU General Public License v3 (GPLv3)
+ License :: OSI Approved :: GNU General Public License v3.0 only (GPL-3.0-only)
- License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)  
+ License :: OSI Approved :: GNU General Public License v3.0 or later (GPL-3.0-or-later)
- License :: OSI Approved :: GNU Lesser General Public License v2 (LGPLv2)
+ License :: OSI Approved :: GNU Library General Public License v2 only (LGPL-2.0-only)
- License :: OSI Approved :: GNU Lesser General Public License v2 or later (LGPLv2+)
+ License :: OSI Approved :: GNU Library General Public License v2 or later (LGPL-2.0-or-later)
+ License :: OSI Approved :: GNU Lesser General Public License v2.1 only (LGPL-2.1-only)
+ License :: OSI Approved :: GNU Lesser General Public License v2.1 or later (LGPL-2.1-or-later)
- License :: OSI Approved :: GNU Lesser General Public License v3 (LGPLv3)
+ License :: OSI Approved :: GNU Lesser General Public License v3.0 only (LGPL-3.0-only)
- License :: OSI Approved :: GNU Lesser General Public License v3 or later (LGPLv3+)
+ License :: OSI Approved :: GNU Lesser General Public License v3.0 or later (LGPL-3.0-or-later)
- License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL)

The deprecated classifiers will affect a lot of projects in some cases:

Classifier | # of projects
Academic Free License (AFL) | 40
Apache Software License | 8176
Apple Public Source License | 4
Artistic License | 46
BSD License | >10000
GNU Affero General Public License v3 | 3859
GNU Affero General Public License v3 or later (AGPLv3+) | 524
GNU Free Documentation License (FDL) | 3
GNU General Public License (GPL) | 3794
GNU General Public License v2 (GPLv2) | 1152
GNU General Public License v2 or later (GPLv2+) | 408
GNU General Public License v3 (GPLv3) | 3124
GNU General Public License v3 or later (GPLv3+) | 2081
GNU Lesser General Public License v2 (LGPLv2) | 128
GNU Lesser General Public License v2 or later (LGPLv2+) | 175
GNU Lesser General Public License v3 (LGPLv3) | 831
GNU Lesser General Public License v3 or later (LGPLv3+) | 502
GNU Library or Lesser General Public License (LGPL) | 1194

@Steap, can you review?

@dstufft @ewdurbin Can you sanity-check? Currently a user attempting to publish a release with a deprecated classifier will get an error like:

HTTPError: 400 Client Error: Invalid value for classifiers. Error: Classifier
'Topic :: Communications :: Chat :: AOL Instant Messenger' has been deprecated,
see https://pypi.org/classifiers/ for a list of valid classifiers. for url:
http://upload.pypi.org/legacy/
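A minimal sketch of the kind of check that produces this error, assuming uploads are validated against a set of valid classifiers and a set of deprecated ones. `find_classifier_errors`, `DEPRECATED` and `VALID` are hypothetical; Warehouse's actual implementation may differ. The AOL classifier is the deprecated one from the error above; the two license classifiers are ones this proposal would deprecate, and `VALID` assumes the proposal had landed.

```python
from typing import Iterable, List, Set

# Hypothetical data for illustration only.
DEPRECATED: Set[str] = {
    "Topic :: Communications :: Chat :: AOL Instant Messenger",
    "License :: OSI Approved :: BSD License",
    "License :: OSI Approved :: Apache Software License",
}
VALID: Set[str] = {
    "License :: OSI Approved :: MIT License",
    'License :: OSI Approved :: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause)',
}


def find_classifier_errors(
    classifiers: Iterable[str],
    valid: Set[str] = VALID,
    deprecated: Set[str] = DEPRECATED,
) -> List[str]:
    """Return one error message per deprecated or unknown classifier."""
    errors: List[str] = []
    for classifier in classifiers:
        if classifier in deprecated:
            errors.append(
                f"Classifier {classifier!r} has been deprecated, "
                "see https://pypi.org/classifiers/ for a list of valid classifiers."
            )
        elif classifier not in valid:
            errors.append(f"Classifier {classifier!r} is not a valid classifier.")
    return errors
```

Under this sketch, every existing release that keeps "License :: OSI Approved :: BSD License" would hit the first branch on its next upload, which is the churn discussed below.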
dstufft commented 6 years ago

Ugh, requiring 10k+ projects to modify their setup.py is not great. I guess the thing at the heart of this issue is whether classifiers are designed to be representative of the specific license of a project or if they're intended to act as a lossy mechanism to indicate the family of license something is under.

Overall my big concern here is that I'm not sure that classifiers are good enough even with these changes, in which case we're forcing a lot of churn for little benefit.

di commented 6 years ago

I agree, though the number of affected projects that will actually publish a new release is definitely a small fraction of that 10K number (without doing some querying, I'm not sure how small).

It seems to me that in the case of the GNU licenses, the original classifiers are trying to be representative of a specific license, e.g. GNU General Public License v2 or later (GPLv2+) refers to a specific license, and according to https://spdx.org/licenses/ referring to this license by that name has been "deprecated".

I agree though that for the others, the classifier is trying to be more general, which only really poses a problem for the Apache license family, as not all licenses in this family are truly OSI-approved.

One other option would be to leave the "General" classifiers in place, and add more specific versions as sub-classifiers:

License :: OSI Approved :: BSD License
+ License :: OSI Approved :: BSD License :: 2-Clause Plus Patent License (BSD-2-Clause-Patent)
+ License :: OSI Approved :: BSD License :: 2-Clause "Simplified" License (BSD-2-Clause)
+ License :: OSI Approved :: BSD License :: 3-Clause "New" or "Revised" License (BSD-3-Clause)
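To see how the sub-classifier layout behaves, assume a classifier search that is a simple prefix match on the " :: "-separated path (an assumption about how such a search might work, not a description of PyPI's). Under that assumption, filtering by the parent BSD classifier returns projects using either the parent or any of its proposed children. `matches_prefix`, `projects_under` and the sample projects are hypothetical.

```python
from typing import Dict, List


def matches_prefix(classifier: str, prefix: str) -> bool:
    """True if classifier is prefix itself or nested below it in the hierarchy."""
    return classifier == prefix or classifier.startswith(prefix + " :: ")


def projects_under(projects: Dict[str, List[str]], prefix: str) -> List[str]:
    """Names of projects with at least one classifier at or below prefix."""
    return sorted(
        name
        for name, classifiers in projects.items()
        if any(matches_prefix(c, prefix) for c in classifiers)
    )


# Hypothetical projects: one uses the parent, one a proposed sub-classifier.
PROJECTS = {
    "alpha": ["License :: OSI Approved :: BSD License"],
    "beta": [
        'License :: OSI Approved :: BSD License :: 3-Clause "New" or "Revised" License (BSD-3-Clause)'
    ],
    "gamma": ["License :: OSI Approved :: MIT License"],
}
```

Here `projects_under(PROJECTS, "License :: OSI Approved :: BSD License")` returns `['alpha', 'beta']`: the sub-classifiers refine the parent rather than replace it, so the parent stays a valid, searchable classifier.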

I think we'd still need to deprecate some of the more specific GNU classifiers to make this work, but the impact would be significantly smaller.

Thoughts?

stain commented 6 years ago

Well, the whole point of deprecating is to force projects pushing new updates to actually declare which version they are using.

GPL is a special case here: those licenses already have versions, so the changes just attempt to align with SPDX identifiers. I think that is a good thing, but perhaps not as critical as the other ones, like the ambiguous "Apache License" or "BSD License", which may or may not be OSI Approved depending on what the author meant.

tieguy commented 6 years ago

👋 We've just run into this, so throwing in my two cents in case it helps prioritize/understand the problem.

tl;dr: it'd be nice to get this fix merged :)

We have two use cases for this data:

In particular, we just ran into a situation where a package's actual source code is BSD-2-Clause (per GitHub's scanner and our own analysis) but PyPI only reports the ambiguous "BSD". So we can't actually do a useful analysis from the PyPI metadata alone; we have to crack open the source to figure out which BSD license is meant and whether it matches the GitHub metadata. (This is not a small problem: our research suggests something like 15-20% of PyPI packages have licensing metadata that doesn't match what GitHub reports. Having found this bug, I suspect that this problem drives a lot of that number.)

We can of course go into the source and figure this out, but it'd be nice if our customers (and presumably anyone else who uses PyPI) could figure out from the PyPI metadata which license they must comply with to use or distribute a package, instead of having to dig into the code itself.

(Disclaimer: IAAL and I was a programmer, but I'm not your lawyer and no longer usefully a programmer ;)

di commented 6 years ago

Revisiting this, I don't think the "subclassifier" approach I mentioned in https://github.com/pypa/warehouse/issues/2996#issuecomment-385450514 will work, as it wouldn't let us eventually deprecate the "parent" classifier.

I think the right thing to do here is what I outlined in https://github.com/pypa/warehouse/issues/2996#issuecomment-385027197. We can reduce friction a little bit by adding the ability to tell users which new classifiers they should use instead, see https://github.com/pypa/warehouse/issues/4626.
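One way such replacement hints could work is a mapping from each deprecated classifier to its proposed successors, consulted when building the upload error. A minimal sketch using an excerpt of the mapping proposed earlier in this thread; `REPLACEMENTS` and `deprecation_message` are hypothetical names, not Warehouse's API.

```python
from typing import Dict, List

# Hypothetical excerpt of the proposal: deprecated classifier -> successors.
REPLACEMENTS: Dict[str, List[str]] = {
    "License :: OSI Approved :: BSD License": [
        "License :: OSI Approved :: BSD 2-Clause Plus Patent License (BSD-2-Clause-Patent)",
        'License :: OSI Approved :: BSD 2-Clause "Simplified" License (BSD-2-Clause)',
        'License :: OSI Approved :: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause)',
    ],
    "License :: OSI Approved :: GNU General Public License v3 (GPLv3)": [
        "License :: OSI Approved :: GNU General Public License v3.0 only (GPL-3.0-only)",
    ],
}


def deprecation_message(classifier: str) -> str:
    """Build an error message that lists suggested replacements, if any."""
    suggestions = REPLACEMENTS.get(classifier)
    if not suggestions:
        return f"Classifier {classifier!r} has been deprecated."
    options = "\n".join(f"  - {s}" for s in suggestions)
    return f"Classifier {classifier!r} has been deprecated; use one of:\n{options}"
```

A one-to-many mapping matters here: for the BSD family the tool can only offer choices, since picking the right variant still requires the maintainer to check their LICENSE file.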

brainwane commented 6 years ago

@ewdurbin and @Steap ping for your thoughts?

I'd love less license ambiguity in PyPI (for use in Libraries.io & similar projects) so I would appreciate if we could move forward on this change. But I recognize it might be a multi-step process, kind of like pypa/warehouse#3632 was for improving the quality of our email address verification (data model infrastructure-building, announcing on the announce list, etc.).

@tieguy am I right in presuming that you care more about analyzing license data from the most recent versions of packages than about archival/past releases? If so we might ask maintainers to make license-only point releases to fix this metadata issue. (Unless I am misunderstanding.)

tieguy commented 6 years ago

@brainwane yeah, for our use case we're primarily concerned with the latest version. So I think for our purposes something that allowed people to fix it in future releases, rather than doing mass-changes of old materials, would be sufficient.

I'm not Python's lawyer (you have Van for that ;) but happy to help with any explanatory work or other legal thinking where I can.

dstufft commented 6 years ago

Part of me just wants to remove the license classifiers, and add a metadata field for SPDX version specifier, which seems to be more generally useful?

di commented 6 years ago

@dstufft One nice thing about the current license classifiers is that it's easy to search for and sort by them. We'd need to add a way to do this for the "SPDX identifier" field as well.

Steap commented 6 years ago

Sorry for my late answer, real life got in the way :-/

@brainwane I'm also willing to remove as much ambiguity as possible in the classifiers. I like the big patch in #2996 but, as others have already stated, the issue is the "transition" to these new classifiers. Just like @tieguy, I'm also only interested in the latest version of a given package.

@dstufft Every language has their own code for licenses, and every GNU/Linux or *BSD distribution too. It drives me crazy that not everyone uses spdx identifiers, which seem to be a truly unique way of identifying a license. I'm afraid that it would be a bit late to switch to spdx identifiers, though.

dstufft commented 6 years ago

Sure, the flip side is that the current situation really only works in simple cases. For example, if you have the following two classifiers:

License :: OSI Approved :: Apache Software License 2.0 (Apache-2.0)
License :: OSI Approved :: BSD 2-Clause "Simplified" License (BSD-2-Clause)

Can I integrate this work into GPL-2.0-only licensed software? You can't actually tell, because it depends on whether the software is licensed under Apache-2.0 AND BSD-2-Clause or under Apache-2.0 OR BSD-2-Clause.

Assuming you agree with the opinion that GPLv2 and Apache 2.0 are incompatible, if you have to comply with both the Apache-2.0 and the BSD-2-Clause license, then you cannot incorporate that work into a GPLv2 code base. This is why SPDX License Expressions have the ability to specify AND and OR explicitly. There's also the question of exceptions that we don't currently handle at all.
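To make the AND/OR distinction concrete, here is a minimal sketch (not an official SPDX implementation) that parses AND/OR expressions, with AND binding tighter than OR as in the SPDX expression grammar, and checks whether the terms can be met using only licenses from a set you are willing to comply with. The `gpl2_ok` set is a hypothetical illustration under the opinion above that Apache-2.0 is incompatible with GPLv2; the sketch does not handle `WITH` exceptions and greatly simplifies real compatibility analysis.

```python
import re
from typing import List, Set, Tuple, Union

# A parsed expression: a license ID, or (operator, [sub-expressions]).
Expr = Union[str, Tuple[str, list]]

_WS = re.compile(r"\s*")
_TOKEN = re.compile(r"\(|\)|[A-Za-z0-9.+-]+")


def tokenize(text: str) -> List[str]:
    """Split an expression into '(', ')', AND/OR and license identifiers."""
    tokens: List[str] = []
    pos = 0
    while True:
        pos = _WS.match(text, pos).end()  # skip any whitespace
        if pos == len(text):
            return tokens
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(match.group(0))
        pos = match.end()


class _Parser:
    """Recursive descent: OR binds loosest, then AND, then parentheses."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> str:
        tok = self.peek()
        if tok is None:
            raise ValueError("unexpected end of expression")
        self.pos += 1
        return tok

    def parse_or(self) -> Expr:
        children = [self.parse_and()]
        while self.peek() == "OR":
            self.take()
            children.append(self.parse_and())
        return children[0] if len(children) == 1 else ("OR", children)

    def parse_and(self) -> Expr:
        children = [self.parse_atom()]
        while self.peek() == "AND":
            self.take()
            children.append(self.parse_atom())
        return children[0] if len(children) == 1 else ("AND", children)

    def parse_atom(self) -> Expr:
        tok = self.take()
        if tok == "(":
            inner = self.parse_or()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return inner
        if tok in ("AND", "OR", ")"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok


def parse(text: str) -> Expr:
    parser = _Parser(tokenize(text))
    expr = parser.parse_or()
    if parser.peek() is not None:  # e.g. an unsupported WITH clause
        raise ValueError(f"unexpected token {parser.peek()!r}")
    return expr


def satisfiable(expr: Expr, acceptable: Set[str]) -> bool:
    """Can the terms be met using only licenses from `acceptable`?"""
    if isinstance(expr, str):
        return expr in acceptable
    op, children = expr
    results = [satisfiable(child, acceptable) for child in children]
    return all(results) if op == "AND" else any(results)


if __name__ == "__main__":
    gpl2_ok = {"BSD-2-Clause", "MIT"}  # hypothetical compatible set
    print(satisfiable(parse("Apache-2.0 AND BSD-2-Clause"), gpl2_ok))  # False
    print(satisfiable(parse("Apache-2.0 OR BSD-2-Clause"), gpl2_ok))   # True
```

The same two identifiers give opposite answers depending on the operator, which a flat list of classifiers cannot express.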

tieguy commented 5 years ago

Relevant PEP: https://github.com/pypa/interoperability-peps/issues/46

SamuelMarks commented 5 years ago

What's the status of this?

I've got about 60 Python packages that I want to release under (MIT OR Apache-2.0).

di commented 5 years ago

@SamuelMarks This issue is about the Classifier field, which will likely not support such fine-grained classifiers.

You should use the License field for that instead, e.g. in your setup.py:

from setuptools import setup

setup(
    # ... other arguments (name, version, etc.) ...
    license="(MIT OR Apache-2.0)",
    # ...
)
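setuptools writes that value into the License field of the package's core metadata (PKG-INFO or METADATA), alongside any Classifier lines, and that is where downstream tools read it. A minimal sketch of reading both with the standard library's email parser, which handles this header-style format; the sample metadata and `license_info` are hypothetical. For an installed distribution, `importlib.metadata.metadata(name)` returns a message-like object that supports the same `get_all` lookup.

```python
from email.parser import Parser
from typing import List, Optional, Tuple

# Hypothetical METADATA content for a project built from the setup() call above.
SAMPLE_METADATA = """\
Metadata-Version: 2.1
Name: example
Version: 1.0
License: (MIT OR Apache-2.0)
Classifier: License :: OSI Approved :: MIT License
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
"""


def license_info(metadata_text: str) -> Tuple[Optional[str], List[str]]:
    """Return the License field and the license-related classifiers."""
    msg = Parser().parsestr(metadata_text)
    license_field = msg.get("License")
    license_classifiers = [
        c for c in msg.get_all("Classifier", []) if c.startswith("License ::")
    ]
    return license_field, license_classifiers
```

Note that the two classifiers alone would not tell a reader whether the project is MIT AND Apache-2.0 or MIT OR Apache-2.0; only the free-form License field carries that here.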
Steap commented 5 years ago

@di The documentation states that:

«The license field is a text indicating the license covering the package where the license is not a selection from the “License” Trove classifiers.»

The documentation you quoted is similar, and it shows the issue of having a "free format" for the license.

Should it be updated to specify that:

1) In simple cases, when possible, you should use a "License ::" classifier;
2) Otherwise, you should use the "license" field with a valid SPDX expression?

Is there an official statement from PyPA regarding SPDX identifiers/expressions? In the future, could PyPI check that the license is a valid expression when a maintainer uploads a package?
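At the time of this thread PyPI did not check this, since the License field is free-form. As a hypothetical sketch of the kind of upload-time check asked about here, the function below tokenizes an AND/OR expression, checks that identifiers and operators alternate, that parentheses balance, and that each identifier appears in a known set. `KNOWN_IDS` and `validation_errors` are illustrative assumptions; a real check would load the full SPDX license list, handle `WITH` exceptions and follow the SPDX matching rules, none of which this sketch does.

```python
import re
from typing import List, Set

# Parentheses, operators and license identifiers; anything else is invalid.
_TOKEN = re.compile(r"\(|\)|[A-Za-z0-9.+-]+")

# Tiny hypothetical subset of the SPDX license list, for illustration only.
KNOWN_IDS: Set[str] = {
    "MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause",
    "GPL-2.0-only", "LGPL-2.1-or-later",
}


def validation_errors(expression: str, known_ids: Set[str] = KNOWN_IDS) -> List[str]:
    """Return a list of problems with an AND/OR license expression ([] if valid)."""
    errors: List[str] = []
    leftover = _TOKEN.sub(" ", expression).strip()
    if leftover:
        errors.append(f"unexpected characters: {leftover!r}")
    depth = 0
    expect_operand = True  # must start with an identifier or '('
    for tok in _TOKEN.findall(expression):
        if tok == "(":
            if not expect_operand:
                errors.append("'(' must follow an operator")
            depth += 1
        elif tok == ")":
            if expect_operand:
                errors.append("')' must follow a license identifier")
            depth -= 1
            if depth < 0:
                errors.append("unbalanced ')'")
                depth = 0
            expect_operand = False
        elif tok in ("AND", "OR"):
            if expect_operand:
                errors.append(f"{tok} must follow a license identifier")
            expect_operand = True
        else:
            if not expect_operand:
                errors.append(f"missing operator before {tok!r}")
            elif tok not in known_ids:
                errors.append(f"unknown license identifier {tok!r}")
            expect_operand = False
    if depth > 0:
        errors.append("unbalanced '('")
    if expect_operand:
        errors.append("expression ends without a license identifier")
    return errors
```

With this check, "(MIT OR Apache-2.0)" passes, while the bare "BSD" is rejected as an unknown identifier, pushing the maintainer toward BSD-2-Clause or BSD-3-Clause.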

di commented 5 years ago

That's basically what it already says, with the exception of:

with a valid SPDX expression

The License field is a free-form field to allow anyone to license their project under any license. I don't think we have any plans to start enforcing any kind of semantics there.

However, Donald said:

Part of me just wants to remove the license classifiers, and add a metadata field for SPDX version specifier, which seems to be more generally useful?

This would be a separate hypothetical field that could be enforced.

Steap commented 5 years ago

However, Donald said:

Part of me just wants to remove the license classifiers, and add a metadata field for SPDX version specifier, which seems to be more generally useful?

This would be a separate hypothetical field that could be enforced.

How would one start a discussion about this? As the author of a tool that makes heavy use of the metadata found on pypi.org, and having worked with distributions that had little manpower (and therefore could really take advantage of non-ambiguous metadata), I would really like to see that. Should I write a PEP, write a message on a mailing list, or reach out to someone in particular?

di commented 5 years ago

PEP 566 changed the canonical source for field specifications to the Core Metadata Specification. In theory you could just make a PR against pypa/packaging.python.org to introduce the new metadata field (and a new metadata version).

This is wrong; we still need a PEP.

brainwane commented 5 years ago

@di Could you help me understand what it would take to resolve this issue? I got a little confused.

Tell me where I'm right/wrong?

pombredanne commented 5 years ago

FWIW, I had actually started working (or rather slacking) on a PEP to replace or add SPDX expressions to Python packages metadata to convey clearer, simpler and better license info a few years ago but never finished that thing. See https://github.com/pombredanne/spdx-pypi-pep/issues/1

@brainwane we cannot just add another set of classifiers for that IMHO. As @dstufft pointed out in https://github.com/pypa/warehouse/issues/2996#issuecomment-425762903, you cannot handle anything but simple cases with a list of licenses. You need expressions for that. FWIW, I maintain a small library to deal with expressions if we ever end up using this and need some validation: https://github.com/nexB/license-expression/ ... But I guess there is a bit more discussion needed first!

@dstufft Do you reckon we would need a PEP to get this done right?

Who is game to help work on a PEP?

taleinat commented 5 years ago

@pombredanne, I would help work on such a PEP.

I'm not an expert on licenses, but I've dealt with them quite a bit; see this blog post.

di commented 5 years ago

@brainwane You're mostly right. Since License is currently a free-form field, I think we'd need to add a new field, something like SPDX-License-Identifier. This would require a new Metadata version, so anything writing or reading metadata would need updated.

We may want to:

@pombredanne and @taleinat, as I said in https://github.com/pypa/warehouse/issues/2996#issuecomment-499633046, a PEP is not necessary here.

pombredanne commented 5 years ago

@di excellent and much simpler! But besides an update to the Metadata spec and version, updates to several tools would be needed (wheel, setuptools, pip to name a few...), correct?

@taleinat let's start crafting something together then! We could meet/chat on #pypa-dev on Freenode. I am pombreda there.

di commented 5 years ago

@pombredanne Yep, like I said above:

This would require a new Metadata version, so anything writing or reading metadata would need updated.

pombredanne commented 5 years ago

@taleinat @di @brainwane here is a starter https://github.com/pypa/packaging.python.org/pull/635

pombredanne commented 5 years ago

So I also started a PEP draft after all at https://github.com/python/peps/pull/1148 and a discussion on https://discuss.python.org/t/improving-license-clarity-with-better-package-metadata/2154

Steap commented 5 years ago

For what it's worth, I did a little experiment a few weeks ago and sent an email to 137 maintainers whose projects used the ambiguous "BSD" classifier. I explained why it was a bit difficult to figure out what license they were using, and asked them whether they would consider using an SPDX expression in the "license" field. So far, 33 of them uploaded a new version of their package with "BSD-2-Clause"/"BSD-3-Clause".

pombredanne commented 5 years ago

@Steap you rock! Another data point (disclosure: I contribute to this project and maintain scancode-toolkit, the underlying license scanner used there) is at https://clearlydefined.io/stats

See the Declared License Breakdown section and click on the pypi tab: the declared license is typically what shows up in the package manifest. Of the 608,157 PyPI packages scanned for license there so far, over 33% have a license of "other", which means that a license is either missing or is a lesser-known license that is not on the SPDX list. The bare "BSD" classifiers would show up as "other" since they are not clear.

mschwager commented 3 years ago

Has there been any progress here? I'm still waiting on LGPLv2.1.

PuneetGopinath commented 3 years ago

Has there been any progress here? I'm still waiting on LGPLv2.1.

I'm also still waiting on LGPLv2.1!!

UPDATE: I opened a PR for those who are waiting for the updated LGPL license classifiers, see #69