Open daenney opened 10 years ago
We ran into a related issue in pypa/warehouse#3225.
I haven’t seen a whole license file in the license keyword, and haven’t seen anything documenting or recommending that.
The classifier has a list of valid values and is documented (in PEPs at least) as the recommended option.
The license field is documented as optional and free form. It is redundant with a classifier but not exclusive (i.e. does not make the metadata invalid or cause tools to error out).
But, it's possible, which means users have done it (There's packages on PyPI for which this is true).
What method should be used, and are any of them mutually exclusive? For example, if you have a license classifier, do you still need the license key?
I think the metadata specification describes this best:
Text indicating the license covering the distribution where the license is not a selection from the “License” Trove classifiers. See “Classifier” below. This field may also be used to specify a particular version of a license which is named via the Classifier field, or to indicate a variation or exception to such a license.
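To make the wording above concrete, here is a minimal sketch of metadata that uses both fields together, roughly as one might pass it to setuptools' setup(). The project name, version, and choice of license are made up for illustration; the classifier is one of the standard "License" Trove classifiers, and the helper function is hypothetical, not part of any packaging tool.

```python
# Hypothetical metadata for an example project; "example-pkg" is made up.
metadata = {
    "name": "example-pkg",
    "version": "0.1.0",
    # Free-form field: used here only to name the exact license variant.
    "license": "BSD 3-Clause",
    "classifiers": [
        # Trove classifier: must come from the fixed list of valid values.
        "License :: OSI Approved :: BSD License",
    ],
}


def license_classifiers(meta):
    """Return the Trove classifiers in `meta` that describe a license."""
    return [c for c in meta.get("classifiers", []) if c.startswith("License ::")]


print(license_classifiers(metadata))
```

In a setup.py this dictionary would be passed as setup(**metadata). Having both fields set is redundant but not invalid, which matches the reading of the spec given above.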
But, it's possible, which means users have done it (There's packages on PyPI for which this is true).
Curiously, this is actually a good idea in some sense... because most OSS licenses require that you include the text of the license when you redistribute. (E.g. MIT says: "Permission is hereby granted [to do anything you want with this code] subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.", emphasis mine.)
Currently there's no mechanism for including license files in wheels, so I think those weird packages that got confused about the license= kwarg might be the only wheels on PyPI that are technically legal to redistribute. In practice no-one really enforces this but it'd still be good to fix eventually.
The current approaches to handling license metadata are also pretty limited – for example, there's no way to express licenses like pyca/cryptography's dual MIT/Apache2. It made sense at the time, but is probably overdue for an overhaul.
A further wrinkle is that when tools like auditwheel get involved, we may need to algorithmically transform the license metadata to record information about vendored packages.
So at some point I think we should:
Add some standard place to store actual license text, perhaps $PKG.dist-info/license-text/, with corresponding changes to setuptools/bdist_wheel to fill this in
For metadata, switch from the current trove classifiers approach to the SPDX license metadata standard.
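As a rough sketch of what the two proposals could look like together: a dual license written as an SPDX-style expression, plus the suggested location for license text inside a wheel. Everything here is an assumption for illustration. The helper names are hypothetical, the license-text/ directory is only the idea floated above and not an existing standard, and the splitter is deliberately naive (it handles a flat "A OR B" and nothing else, with no parentheses, AND, or WITH).

```python
def split_alternatives(expression):
    """Split a flat SPDX-style 'A OR B' expression into its alternatives.

    Deliberately naive: a real parser would also handle AND, WITH,
    and parentheses.
    """
    return [part.strip() for part in expression.split(" OR ")]


def license_text_dir(dist_info_name):
    """Return the hypothetical directory for license text in a wheel."""
    return f"{dist_info_name}/license-text/"


# "MIT" and "Apache-2.0" are SPDX license identifiers.
print(split_alternatives("MIT OR Apache-2.0"))
print(license_text_dir("example_pkg-0.1.0.dist-info"))
```

A single expression string like this can state "either license applies", which a list of Trove classifiers cannot express unambiguously.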
Just chiming in quick:
I am involved in SPDX quite a bit and I maintain a decent Python-based license scanner (scancode-toolkit). I started extremely slowly working on a draft of a draft of a PEP idea related to this in https://github.com/pombredanne/spdx-pypi-pep/issues/1 about a year ago...
@njsmith re
Currently there's no mechanism for including license files in wheels, so I think those weird packages that got confused about the license= kwarg might be the only wheels on PyPI that are technically legal to redistribute. In practice no-one really enforces this but it'd still be good to fix eventually.
Actually, there is a little-known way to include a single license file in a wheel using setup.cfg:
[metadata]
license_file = NOTICE
This would include a NOTICE file from the root in your wheel's *.dist-info dir as a file named LICENSE.txt
See for example: https://github.com/nexB/scancode-toolkit/blob/fd2e483e346a38ee9634538a0f05ca4dd96fb622/setup.cfg#L2 It is not well known or documented, and license files are missing more often than not
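For anyone who wants to check their own project, here is a minimal sketch that reads the license_file setting from setup.cfg text and looks for a license file inside a wheel's dist-info directory. It uses only the standard library (configparser, zipfile). The helper names and the in-memory "wheel" are hypothetical stand-ins; a real wheel built by bdist_wheel contains more files than this.

```python
import configparser
import io
import zipfile

SETUP_CFG = """\
[metadata]
license_file = NOTICE
"""


def configured_license_file(cfg_text):
    """Return the license_file value from setup.cfg text, or None."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser.get("metadata", "license_file", fallback=None)


def license_files_in_wheel(wheel_bytes):
    """List dist-info members whose file name starts with LICENSE."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as wheel:
        return [
            name
            for name in wheel.namelist()
            if ".dist-info/" in name
            and name.rsplit("/", 1)[-1].upper().startswith("LICENSE")
        ]


def build_fake_wheel():
    """Build a tiny in-memory zip that mimics a wheel layout (made up)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as wheel:
        wheel.writestr("example_pkg/__init__.py", "")
        wheel.writestr("example_pkg-0.1.0.dist-info/METADATA", "Name: example-pkg\n")
        wheel.writestr("example_pkg-0.1.0.dist-info/LICENSE.txt", "license text\n")
    return buffer.getvalue()


print(configured_license_file(SETUP_CFG))
print(license_files_in_wheel(build_fake_wheel()))
```

To inspect a real wheel, pass the bytes of the .whl file to license_files_in_wheel; an empty list would suggest the license text did not make it into the archive.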
There's a related issue that might be good to tackle here: ensuring that packages can actually be distributed according to the specified licence. For example, if I want to start a project that rebuilds all of PyPI for a new architecture or distro, there's nothing legally protecting me from a packager that puts packages on PyPI with a misleading Licence tag. I am worried that from a legal perspective, the tag is just an arbitrary piece of free-form text: a place where you can put the license if you want.
Perhaps we need a checkbox on PyPI that says the License-Expression field of packages I upload identifies the licences under which the package can be distributed, according to SPDX, and block packages that have License-Expression from being uploaded to PyPI if the uploader didn't sign that. Or something like that.
I am not a lawyer (and I probably should consult one before downloading something from PyPI).
@encukou my 2 cents: you may be overly worried here IMHO. Besides the metadata, nothing would prevent a bad actor from acting badly wrt. licensing in other ways, so I am not sure that an extra checkbox would bring something of value.
@pombredanne We actually have wording in the PyPI ToS that serves to protect both the PSF and mirror operators when it comes to duplication and distribution of the uploaded packages.
However, the Terms of Service don't cover actually executing any of the code in the uploaded artifacts, not even the build scripts - that's all handled via the open source licenses that are applied to projects.
So I think it would be well worth asking @VanL (as the PSF's general counsel and an experienced open source IP lawyer) whether there's a potential opportunity here along the lines of what @encukou mentions.
@ncoghlan Thank you for taking the time to chime in! This makes sense (FWIW I met @VanL at the last FOSDEM/CopyleftConf and he mentioned even using some of my tools: yummy) ... @VanL what's your take here? Specifically on this item: https://github.com/pypa/packaging-problems/issues/41#issuecomment-521661383
Perhaps we need a checkbox on PyPI that says the License-Expression field of packages I upload identifies the licences under which the package can be distributed, according to SPDX, and block packages that have License-Expression from being uploaded to PyPI if the uploader didn't sign that. Or something like that.
@pombredanne I've used scancode-toolkit as well - it just didn't click why your name was familiar until you mentioned it. Thank you for that project!
@ncoghlan thank you for using scancode-toolkit :bowing_man: ... if there is anything that is not detected right, it's a bug so send it my way and you will get prime expedited delivery treatment :) It would not exist without Python and pypa!
Yes, PSF has very nice wording, but just for PyPI. It seems that nowadays the answer to “How to best support an exotic architecture/platform?” is “Rebuild all wheels from the sources and host them on your own”. That's not fantasy, it is how packages for Raspberry Pi are built and distributed today. Imagine if you run such a project and someone uploads copyright-protected content but puts an open-source licence in the licence field. It would be nice if that project could say “but the uploader said here that this is OK to distribute under this licence! Of course I'll take the stuff down, but don't sue me.”
@encukou you wrote:
Imagine if you run such a project and someone uploads copyright-protected content but puts an open-source licence in the licence field. It would be nice if that project could say “but the uploader said here that this is OK to distribute under this licence! Of course I'll take the stuff down, but don't sue me.”
This would be what Linux distros do too, and the main ones such as Debian and Fedora typically review and vet the licenses of the packages they build/package/distribute. In the case of PyPI there is no such review: the authors or packagers upload something and PyPI is merely the distribution mechanism. For a project that actively rebuilds and repackages packages, I think it would be up to them to provide any such CYA statement. What do you think?
FTR -- there's discussion happening related to this at https://discuss.python.org/t/improving-license-clarity-with-better-package-metadata/2154/.
There's quite a few ways to add a license:
setup(license=)
setup(license=)
What method should be used, and are any of them mutually exclusive? For example, if you have a license classifier, do you still need the license key?