Include licenses of all Python packages

The inclusion of all licenses for the Python runtime components is very nice.

However, the licenses of the Python packages built into the binary are just as important, possibly even more so. Those licenses were often chosen on the assumption that the package would not be linked or embedded into a larger work, and there is less appreciation of those aspects of license choice in the Python world because the source is normally the executable/redistributable.

Where PKG-INFO or EGG-INFO exists, which should be most of the time, there is a free-text License field which looks like License: MIT. Sometimes it contains SPDX-compatible names; other times it is ambiguous, like License: BSD. The Trove classifiers are also ambiguous, e.g. License :: OSI Approved :: BSD License. And I have seen quite a lot of cases where a package has discrepancies between the Trove classifiers, the License: field and LICENSE.txt. looks like it is quite useful to sort out that mess.
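To make the ambiguity concrete, here is a minimal sketch of pulling both license claims out of a PKG-INFO/METADATA file. It relies on the fact that these files use email-style headers, so the standard library `email.parser` can read them. The `license_claims` and `is_ambiguous` helpers, the `AMBIGUOUS_NAMES` set and the sample metadata are all hypothetical names made up for illustration, not part of any existing tool.

```python
from email.parser import Parser

# Hypothetical, deliberately short list of License values that do not
# identify a single license text (e.g. "BSD" could be 2-, 3- or 4-clause).
AMBIGUOUS_NAMES = {"BSD", "GPL", "LGPL", "Apache"}


def license_claims(pkg_info_text):
    """Return (License field, license Trove classifiers) from metadata text."""
    msg = Parser().parsestr(pkg_info_text)
    license_field = msg.get("License")
    classifiers = [
        value
        for value in msg.get_all("Classifier", [])
        if value.startswith("License ::")
    ]
    return license_field, classifiers


def is_ambiguous(license_field):
    """True if the field is missing or names a license family, not a license."""
    return license_field is None or license_field.strip() in AMBIGUOUS_NAMES


# Illustrative metadata only; not copied from a real package.
SAMPLE = """\
Metadata-Version: 2.1
Name: example
Version: 1.0
License: BSD
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: Python :: 3
"""

if __name__ == "__main__":
    field, classifiers = license_claims(SAMPLE)
    print(field, classifiers, is_ambiguous(field))
```

Neither value in the sample says which BSD variant applies, which is why the metadata alone is not enough to reproduce the required license text.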

Wheels now usually contain a LICENSE file. Modern setuptools aggressively finds and includes one if it exists when building a wheel (and possibly also when building an sdist), but it can also be declared explicitly with the following in setup.cfg:

license_file = LICENSE.txt

IMO a good solution would be to try to get the license text file out of the wheel or sdist, and error/warn bitterly if one cannot be located, rather than playing games with the PKG-INFO/EGG-INFO text field. That field is still insufficient if the license is one that requires redistribution of the license text including custom notices, such as Apache-2.0.
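A rough sketch of the "find the license file or complain loudly" idea for wheels follows. A wheel is a zip archive, so the standard library `zipfile` is enough to list its contents. The filename patterns are an assumption about common naming conventions, and `find_license_files` and `MissingLicenseError` are hypothetical names for this example, not anything PyOxidizer provides.

```python
import fnmatch
import posixpath
import zipfile

# Assumed naming conventions for license-related files; adjust as needed.
LICENSE_PATTERNS = ("LICEN[CS]E*", "COPYING*", "NOTICE*")


class MissingLicenseError(Exception):
    """Raised when a wheel ships no recognisable license file."""


def find_license_files(wheel_path):
    """Return archive paths of license files under the wheel's .dist-info."""
    with zipfile.ZipFile(wheel_path) as wheel:
        names = wheel.namelist()

    found = [
        name
        for name in names
        if ".dist-info/" in name
        and any(
            fnmatch.fnmatchcase(posixpath.basename(name), pattern)
            for pattern in LICENSE_PATTERNS
        )
    ]
    if not found:
        # Fail hard instead of falling back to the free-text License field.
        raise MissingLicenseError(f"no license file found in {wheel_path}")
    return found
```

Calling `find_license_files("some_package-1.0-py3-none-any.whl")` would return the matching archive paths, whose contents can then be read with `ZipFile.read` and bundled alongside the binary. An sdist would need a similar walk over a tar archive.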

IMO it is easy to get Python projects to add a LICENSE file, even when the project is otherwise moribund. Getting a new package release with the LICENSE might be more difficult, but often in that case a git+https:// or requirement solves the problem. (I assume those can be used with PyOxidizer.)

The filtering on licenses should also apply to Python packages, and more errors/bitter warnings are probably needed there. It would also be wise to have a default filter in place for GPL, so that PyOxidizer errors unless users explicitly allow GPL packages in the project toml file.
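The default-deny behaviour described above could look something like the following sketch. It is only an illustration of the policy, not PyOxidizer's actual configuration or API: `gpl_packages`, `enforce_license_policy`, `LicensePolicyError` and the `allow_gpl` flag are all hypothetical, and the regular expression is a simple heuristic that only recognises the abbreviations GPL and AGPL while leaving LGPL alone.

```python
import re

# Heuristic: "GPL" or "AGPL" not preceded by another letter, so that
# "LGPL" does not match. Spelled-out license names are not handled.
_GPL_RE = re.compile(r"(?<![A-Z])A?GPL")


class LicensePolicyError(Exception):
    """Raised when a package's license is rejected by the policy."""


def gpl_packages(licenses):
    """Return sorted names of packages whose license string looks like GPL."""
    return sorted(
        name
        for name, license_name in licenses.items()
        if _GPL_RE.search(license_name.upper())
    )


def enforce_license_policy(licenses, allow_gpl=False):
    """Fail on GPL packages unless the user has explicitly opted in."""
    offenders = gpl_packages(licenses)
    if offenders and not allow_gpl:
        raise LicensePolicyError(
            "GPL-licensed packages require explicit opt-in: "
            + ", ".join(offenders)
        )
    return offenders
```

With a mapping such as `{"chardet": "LGPL", "Unidecode": "GPL"}`, the default call raises and names Unidecode, while `allow_gpl=True` returns the list of offenders so they can still be reported as warnings.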

jayvdb commented 5 years ago

A useful primer is

A fairly significant problem is likely to exist in most projects, as chardet is LGPL. It is included in the top 20 downloaded Python packages. is an open issue about which LGPL is in play there. and also discuss it. has many people explaining that requests (APL) should not be bundling chardet inside it, but it was resolved because requests dropped its bundling. The comments about using with pyinstaller are still relevant and unresolved. is another discussion about chardet, but again the assumption that packages are dynamically linked means LGPL's problems are not a priority.

Over at pyinstaller, they seem to be unaware of this issue also

Being able to have LGPL packages loaded dynamically is probably going to be necessary.

On my site-packages

> pip-licenses | grep GPL | grep -v LGPL
psycopg2 and paramiko are also very prominent packages that are LGPL.

Unidecode looks to be the GPL package most likely to affect many projects.

Thanks for all the detailed research! I'll need to look into matters further.

I generally agree that having license linting for Python packages (to match what we have for the C extensions and libraries) would be a good feature to have.

Maybe something like the pylicense tool can be helpful to gather the licenses of dependencies in the environment.

Related: #268

chardet is a big issue since it's used as a dependency of requests, among other things.

charset_normalizer could be an alternative if packages migrated to it.