raimon49 / pip-licenses

Dump the license list of packages installed with pip.
MIT License

Better handling of licenses of pep-0621 based libraries #134

Open TTMaZa opened 1 year ago

TTMaZa commented 1 year ago

Some libs (like scipy) switched to a pyproject.toml file according to PEP-0621.

This currently results in a strange situation: the License metadata no longer contains the short (possibly SPDX-conformant) license identifier but the full license text.

Example: https://pypi.org/project/scipy/1.9.3/ vs https://pypi.org/project/scipy/1.9.1/

I don't see an easy way to "fix" this with pip-licenses. But maybe someone here has an idea about how to handle this.

raimon49 commented 1 year ago

Thanks for the information.

I haven't followed the discussion on PEP-0621.

For example, flit, which scipy's pyproject.toml refers to, does not put the full license text in its metadata. What is the difference between the scipy and flit packaging processes?

TTMaZa commented 1 year ago

Hi,

I've just started to dig into that as well.

flit discussed this in some detail here: https://github.com/pypa/flit/issues/377#issuecomment-728198300

What I can see is that the flit wheel does NOT contain a "License: " entry in its METADATA file. It only contains the line "Classifier: License :: OSI Approved :: BSD License".

The scipy wheel, on the other hand, DOES include a 900-line "License: " entry in its METADATA.

It could simply be that the two projects use different tooling for building their wheels.

This somewhat matches PEP-0621, which lists different tools like flit, poetry, and setuptools and their handling of the license field:

https://peps.python.org/pep-0621/#license
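The difference between the two wheels can be checked mechanically. Below is a minimal sketch, not something pip-licenses does today; the helper name `summarize_license_metadata` and the two sample METADATA strings are made up for illustration. It relies only on the fact that METADATA uses an email-header-style format, so the standard library's `email.parser` can read it.

```python
from email.parser import Parser


def summarize_license_metadata(metadata_text: str) -> dict:
    """Report what license information a METADATA file carries.

    Hypothetical helper for illustration only.
    """
    msg = Parser().parsestr(metadata_text)
    license_field = msg.get("License")
    # Keep only the trove classifiers that talk about licensing.
    classifiers = [
        value
        for value in msg.get_all("Classifier", [])
        if value.startswith("License ::")
    ]
    return {
        "has_license_field": license_field is not None,
        # A folded (multi-line) header keeps its line breaks, so counting
        # lines distinguishes a short identifier from a full license text.
        "license_line_count": len(license_field.splitlines()) if license_field else 0,
        "license_classifiers": classifiers,
    }


# Made-up samples that mimic the two situations described above.
classifier_only = (
    "Metadata-Version: 2.1\n"
    "Name: demo-a\n"
    "Version: 1.0\n"
    "Classifier: License :: OSI Approved :: BSD License\n"
    "\n"
)
full_text = (
    "Metadata-Version: 2.1\n"
    "Name: demo-b\n"
    "Version: 1.0\n"
    "License: Copyright (c) Example Authors\n"
    "        All rights reserved.\n"
    "        Redistribution is permitted under the following conditions.\n"
    "\n"
)

print(summarize_license_metadata(classifier_only))
print(summarize_license_metadata(full_text))
```

For an installed package, the same text should be obtainable with `importlib.metadata.distribution("scipy").read_text("METADATA")`, assuming the package is present in the current environment.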

b-kamphorst commented 1 year ago

Hi, I wanted to report the same issue that for me surfaced in the licence of formulaic. The maintainer reported back that the "issue" is in PEP 639 which is fully about improving licence clarity through metadata. Both the flit and hatch back-end adhere to this new standard.

So, in summary, I think the issue at hand is compliance with PEP 639. If you agree, I propose to update the issue title.

raimon49 commented 1 year ago

Thanks everyone.

Hmm, to output the license list for packages built with tools that already support PEP 639 ahead of its acceptance, do we need to read the License-Expression field from the metadata?
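One possible answer is a fallback chain: prefer License-Expression, then a short License value, then the license classifiers. The following is only a sketch under stated assumptions, not the pip-licenses implementation; the function name `pick_license`, the 100-character cutoff for "short", and the `"UNKNOWN"` placeholder are all choices made for this example.

```python
from email.parser import Parser

# Assumption for this sketch: a License value longer than this, or spanning
# several lines, is treated as a full license text rather than an identifier.
MAX_SHORT_LICENSE_LENGTH = 100


def pick_license(metadata_text: str) -> str:
    """Pick the most compact license description from METADATA text."""
    msg = Parser().parsestr(metadata_text)

    # 1. The License-Expression field proposed by PEP 639, when present.
    expression = msg.get("License-Expression")
    if expression:
        return expression.strip()

    # 2. The legacy License field, but only if it looks like a short name.
    license_field = msg.get("License")
    if (
        license_field
        and "\n" not in license_field
        and len(license_field) <= MAX_SHORT_LICENSE_LENGTH
    ):
        return license_field.strip()

    # 3. Trove classifiers such as "License :: OSI Approved :: BSD License".
    names = [
        value.split(" :: ")[-1]
        for value in msg.get_all("Classifier", [])
        if value.startswith("License ::")
    ]
    if names:
        return "; ".join(names)

    return "UNKNOWN"
```

With this ordering, a wheel whose License field holds the full text but which still ships a license classifier would be reported by its classifier name instead of by hundreds of lines of text.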

TTMaZa commented 1 year ago

Making pip-licenses PEP-639 aware sounds pretty neat and future-proof to me.

On the other hand, I'm not sure how many of the libs that adopted PEP-0621 also adopted PEP 639, which is still in DRAFT state. I'm also not sure whether PEP-639 compliance happens automatically for libs that use flit or hatch.

But in any case, feel free to hijack this ticket to address PEP-639 compliance if you like. Creating other tickets for different corner cases is still an option, I guess.

Here are some quick research results:

The formulaic lib @b-kamphorst mentioned does NOT contain a License-Expression field according to PEP-639 (https://peps.python.org/pep-0639/#add-license-expression-field) in its wheel. All they did was get rid of the old License field (https://peps.python.org/pep-0639/#deprecate-license-field) and adopt the new License-File field (https://peps.python.org/pep-0639/#add-license-file-field).

I currently see NO WAY of detecting that formulaic is MIT-licensed from the metadata its wheel provides, at least no way other than parsing the file that is referenced by the License-File field. Maybe @b-kamphorst could discuss this with them in the corresponding ticket.
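Reading the referenced file is at least straightforward. Here is a rough sketch using `importlib.metadata`; the function name `read_license_files` is made up, and the two locations it probes (next to METADATA, and in a `licenses/` subdirectory of the .dist-info directory) are assumptions about where build back-ends place the files, so treat it as a starting point rather than a complete lookup.

```python
from importlib.metadata import Distribution


def read_license_files(dist_info_dir) -> dict:
    """Return {file name: text} for every License-File entry in METADATA.

    `dist_info_dir` is the path of an installed package's .dist-info directory.
    """
    dist = Distribution.at(dist_info_dir)
    texts = {}
    # get_all() returns None when the field is absent, hence the "or []".
    for name in dist.metadata.get_all("License-File") or []:
        # Assumed candidate locations relative to the .dist-info directory.
        for candidate in (name, f"licenses/{name}"):
            text = dist.read_text(candidate)  # None if the file is missing
            if text is not None:
                texts[name] = text
                break
    return texts
```

The returned text still has to be mapped to a license name somehow, which is the hard part.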

Scipy's wheel does NOT contain a License-Expression field either. It does not even contain a License-File field. It only contains the License field, the content of which changed with the switch to PEP-0621.

This is all quite a mess right now. I guess more things will break as people adopt PEP-0621 and randomly change the contents of their License field.

Side note about that last idea: maybe this is the moment to integrate GitHub's magic of reverse-detecting licenses by comparing license texts to well-known licenses: https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/licensing-a-repository#detecting-a-license They even provide a REST API for this: https://docs.github.com/en/rest/licenses
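To give a feel for what text-based detection could look like locally, here is a small sketch that compares a license text against reference texts with the standard library's `difflib`. This is a swapped-in technique for illustration, not GitHub's algorithm; the function names, the 0.9 threshold, and the reference dictionary are assumptions, and the two reference strings are only opening excerpts where a real tool would use the full texts.

```python
from difflib import SequenceMatcher

# Opening excerpts only, as placeholders for the full reference texts.
REFERENCE_TEXTS = {
    "MIT": (
        "Permission is hereby granted, free of charge, to any person "
        "obtaining a copy of this software"
    ),
    "BSD": (
        "Redistribution and use in source and binary forms, with or "
        "without modification, are permitted"
    ),
}


def normalize(text: str) -> str:
    """Lower-case the text and collapse all whitespace runs."""
    return " ".join(text.lower().split())


def guess_license(text: str, references=REFERENCE_TEXTS, threshold: float = 0.9):
    """Return the key of the most similar reference text, or None."""
    candidate = normalize(text)
    best_key, best_ratio = None, 0.0
    for key, reference in references.items():
        matcher = SequenceMatcher(None, candidate, normalize(reference), autojunk=False)
        ratio = matcher.ratio()
        if ratio > best_ratio:
            best_key, best_ratio = key, ratio
    # Below the threshold the match is considered too weak to report.
    return best_key if best_ratio >= threshold else None
```

Comparing full license texts character by character this way gets slow for long inputs, so this only shows the idea, not a production approach.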