google / deps.dev

Resources for the deps.dev API
https://deps.dev
Apache License 2.0

Support PEP 639 for PyPI licenses #94

Open liorj-orca opened 3 months ago

liorj-orca commented 3 months ago

Hi, we are noticing many cases where packages are missing their licenses in deps.dev even though the licenses are easy to find. One example among Python packages is 'pydantic':

  1. Both https://pypi.org/project/pydantic/ and https://github.com/pydantic/pydantic show that the package uses the MIT license.
  2. When calling https://api.deps.dev/v3/systems/pypi/packages/pydantic/versions/2.7.4, the response I get is: {"versionKey":{"system":"PYPI", "name":"pydantic", "version":"2.7.4"}, "publishedAt":"2024-06-12T14:11:54Z", "isDefault":true, "licenses":[], "advisoryKeys":[], "links":[{"label":"SOURCE_REPO", "url":"https://github.com/pydantic/pydantic"}, {"label":"HOMEPAGE", "url":"https://github.com/pydantic/pydantic"}, {"label":"DOCUMENTATION", "url":"https://docs.pydantic.dev"}], "slsaProvenances":[], "registries":["https://pypi.org/simple"], "relatedProjects":[{"projectKey":{"id":"github.com/pydantic/pydantic"}, "relationProvenance":"UNVERIFIED_METADATA", "relationType":"SOURCE_REPO"}]} As you can see, the licenses array is empty.
  3. On https://deps.dev/pypi/pydantic the package's own license is not shown, although there is some license information for its dependencies.
  4. According to https://docs.deps.dev/faq/#how-are-licenses-determined, licenses are determined from the package metadata. The pydantic package metadata also includes:

       License-Expression: MIT
       License-File: LICENSE
       Classifier: Development Status :: 5 - Production/Stable
       Classifier: Environment :: Console
       Classifier: Environment :: MacOS X
       Classifier: Framework :: Hypothesis
       Classifier: Framework :: Pydantic
       Classifier: Intended Audience :: Developers
       Classifier: Intended Audience :: Information Technology
       Classifier: Intended Audience :: System Administrators
       Classifier: License :: OSI Approved :: MIT License

Is there a problem with resolving or verifying this metadata?
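For anyone who wants to check other packages the same way, here is a minimal sketch. It only builds the v3 version URL in the form used above and inspects a response body for an empty `licenses` array. The helper names `version_url` and `has_license` are made up for illustration, and fetching the URL is left to whatever HTTP client you prefer.

```python
import json
from urllib.parse import quote

# Base URL taken from the request shown in this issue.
API_ROOT = "https://api.deps.dev/v3"


def version_url(system: str, name: str, version: str) -> str:
    """Build the version endpoint URL, percent-encoding each path segment."""
    parts = [quote(part, safe="") for part in (system, name, version)]
    return f"{API_ROOT}/systems/{parts[0]}/packages/{parts[1]}/versions/{parts[2]}"


def has_license(response_text: str) -> bool:
    """Return True if the response body lists at least one license."""
    payload = json.loads(response_text)
    return bool(payload.get("licenses"))


# Trimmed copy of the response quoted above (only the fields used here).
SAMPLE_RESPONSE = (
    '{"versionKey": {"system": "PYPI", "name": "pydantic", "version": "2.7.4"},'
    ' "licenses": []}'
)

if __name__ == "__main__":
    print(version_url("pypi", "pydantic", "2.7.4"))
    print("license reported:", has_license(SAMPLE_RESPONSE))
```

Running `has_license` over a list of packages is a quick way to see how many versions come back with no license.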

PFCM commented 3 months ago

Hello, thanks for letting us know.

Python licenses have definitely been a challenge for a long time, and it is something we're hoping to improve. The way it works now, the only time we are confident enough about the license to return something specific is when the License field of the wheel's METADATA (or the sdist's PKG-INFO) contains something we can map to an SPDX license name. We have tried using the classifiers in the past, but there are enough ambiguous license classifiers that it didn't appreciably improve coverage, although obviously it would have helped with this example.
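To make the current behaviour concrete, here is a rough sketch of a "License field only" lookup. This is not the deps.dev implementation. The `KNOWN_SPDX` table and the `license_from_metadata` helper are hypothetical, and a real mapping would be far larger. METADATA and PKG-INFO use an email-header-style format, so the standard library parser can read them.

```python
from email.parser import HeaderParser
from typing import Optional

# Hypothetical, deliberately tiny mapping from free-text License values
# to SPDX identifiers. It exists only to illustrate the idea.
KNOWN_SPDX = {
    "mit": "MIT",
    "mit license": "MIT",
    "apache-2.0": "Apache-2.0",
    "apache license 2.0": "Apache-2.0",
    "bsd-3-clause": "BSD-3-Clause",
}


def license_from_metadata(metadata_text: str) -> Optional[str]:
    """Return an SPDX id if the License field maps to one, else None."""
    headers = HeaderParser().parsestr(metadata_text)
    raw = headers.get("License")  # does not match License-Expression
    if raw is None:
        return None
    return KNOWN_SPDX.get(raw.strip().lower())


# Illustrative metadata: one package with a License field, one with only
# the newer fields and a classifier, as in the pydantic example.
WITH_LICENSE_FIELD = "Name: example\nVersion: 1.0\nLicense: MIT License\n"
WITHOUT_LICENSE_FIELD = (
    "Name: pydantic\n"
    "Version: 2.7.4\n"
    "License-Expression: MIT\n"
    "License-File: LICENSE\n"
    "Classifier: License :: OSI Approved :: MIT License\n"
)

if __name__ == "__main__":
    print(license_from_metadata(WITH_LICENSE_FIELD))
    print(license_from_metadata(WITHOUT_LICENSE_FIELD))
```

Under these assumptions the second sample yields no license, which matches the empty `licenses` array reported for pydantic 2.7.4.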

What is interesting about the example you've provided is that it includes License-Expression and License-File, which are not yet part of the Core Metadata specification. I believe they come from PEP 639, and it is interesting to see them in the wild already. I don't think adding support for them is a big job, and in this case it probably makes sense to do it preemptively, seeing as the new fields seem to be mutually exclusive with the old License field. Hopefully we can get to it shortly.
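As a sketch of what preemptive support could look like, the snippet below prefers License-Expression when it is present and falls back to the old License field otherwise. The `declared_license` helper is hypothetical and does not validate the SPDX expression. It assumes only that the value of License-Expression can be used as-is, as PEP 639 describes.

```python
from email.parser import HeaderParser
from typing import Optional, Tuple


def declared_license(metadata_text: str) -> Optional[Tuple[str, str]]:
    """Return (field name, value) for the license declaration, if any.

    License-Expression is checked first; the legacy License field is
    only consulted when the newer field is missing or blank.
    """
    headers = HeaderParser().parsestr(metadata_text)
    for field in ("License-Expression", "License"):
        value = headers.get(field)
        if value and value.strip():
            return (field, value.strip())
    return None


# Illustrative metadata using the fields quoted in this issue.
PYDANTIC_LIKE = (
    "Name: pydantic\n"
    "Version: 2.7.4\n"
    "License-Expression: MIT\n"
    "License-File: LICENSE\n"
    "Classifier: License :: OSI Approved :: MIT License\n"
)

if __name__ == "__main__":
    print(declared_license(PYDANTIC_LIKE))
```

Returning the field name alongside the value lets a caller treat the two sources differently, for example by trusting a License-Expression value directly while still mapping free-text License values to SPDX names.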

sgustafsson commented 2 months ago

I also ran into an empty licenses array for this component:

https://api.deps.dev/v3alpha/systems/pypi/packages/packaging/versions/24.1.0

{
  "versionKey": {
    "system": "PYPI",
    "name": "packaging",
    "version": "24.1.0"
  },
  "purl": "pkg:pypi/packaging@24.1.0",
  "publishedAt": "2024-06-09T23:19:21Z",
  "isDefault": true,
  "isDeprecated": false,
  "licenses": [],
  "licenseDetails": [],

FYI @PFCM @sarnesjo-google, there's an open issue in ClearlyDefined discussing their implementation and handling of classifiers: https://github.com/clearlydefined/crawler/issues/523