liorj-orca opened 3 months ago
Hello, thanks for letting us know.
Python licenses have been a challenge for a long time, and it is something we're hoping to improve. Currently, the only time we are confident enough about a license to return something specific is when the License field of the wheel's METADATA (or the sdist's PKG-INFO) contains something we can map to an SPDX license identifier. We have tried using the classifiers in the past, but there are enough ambiguous license classifiers that they didn't appreciably improve coverage, although they obviously would have helped with this example.
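To illustrate why classifiers are a weak signal, here is a minimal sketch of a classifier-to-SPDX lookup. The table and the function name `classifiers_to_spdx` are hypothetical and are not deps.dev's implementation. The point is that some classifiers name exactly one license, while others (the generic BSD, Apache, or GPL ones) cover several SPDX identifiers and cannot be mapped without guessing.

```python
from typing import Iterable, Optional

# Hypothetical, partial table for illustration only. None marks a classifier
# that covers several SPDX licenses, so it cannot be resolved safely.
CLASSIFIER_TO_SPDX: dict[str, Optional[str]] = {
    "License :: OSI Approved :: MIT License": "MIT",
    "License :: OSI Approved :: ISC License (ISCL)": "ISC",
    "License :: OSI Approved :: Mozilla Public License 2.0 (MPL 2.0)": "MPL-2.0",
    # Ambiguous: BSD-2-Clause, BSD-3-Clause, ...
    "License :: OSI Approved :: BSD License": None,
    # Ambiguous: no version is given.
    "License :: OSI Approved :: Apache Software License": None,
    # Ambiguous: GPL-3.0-only vs GPL-3.0-or-later.
    "License :: OSI Approved :: GNU General Public License v3 (GPLv3)": None,
}


def classifiers_to_spdx(classifiers: Iterable[str]) -> Optional[str]:
    """Return an SPDX id only if every license classifier maps unambiguously
    and they all agree; otherwise return None."""
    ids = set()
    for classifier in classifiers:
        if not classifier.startswith("License ::"):
            continue  # not a license classifier, ignore it
        spdx = CLASSIFIER_TO_SPDX.get(classifier)
        if spdx is None:
            return None  # unknown or ambiguous classifier: refuse to guess
        ids.add(spdx)
    if len(ids) == 1:
        return ids.pop()
    return None  # no license classifiers, or several conflicting ones
```

With the pydantic classifiers from this issue, the sketch would return `"MIT"`; a package declaring only `License :: OSI Approved :: BSD License` would get `None`, which is the coverage gap described above.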
What is interesting about the example you've provided is that it includes License-Expression and License-File, which are not yet part of the Core Metadata specification. I believe they come from PEP 639; it's interesting to see them in the wild already. I don't think adding support for them is a big job, and since the new fields seem to be mutually exclusive with the old License field, it probably makes sense to support them preemptively. Hopefully we can get to that shortly.
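A minimal sketch of what preemptive PEP 639 support could look like, assuming the input is the raw text of a wheel's METADATA (or sdist's PKG-INFO), which uses email-style headers. It prefers `License-Expression` and falls back to the legacy `License` field only when that field exactly matches a known SPDX identifier. The function name `resolve_license` and the tiny `KNOWN_SPDX_IDS` set are hypothetical; a real resolver would validate the whole expression against the SPDX license list rather than return it verbatim.

```python
from email.parser import HeaderParser
from typing import Optional

# Hypothetical, deliberately tiny set of SPDX ids accepted from the legacy
# License field; a real resolver would consult the full SPDX license list.
KNOWN_SPDX_IDS = {"MIT", "Apache-2.0", "BSD-3-Clause", "ISC"}


def resolve_license(metadata_text: str) -> Optional[str]:
    """Pick a license for a METADATA / PKG-INFO document, or None."""
    msg = HeaderParser().parsestr(metadata_text)

    # PEP 639: License-Expression holds an SPDX expression and, if present,
    # should be used instead of the legacy License field.
    expression = msg.get("License-Expression")
    if expression and expression.strip():
        return expression.strip()

    # Legacy path: accept License only if it is a recognisable SPDX id;
    # free text such as a full license body is not guessed at.
    legacy = msg.get("License")
    if legacy and legacy.strip() in KNOWN_SPDX_IDS:
        return legacy.strip()
    return None
```

Because the two fields appear to be mutually exclusive, checking `License-Expression` first should not hide any information carried by `License`.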
I also ran into an empty license array for this component:
https://api.deps.dev/v3alpha/systems/pypi/packages/packaging/versions/24.1.0
{
"versionKey": {
"system": "PYPI",
"name": "packaging",
"version": "24.1.0"
},
"purl": "pkg:pypi/packaging@24.1.0",
"publishedAt": "2024-06-09T23:19:21Z",
"isDefault": true,
"isDeprecated": false,
"licenses": [],
"licenseDetails": [],
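To find affected versions like this one programmatically, a small sketch can query the deps.dev v3alpha version endpoint used above and check whether `licenses` is empty. Only the URL pattern and the `licenses` field come from the response above; the helper names are hypothetical, network errors are not handled, and the result reflects whatever deps.dev returns at the time of the call.

```python
import json
import urllib.parse
import urllib.request

BASE = "https://api.deps.dev/v3alpha/systems"


def version_url(system: str, name: str, version: str) -> str:
    """Build the version endpoint URL, percent-encoding each path segment."""
    quote = urllib.parse.quote
    return (
        f"{BASE}/{quote(system, safe='')}/packages/{quote(name, safe='')}"
        f"/versions/{quote(version, safe='')}"
    )


def fetch_version(system: str, name: str, version: str) -> dict:
    """Fetch and decode the JSON for one package version (needs network)."""
    with urllib.request.urlopen(version_url(system, name, version)) as resp:
        return json.load(resp)


def missing_license(version_info: dict) -> bool:
    """True when the response reports no licenses for the version."""
    return not version_info.get("licenses")
```

For example, `missing_license(fetch_version("pypi", "packaging", "24.1.0"))` checks the version discussed here.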
FYI @PFCM @sarnesjo-google, there's an open issue in ClearlyDefined discussing their implementation and handling of classifiers: https://github.com/clearlydefined/crawler/issues/523
Hi, we are noticing a lot of cases where packages are missing their licenses even though the licenses can easily be found. One example among Python packages is 'pydantic':
{"versionKey":{"system":"PYPI", "name":"pydantic", "version":"2.7.4"}, "publishedAt":"2024-06-12T14:11:54Z", "isDefault":true, "licenses":[], "advisoryKeys":[], "links":[{"label":"SOURCE_REPO", "url":"https://github.com/pydantic/pydantic"}, {"label":"HOMEPAGE", "url":"https://github.com/pydantic/pydantic"}, {"label":"DOCUMENTATION", "url":"https://docs.pydantic.dev"}], "slsaProvenances":[], "registries":["https://pypi.org/simple"], "relatedProjects":[{"projectKey":{"id":"github.com/pydantic/pydantic"}, "relationProvenance":"UNVERIFIED_METADATA", "relationType":"SOURCE_REPO"}]
As you can see, the licenses array is empty, even though the package metadata contains:
License-Expression: MIT
License-File: LICENSE
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Environment :: MacOS X
Classifier: Framework :: Hypothesis
Classifier: Framework :: Pydantic
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: MIT License
Is there an existing issue for resolving or verifying this metadata?