python / importlib_metadata

Library to access metadata for Python packages
https://importlib-metadata.readthedocs.io
Apache License 2.0

Reliable way of retrieving license files #441

Closed: abravalheri closed this issue 1 year ago

abravalheri commented 1 year ago

When I need to retrieve the license files from an installed project, I usually go for something like:

licenses = [f for f in importlib_metadata.files("<package>") if f.stem == "<pre-defined file name for 'package'>"]

However, I recently found out that some operating systems remove the RECORD file after installation, which means that files() returns None...
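The one-liner above raises a TypeError as soon as files() returns None. Here is a minimal sketch of a guarded version. It uses the standard library's importlib.metadata (Distribution.files is the property behind files("<package>")). The helper name find_license_files and the default stems ("LICENSE", "COPYING") are hypothetical placeholders for the "pre-defined file name" in the original snippet.

```python
import importlib.metadata as md
from typing import Iterable, List, Optional


def find_license_files(
    dist: md.Distribution, stems: Iterable[str] = ("LICENSE", "COPYING")
) -> Optional[List[md.PackagePath]]:
    """Return installed files whose stem matches one of ``stems``.

    Returns None (rather than an empty list) when the distribution has no
    file listing at all, e.g. because RECORD was removed after
    installation, so callers can tell "no listing" apart from "no match".
    """
    files = dist.files  # None when no file listing is available
    if files is None:
        return None
    wanted = set(stems)
    return [f for f in files if f.stem in wanted]


# Usage (equivalent to filtering importlib_metadata.files("<package>")):
# find_license_files(md.distribution("<package>"))
```

Returning None instead of [] keeps the failure visible. An empty list would silently suggest that the package ships no license files.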

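With that in mind, I wonder if:

  • would it be possible for files() to list at least the contents of the .dist-info folder even if the RECORD file is deleted?

  • could we have a reliable retrieval mechanism for license files? Maybe it does not have to rely on files(); it could use the value of License-File in METADATA...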

jaraco commented 1 year ago

It's my understanding that the Python Packaging Authority is working on a project to specify license details in a structured form, such as an SPDX entry in the metadata spec. That would be my preferred means of soliciting and advertising the license of a package.

The goal of importlib_metadata is to reflect the best model of what metadata is available for an installed package and to do that in a way that's true to the specifications, lenient to practical concerns, and flexible enough not to constrain non-standard environments (to support arbitrary loaders and finders similar to how Python does for imported modules). I do aim to avoid importlib_metadata creating de facto standards.

  • would it be possible for files() to list at least the contents of the .dist-info folder even if the RECORD file is deleted?

Yes, maybe. The implementation is already getting a little out of hand. It currently returns the entries listed in RECORD, installed-files.txt, or SOURCES.txt, trying each in turn. I guess it could additionally fall back to enumerating the files in the dist-info directory, but then there would be another hidden variant of the behavior (sometimes users would get the full file list and other times, invisibly, only the metadata files). On the whole, that seems undesirable.
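To make the fallback chain concrete, here is a minimal sketch that reports which listing a distribution would draw from. It assumes the order RECORD, installed-files.txt, SOURCES.txt described in the comment and relies only on the public Distribution.read_text() method, which returns None for a missing file. file_listing_source is a hypothetical helper, not the library's actual implementation.

```python
import importlib.metadata as md
from typing import Optional

# Listings tried in order: RECORD (dist-info installs), then the
# egg-info style installed-files.txt and SOURCES.txt.
_LISTING_FILES = ("RECORD", "installed-files.txt", "SOURCES.txt")


def file_listing_source(dist: md.Distribution) -> Optional[str]:
    """Return the name of the first non-empty file listing found in the
    metadata directory, or None if there is none (in that case
    ``dist.files`` is None as well)."""
    for name in _LISTING_FILES:
        if dist.read_text(name):  # None (missing) and "" (empty) both skip
            return name
    return None
```

A distribution whose RECORD was deleted and that has no egg-info listings ends up in the None branch, which is exactly the case reported in this issue.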

  • could we have a reliable retrieval mechanism for license files? Maybe it does not have to rely on files(); it could use the value of License-File in METADATA...

This approach sounds closer to viable. Oh! If License-File is defined in METADATA and the packaging spec indicates that the License-File can be found in the metadata directory, it should be possible to just read it/them:

 ~ $ pip-run setuptools -- -q
>>> import importlib.metadata as md
>>> dist = md.distribution('setuptools')
>>> dist.metadata.get_all('License-File')
['LICENSE']
>>> dist.read_text('LICENSE')[:10]
'Copyright '
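The REPL session above can be wrapped into a small reusable helper. This is a minimal sketch. The name license_texts is hypothetical. As an extra assumption, it first looks in a licenses/ subdirectory of the metadata directory, the layout proposed in PEP 639, and then falls back to the metadata directory itself, as in the session above. Entries whose file cannot be found are skipped rather than raising an error.

```python
import importlib.metadata as md
from typing import Dict


def license_texts(dist: md.Distribution) -> Dict[str, str]:
    """Map each License-File entry in METADATA to the text of that file."""
    texts: Dict[str, str] = {}
    # get_all() returns None when the header is absent.
    for name in dist.metadata.get_all("License-File") or []:
        # Assumed PEP 639 layout first: <name>.dist-info/licenses/<file>.
        text = dist.read_text(f"licenses/{name}")
        if text is None:
            # Fall back to the metadata directory itself, as in the REPL.
            text = dist.read_text(name)
        if text is not None:  # read_text() returns None for missing files
            texts[name] = text
    return texts


# Usage:
# license_texts(md.distribution("setuptools"))
```

Unlike the files()-based approach, this approach never consults RECORD, so it keeps working when the RECORD file has been removed.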

Does that provide everything you need?

jaraco commented 1 year ago

@abravalheri Does that snippet not illustrate a way to satisfy the need of the reported issue?

abravalheri commented 1 year ago

Yes, thank you very much @jaraco. Sorry for the delay in replying.

This solution will probably work regardless of the build backend once the new PEP is approved.