python / importlib_metadata

Library to access metadata for Python packages
https://importlib-metadata.readthedocs.io
Apache License 2.0

Raise if multiple `dist-info` folders for same dist exist #457

Closed layday closed 1 year ago

layday commented 1 year ago

I've recently had to debug an issue where uninstallation of an older version of a package failed or was interrupted and the older version's dist-info folder was not removed. The folder was empty, but it was still picked up by importlib_metadata in preference to the newer version's folder. importlib_metadata was used to retrieve the entry points from the distribution, and because entry_points suppresses IO errors, I had no indication of what might've been going wrong. I understand that changing the behaviour of entry_points would be problematic; but if multiple dist-info folders are found for the same distribution (name), it would be helpful if importlib_metadata would raise an error alerting the user that their installation is essentially corrupted.

jaraco commented 1 year ago

Unfortunately, having multiple metadata folders is perfectly viable (in the same way that two executables of the same name can be on the same PATH or two copies of the same Python package can be found on sys.path).

Consider, for example, a situation where one has pip installed pytz<2022 and then separately used pip install -t temp pytz>2022 and then invoked python with PYTHONPATH=temp. In that environment, there are two metadata definitions for pytz, with the more recent one taking precedence as it's earlier on sys.path.

This situation can be readily simulated with:

 $ pip-run 'pytz<2022' -- -m pip-run 'pytz>2022' -- -c "import importlib.metadata as md; print(md.distribution('pytz').version)"
2023.3

Handling of entry_points is similar, except only unique distributions are considered.
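As a rough illustration of the precedence described above, the following sketch builds two throwaway directories, each holding a minimal dist-info folder for a hypothetical package named `demopkg`, and asks for the first match on an explicit search path. The package name, versions, and helper functions are made up for the example; only `importlib.metadata.distributions(name=..., path=...)` is a real call.

```python
import tempfile
from importlib import metadata
from pathlib import Path


def make_dist(root: Path, name: str, version: str) -> None:
    """Write a minimal dist-info folder for a hypothetical package."""
    info = root / f"{name}-{version}.dist-info"
    info.mkdir(parents=True)
    (info / "METADATA").write_text(
        f"Metadata-Version: 2.1\nName: {name}\nVersion: {version}\n"
    )


def first_version(name: str, search_path: list) -> str:
    """Return the version of the first matching distribution on search_path."""
    found = metadata.distributions(name=name, path=search_path)
    return next(iter(found)).version


with tempfile.TemporaryDirectory() as tmp:
    # Two separate path entries, standing in for site-packages and "temp".
    older, newer = Path(tmp, "site"), Path(tmp, "temp")
    make_dist(older, "demopkg", "1.0")
    make_dist(newer, "demopkg", "2.0")

    # Whichever directory comes first on the search path wins.
    newer_first = first_version("demopkg", [str(newer), str(older)])
    older_first = first_version("demopkg", [str(older), str(newer)])
    print(newer_first, older_first)
```

Passing an explicit `path` keeps the sketch independent of the interpreter's real sys.path; `metadata.distribution(name)` applies the same first-match rule to sys.path itself.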

The situation you're describing, however, seems to be about two manifestations of metadata for the same package on the same sys.path. In this case, it may be considered invalid (corrupted). It may be worthwhile to consider options to fail fast in such a situation. It may prove complicated or expensive to track duplicate distributions in the same sys.path, but in theory it should be possible.
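A minimal sketch of what such a check might look like for a single directory, assuming folders follow the usual `{name}-{version}.dist-info` naming. This is not an importlib_metadata API: `find_duplicates` and `normalize` are hypothetical helpers, and the sketch ignores egg-info metadata. It relies only on folder names, so it also catches an empty leftover folder like the one described in the original report.

```python
import re
import tempfile
from collections import defaultdict
from pathlib import Path


def normalize(name: str) -> str:
    """Normalize a project name (PEP 503 style) so spellings compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_duplicates(directory) -> dict:
    """Map each normalized name to its dist-info folders, if more than one."""
    by_name = defaultdict(list)
    for info in sorted(Path(directory).glob("*.dist-info")):
        if not info.is_dir():
            continue
        # Folder names look like "{name}-{version}.dist-info".
        raw_name = info.name[: -len(".dist-info")].split("-", 1)[0]
        by_name[normalize(raw_name)].append(info.name)
    return {name: dirs for name, dirs in by_name.items() if len(dirs) > 1}


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "demopkg-1.0.dist-info").mkdir()  # empty leftover folder
    Path(tmp, "demopkg-2.0.dist-info").mkdir()
    Path(tmp, "other-1.0.dist-info").mkdir()
    duplicates = find_duplicates(tmp)
    print(duplicates)
```

Running a check like this once per sys.path entry would flag the corrupted case without treating the same name on different path entries as an error, which remains valid.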

First, I'd like to consider: is this situation something that should be diagnosed by importlib metadata on every invocation, or should this function be provided as a separately-invoked operation that users are directed to use when experiencing difficulty (e.g. python -m importlib.metadata.diagnose), letting the default behavior assume there's no corruption present?

jaraco commented 1 year ago

I filed #461 to track creating a diagnose command. Feel free to follow up or re-open this issue if you'd like to advocate for doing more than helping users diagnose broken environments.

layday commented 1 year ago

Thank you. Please see https://github.com/pypa/build/issues/626 as well, where another user struggled with a somewhat similar issue. I think a "diagnose" command would've helped immensely in that instance.