python / importlib_metadata

Library to access metadata for Python packages
https://importlib-metadata.readthedocs.io
Apache License 2.0

Presence of 'egg-info' metadata directory + use of virtualenv results in duplicate entry points #410

Closed stephenfin closed 1 year ago

stephenfin commented 1 year ago

(I'll preface this by saying that I think this might be similar to #101. It also feels like something that must have been reported elsewhere but my search-fu isn't sufficient to find those previous report(s))

If a {package}.egg-info directory exists and you're using a virtualenv, the importlib_metadata.distributions() function returns a duplicate distribution. Since this function is used by importlib_metadata.entry_points(), you'll also see a duplicate set of entry points. For example, consider the following, using the simplest package I could find:

❯ cd /tmp
❯ git clone https://github.com/stephenfin/clouds2env
❯ cd clouds2env
❯ virtualenv .venv
❯ source .venv/bin/activate
❯ pip install . importlib-metadata

import importlib_metadata

for dist in importlib_metadata.distributions():
    if not dist.entry_points:
        continue
    print(dist._path)
    for ep in dist.entry_points:
        print(f'\t{ep}')
    print('***')

This returns something like the following:

clouds2env.egg-info
        EntryPoint(name='clouds2env', value='clouds2env:main', group='console_scripts')
***
/tmp/clouds2env/.venv/lib/python3.10/site-packages/clouds2env-1.1.1.dev1+g6d0ecef.dist-info
        EntryPoint(name='clouds2env', value='clouds2env:main', group='console_scripts')
***
/tmp/clouds2env/.venv/lib/python3.10/site-packages/setuptools-62.3.2.dist-info
        ... {truncated} ...
***
/tmp/clouds2env/.venv/lib/python3.10/site-packages/wheel-0.37.1.dist-info
        ... {truncated} ...
***
/tmp/clouds2env/.venv/lib/python3.10/site-packages/pip-22.1.1.dist-info
        ... {truncated} ...

We're not installing anything in editable mode, so I think the distributions are actually the same and the info from egg-info should be ignored in favour of the info from the dist-info directory. I also see attempts to deduplicate the returned entry points in importlib_metadata.entry_points(), but I haven't dived into why this isn't working (or whether preventing this issue is even the goal of that code).

Additional info

❯ virtualenv --version
virtualenv 20.13.4 from /usr/lib/python3.10/site-packages/virtualenv/__init__.py
❯ pip freeze
clouds2env @ file:///tmp/clouds2env
importlib-metadata==5.0.0
PyYAML==6.0
zipp==3.8.1
❯ cat /etc/system-release
Fedora release 36 (Thirty Six)

stephenfin commented 1 year ago

Actually, no. Looking at this more, I see that importlib_metadata.entry_points() is actually working as expected when run from the shell.

>>> import importlib_metadata
>>> importlib_metadata.entry_points().select(name='clouds2env')
(EntryPoint(name='clouds2env', value='clouds2env:main', group='console_scripts'),)

The failures I'm seeing elsewhere are when running tests via tox. tox must be doing something weird here. It does mean my reproducer above isn't actually reproducing anything. I'll close this while I try to figure out what's going on with tox.

stephenfin commented 1 year ago

Okay, as expected, I was hitting an existing problem, though it was reported as a PR (#377) rather than as an issue. That PR was abandoned in favor of another fix, #379, which was released in v4.12.0 and backported to 4.11 as part of v4.11.4. I was seeing it on some environments (Ubuntu 20.04) and not others (Fedora 36) because setuptools (version 65.3.0) is vendoring importlib_metadata on the former and the vendored version doesn't contain this fix.

stephenfin commented 1 year ago

Some more info. #379 is relevant because it fixes the value returned by PathDistribution._name_from_stem, which is used to generate the _normalized_name attribute. Without the fix in #379, the presence of a {package}.egg-info directory (note: no version number) would result in this function returning a normalized package name of {package}.egg for that distribution rather than {package} as intended. As I noted in comment 0, the call to importlib_metadata.distributions() returns two distributions, and we attempt to deduplicate the list of packages when generating entry points in importlib_metadata.entry_points(). That de-duplication relies on the value of _normalized_name, however, so because that value was wrong, the whole de-duplication attempt failed and we ended up with a double set of entry points.
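To make the parsing problem concrete, here is a minimal sketch of the kind of bug described above. The two function names are hypothetical and this is a simplified illustration, not the library's actual code; it only assumes that the name is taken from the part of the directory name before the first '-'.

```python
import os


def name_from_stem_buggy(stem):
    """Illustrative pre-fix behaviour: partition the whole stem on '-'."""
    _, ext = os.path.splitext(stem)
    if ext not in ('.dist-info', '.egg-info'):
        return None
    # The extension itself contains a '-', so an unversioned
    # 'pkg.egg-info' is split into 'pkg.egg' and 'info'.
    name, _, _ = stem.partition('-')
    return name


def name_from_stem_fixed(stem):
    """Illustrative post-fix behaviour: drop the extension first."""
    filename, ext = os.path.splitext(stem)
    if ext not in ('.dist-info', '.egg-info'):
        return None
    # Only the part before the extension is partitioned on '-'.
    name, _, _ = filename.partition('-')
    return name


for stem in ('clouds2env.egg-info',
             'clouds2env-1.1.1.dev1+g6d0ecef.dist-info'):
    print(stem, name_from_stem_buggy(stem), name_from_stem_fixed(stem))
```

With a versioned directory name both variants agree, which would explain why the problem only shows up for the unversioned {package}.egg-info case.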

Regarding how to fix this: it's tricky. I noted above that setuptools is vendoring importlib_metadata. However, on closer inspection I noticed that setuptools on my Fedora host is even older (v62.3.2) and therefore still missing this fix. What seems to be different, though, is the type of object importlib_metadata.distributions() returns in the two environments when using tox. On the Ubuntu 20.04 host, I see <class 'setuptools._vendor.importlib_metadata.PathDistribution'> objects being returned. On the Fedora 35 host, I see <class 'importlib_metadata.PathDistribution'> objects. I have no idea why, though I suspect it's something to do with "resolvers"? In any case, in stevedore we're opting to always de-duplicate the results from entry_points() rather than making this de-duplication dependent on the version of setuptools in use and on whether the objects returned from distributions() come from an older, vendored version of importlib_metadata.
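As a rough illustration of the "always de-duplicate" approach, the sketch below drops repeated entry points while preserving order. The helper name dedupe_entry_points and the choice of (group, name, value) as the identity key are assumptions made for this example; this is not stevedore's actual code.

```python
from importlib.metadata import EntryPoint


def dedupe_entry_points(entry_points):
    """Return entry points with exact repeats removed, keeping first-seen order.

    Hypothetical helper: two entry points are treated as duplicates when
    their group, name and value all match.
    """
    seen = set()
    unique = []
    for ep in entry_points:
        key = (ep.group, ep.name, ep.value)
        if key in seen:
            continue
        seen.add(key)
        unique.append(ep)
    return unique


# The same entry point reported twice, as in the egg-info + dist-info case.
eps = [
    EntryPoint(name='clouds2env', value='clouds2env:main', group='console_scripts'),
    EntryPoint(name='clouds2env', value='clouds2env:main', group='console_scripts'),
]
print(dedupe_entry_points(eps))
```

Keying on the entry point's own fields means the result does not depend on which implementation of distributions() produced the objects.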

jaraco commented 1 year ago

I believe the best fix is to use importlib_metadata>=4.11.4 on Python < 3.12. I think I'd intended to backport that fix in CPython versions allowing it, but it seems I didn't get to it and even Python 3.11.0 has the flawed implementation. Setuptools needs to refresh its vendored version.
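For reference, one way a consuming project might apply this is to depend on the backport with an environment marker such as importlib_metadata>=4.11.4; python_version < "3.12", and then select the module at import time. The sketch below is only an illustration of that pattern; the alias name metadata and the fallback branch are assumptions of this example.

```python
import sys

if sys.version_info >= (3, 12):
    # Per the comment above, the stdlib copy should be fine from 3.12 on.
    import importlib.metadata as metadata
else:
    try:
        # Prefer the backport, which is expected to carry the fix
        # when installed at >=4.11.4.
        import importlib_metadata as metadata
    except ImportError:
        # Assumption for this sketch: fall back to the stdlib module if the
        # backport is missing. On older Pythons this copy may still have
        # the flawed implementation.
        import importlib.metadata as metadata

print(metadata.__name__)
```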

shareefj commented 1 year ago

@jaraco @stephenfin

I've just found my way here after a Google search for what seems like the same original issue. In my case I'm installing an editable Python package which seems to result in duplicate entry points.

However, I'm still seeing this with the 6.0.0 release of importlib_metadata. I'm on Python 3.8.16. Should this still be an issue?

FFY00 commented 1 year ago

Can you show us the content of the metadata directory for your package?

shareefj commented 1 year ago

@FFY00 You mean the egg-info directory?

ll src/asic_common.egg-info/
total 32K
drwxrws--- 2 shareefj Genesee 4.0K Mar 16 13:05 .
drwxrws--- 4 shareefj Genesee 4.0K Mar 16 13:05 ..
-rw-rw---- 1 shareefj Genesee    1 Mar 16 13:05 dependency_links.txt
-rw-rw---- 1 shareefj Genesee 1.1K Mar 16 13:05 entry_points.txt
-rw-rw---- 1 shareefj Genesee  125 Mar 16 13:05 PKG-INFO
-rw-rw---- 1 shareefj Genesee  466 Mar 16 13:05 requires.txt
-rw-rw---- 1 shareefj Genesee 2.9K Mar 16 13:05 SOURCES.txt
-rw-rw---- 1 shareefj Genesee   12 Mar 16 13:05 top_level.txt

jaraco commented 1 year ago

The fix is definitely in place. Please do some more investigation on your environment or put together an (ideally minimal) reproducer and open a new issue.

Since this issue implicates "virtualenv", double-check the Setuptools version that's present in the virtualenv (or upgrade it).
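A quick way to do that check from inside the activated virtualenv might look like the sketch below. The helper names are made up for this example; it only uses importlib.metadata.version and importlib.metadata.distributions from the standard library, and reports which modules the returned distribution objects come from (the vendored copy shows up as setuptools._vendor.importlib_metadata, as seen earlier in this thread).

```python
from importlib.metadata import PackageNotFoundError, distributions, version


def installed_version(package):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def distribution_modules():
    """Return the sorted module names that define the discovered distribution classes."""
    return sorted({type(dist).__module__ for dist in distributions()})


print('setuptools:', installed_version('setuptools'))
print('importlib_metadata:', installed_version('importlib_metadata'))
print('distribution classes come from:', distribution_modules())
```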

shareefj commented 1 year ago

@jaraco OK, having dug deeper, my issue is caused by a package I depend on that uses the standard library's importlib.metadata rather than importlib_metadata. So the solution here is to get them to use a release of importlib_metadata that includes this fix?

They seem to be currently specialising syntax based on the version of Python. I'm assuming that if they use this package then that issue also goes away? https://github.com/SystemRDL/PeakRDL/blob/main/src/peakrdl/plugins/entry_points.py