pypa / setuptools

Official project repository for the Setuptools build system
https://pypi.org/project/setuptools/
MIT License

[ENH] Importlib Metadata shows two distributions with same name for editable installs #4170

Open Jacob-Stevens-Haas opened 11 months ago

Jacob-Stevens-Haas commented 11 months ago

setuptools version

setuptools 69.0.3

Python version

3.10.12

OS

Ubuntu

Additional environment information

Applies to both src/ and flat layout

Description

I was trying to identify the editable packages installed in my current environment by looking at direct_url.json for a package given by importlib.metadata.distribution(name). It showed that the file didn't exist. Upon further investigation, importlib.metadata.distributions() had two entries for my package: one PathDistribution whose files contain a dist-info directory in site-packages, and another PathDistribution whose files contain an egg-info directory, built by setuptools in the local directory. distribution(name) only finds the local (egg-info) version. Interestingly, importlib.metadata.packages_distributions() shows that the distribution package foo has two import packages associated with it, both with the same name.
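A minimal sketch of the lookup described above, assuming the installer wrote direct_url.json in the Direct URL format of PEP 610, where an editable install is recorded as a dir_info entry with "editable": true. The helper names is_editable and editable_names are hypothetical. Iterating over every entry of distributions() instead of calling distribution(name) sidesteps the shadowing copy, because the egg-info directory built in the source tree has no direct_url.json.

```python
import json
from importlib.metadata import distributions


def is_editable(dist):
    """Return True if the installer recorded an editable install (PEP 610)."""
    text = dist.read_text("direct_url.json")  # None when the file is absent
    if text is None:
        return False  # e.g. a source-tree .egg-info that no installer wrote
    data = json.loads(text)
    # Editable installs are recorded as {"dir_info": {"editable": true}, ...}.
    return bool(data.get("dir_info", {}).get("editable", False))


def editable_names():
    """Names of all editable distributions visible on sys.path."""
    # Scan *all* distributions rather than calling distribution(name), so an
    # egg-info found earlier on sys.path cannot hide the installed dist-info.
    names = {d.metadata.get("Name") for d in distributions() if is_editable(d)}
    return sorted(names - {None})


if __name__ == "__main__":
    print(editable_names())
```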

Expected behavior

I would've expected just one distribution package for an editable install, in this case with a single import package associated with it. At a lower level, I'm not sure it ever makes sense to have two distributions of the same name installed, so perhaps setuptools should raise an error internally when distributions() finds two with the same name, or two import packages with the same name in the same distribution.

How to Reproduce

I've got an example distribution package, foo, with one import package, also named foo:

  1. clone and cd into repo at https://github.com/Jacob-Stevens-Haas/setuptools_test
  2. create and activate a virtual environment (I'm using venv)
  3. pip install -e .
  4. python show_dists.py

This will print the results of distributions(), showing two named "foo", the files in the two matching distributions, and then the packages_distributions() results.

  5. (optional) Playing around with the working directory or switching to a src layout (see the src branch) gives the same result. pip freeze shows just a single distribution package.

Output

'foo'
'foo'
'pip'
'setuptools'
[PackagePath('pyproject.toml'),
 PackagePath('foo/__init__.py'),
 PackagePath('foo.egg-info/PKG-INFO'),
 PackagePath('foo.egg-info/SOURCES.txt'),
 PackagePath('foo.egg-info/dependency_links.txt'),
 PackagePath('foo.egg-info/top_level.txt')]
[PackagePath('__editable__.foo-0.1.0.pth'),
 PackagePath('__editable___foo_0_1_0_finder.py'),
 PackagePath('__pycache__/__editable___foo_0_1_0_finder.cpython-310.pyc'),
 PackagePath('foo-0.1.0.dist-info/INSTALLER'),
 PackagePath('foo-0.1.0.dist-info/METADATA'),
 PackagePath('foo-0.1.0.dist-info/RECORD'),
 PackagePath('foo-0.1.0.dist-info/REQUESTED'),
 PackagePath('foo-0.1.0.dist-info/WHEEL'),
 PackagePath('foo-0.1.0.dist-info/direct_url.json'),
 PackagePath('foo-0.1.0.dist-info/top_level.txt')]
{'_distutils_hack': ['setuptools'],
 'debian': ['setuptools'],
 'foo': ['foo', 'foo'],
 'pip': ['pip'],
 'pkg_resources': ['setuptools'],
 'setuptools': ['setuptools']}
abravalheri commented 11 months ago

Hi @Jacob-Stevens-Haas, thank you very much for opening this discussion.

For the time being, this is a limitation of setuptools and importlib-metadata interoperating together...

The current design of setuptools requires the .egg-info folders as part of the building process and intentionally places them at the root of the repository for flat-layout projects.

We do have a milestone for removing the egg-info, https://github.com/pypa/setuptools/milestone/3, but I don't think that is a goal that can be achieved in the short term.

If this turns out to be problematic for you, please consider the following workarounds until the long-term implementation is ready:

If any member of the community is interested in contributing towards the goal of removing the reliance on .egg-info directories, contributions are always welcomed.

Jacob-Stevens-Haas commented 11 months ago

Thanks for the quick reply! And yeah, that would be a fine workaround. Given that removing egg-info might take a while... would implementing the workaround (ignoring egg-info distributions if there's a same-name dist-info distribution) inside importlib.metadata be reasonable?
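The workaround proposed above can be sketched on the user side, without changing importlib.metadata itself. This is a hypothetical helper, not an existing API. It assumes the dist-info copy can be recognized because it stores its core metadata in a METADATA file, whereas an .egg-info directory uses PKG-INFO (as in the file listings above).

```python
import re
from importlib.metadata import PackageNotFoundError, distributions


def _normalize(name):
    # PEP 503-style name normalization.
    return re.sub(r"[-_.]+", "-", name).lower()


def distribution_preferring_dist_info(name, dists=None):
    """Like distribution(name), but skip a same-name .egg-info when a
    .dist-info copy also exists (hypothetical helper)."""
    if dists is None:
        dists = distributions()
    target = _normalize(name)
    matches = [d for d in dists if _normalize(d.metadata.get("Name") or "") == target]
    if not matches:
        raise PackageNotFoundError(name)
    for dist in matches:
        # Heuristic: dist-info keeps core metadata in METADATA, egg-info in PKG-INFO.
        if dist.read_text("METADATA") is not None:
            return dist
    return matches[0]  # only egg-info copies exist; fall back to sys.path order
```

Because it only reorders what distributions() already returns, it leaves setuptools' own reliance on the egg-info untouched.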

  • Consider using a src-layout (if I am not mistaken, with a src-layout the .egg-info folder will be placed inside the src folder and therefore not picked up by importlib-metadata).

I have a branch in the above repo with an src layout, and starting everything from scratch with that layout gives similar results:

'foo'
'pip'
'setuptools'
'foo'
[PackagePath('__editable__.foo-0.1.0.pth'),
 PackagePath('foo-0.1.0.dist-info/INSTALLER'),
 PackagePath('foo-0.1.0.dist-info/METADATA'),
 PackagePath('foo-0.1.0.dist-info/RECORD'),
 PackagePath('foo-0.1.0.dist-info/REQUESTED'),
 PackagePath('foo-0.1.0.dist-info/WHEEL'),
 PackagePath('foo-0.1.0.dist-info/direct_url.json'),
 PackagePath('foo-0.1.0.dist-info/top_level.txt')]
[PackagePath('pyproject.toml'),
 PackagePath('src/foo/__init__.py'),
 PackagePath('src/foo.egg-info/PKG-INFO'),
 PackagePath('src/foo.egg-info/SOURCES.txt'),
 PackagePath('src/foo.egg-info/dependency_links.txt'),
 PackagePath('src/foo.egg-info/top_level.txt')]
{'_distutils_hack': ['setuptools'],
 'debian': ['setuptools'],
 'foo': ['foo', 'foo'],
 'pip': ['pip'],
 'pkg_resources': ['setuptools'],
 'setuptools': ['setuptools']}

Is this because, during Python startup, the site module reads __editable__.foo-0.1.0.pth and adds the src directory to sys.path? Interestingly, this changes the order of the distributions, as the egg-info is no longer found in "". It thus means that importlib.metadata.distribution("foo") finds the correct package... which is a win for me, but IDK if this behavior is reliable.
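To see why path order matters, the hypothetical sketch below builds two throwaway directories: a project root holding foo.egg-info (standing in for the "" entry of a flat layout) and a site-packages stand-in holding foo-0.1.0.dist-info. It uses Distribution.discover(name=..., path=...), which searches the given path list instead of sys.path. The file contents are minimal placeholders, not what setuptools or pip actually write.

```python
import os
import tempfile
from importlib.metadata import Distribution


def write_metadata(parent, dirname, filename):
    """Create a minimal metadata directory for a distribution named 'foo'."""
    meta_dir = os.path.join(parent, dirname)
    os.makedirs(meta_dir)
    with open(os.path.join(meta_dir, filename), "w") as f:
        f.write("Metadata-Version: 2.1\nName: foo\nVersion: 0.1.0\n")


def first_foo_kind(search_path):
    """Return (format of the first 'foo' found, number of 'foo' copies found)."""
    found = list(Distribution.discover(name="foo", path=search_path))
    # dist-info keeps core metadata in METADATA; egg-info uses PKG-INFO instead.
    kind = "dist-info" if found[0].read_text("METADATA") is not None else "egg-info"
    return kind, len(found)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as project, tempfile.TemporaryDirectory() as site_dir:
        write_metadata(project, "foo.egg-info", "PKG-INFO")            # flat-layout project root
        write_metadata(site_dir, "foo-0.1.0.dist-info", "METADATA")    # installed copy
        print(first_foo_kind([project, site_dir]))  # ('egg-info', 2): project root first, like ""
        print(first_foo_kind([site_dir, project]))  # ('dist-info', 2): site-packages first
```

Both copies are always found; only the first one returned changes with the order of the path entries.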

If any member of the community is interested in contributing towards the goal of removing the reliance on .egg-info directories, contributions are always welcomed.

I'd love to, but realistically I'll probably just learn more about setuptools and why removing reliance on egg-info is so daunting... the classic "know enough to be dangerous... but not to be useful" stage.

abravalheri commented 11 months ago

Thanks for the quick reply! And yeah, that would be a fine workaround. Given that removing egg-info might take a while... would implementing the workaround (ignoring egg-info distributions if there's a same-name dist-info distribution) inside importlib.metadata be reasonable?

That is something to be discussed in the importlib.metadata repo, but that would break setuptools 😅 (because the existing design relies on that).

I have a branch in the above repo with an src layout, and starting everything from scratch with that layout gives similar results. Is this because during python startup, importing site reads __editable__.foo-0.1.0.pth and adds the src directory to sys.path?

I see... Yeap, that is correct. The src layout will add the src directory as a new entry to sys.path, and then importlib.metadata will pick it up. That makes sense; sorry, I didn't think about that.

Interestingly, this changes the order of the packages, as the egg-info is no longer found in "". It thus means that importlib.metadata.distribution("foo") finds the correct package... which is a win for me, but IDK if this behavior is reliable.

That is probably 90% reliable 😅. The "" entry (which corresponds to the current working directory) is automatically added as the first entry of sys.path, depending on how you run a Python script, module, or REPL. This is the reference (https://docs.python.org/3/using/cmdline.html):

-c <command> If this option is given, the first element of sys.argv will be "-c" and the current directory will be added to the start of sys.path (allowing modules in that directory to be imported as top level modules).

-m <module-name> ... As with the -c option, the current directory will be added to the start of sys.path.

<script> If the script name refers directly to a Python file, the directory containing that file is added to the start of sys.path, and the file is executed as the main module. If the script name refers to a directory or zipfile, the script name is added to the start of sys.path and the __main__.py file in that location is executed as the main module.

-I option can be used to run the script in isolated mode where sys.path contains neither the current directory nor the user’s site-packages directory. All PYTHON* environment variables are ignored, too.
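A quick way to check the command-line behavior quoted above is to ask a child interpreter for its sys.path. This is a sketch; it assumes the PYTHONSAFEPATH environment variable (Python 3.11+) is not set, since that variable also stops the current directory from being prepended.

```python
import json
import subprocess
import sys


def child_sys_path(*flags):
    """Run a fresh interpreter with the given flags and return its sys.path."""
    code = "import json, sys; print(json.dumps(sys.path))"
    result = subprocess.run(
        [sys.executable, *flags, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    print(repr(child_sys_path()[0]))   # '' : with -c the current directory comes first
    print("" in child_sys_path("-I"))  # False: isolated mode leaves it out
```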

And this is the reference for the .pth file mechanism we use for adding entries to sys.path in the editable install for the src-layout: https://docs.python.org/3/library/site.html.
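The .pth mechanism can be exercised directly with site.addsitedir(), which adds a directory to sys.path and processes the .pth files inside it; each plain line naming an existing directory is appended to sys.path. The sketch below uses a scratch directory in place of the real site-packages. The one-line file content is an assumption about what an editable src-layout .pth contains, not a copy of the file setuptools writes, and pth_adds_to_sys_path is a hypothetical helper.

```python
import os
import site
import sys
import tempfile


def pth_adds_to_sys_path(target_dir, pth_name="__editable__.foo-0.1.0.pth"):
    """Write a .pth file pointing at target_dir into a scratch site dir, process
    it with site.addsitedir(), and report whether target_dir landed on sys.path."""
    saved = list(sys.path)
    try:
        with tempfile.TemporaryDirectory() as fake_site:
            with open(os.path.join(fake_site, pth_name), "w") as f:
                f.write(os.path.abspath(target_dir) + "\n")
            site.addsitedir(fake_site)  # also processes the .pth files it contains
            return os.path.abspath(target_dir) in sys.path
    finally:
        sys.path[:] = saved  # undo the demo's changes to this interpreter's path
```

Lines naming directories that do not exist are skipped, which is why the check depends on target_dir actually existing.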

Jacob-Stevens-Haas commented 11 months ago

Ah, thanks for all that! After a cursory reading, does setuptools create an editable install as a PEP 660 editable wheel? Or does the presence of an egg-info directory locally imply otherwise?

Also, and this might not be the ideal solution, would it be possible to add direct_url.json to the egg-info directory?

abravalheri commented 11 months ago

Ah, thanks for all that! After a cursory reading, does setuptools create an editable install as a PEP660 editable wheel?

Ideally yes. But that will depend on how pip calls setuptools. pip has its own heuristics to decide when and how to call setuptools, and in some edge cases it will rely on deprecated setuptools code paths.

Or does the presence of an egg-info directory locally imply otherwise?

The presence of the .egg-info directory is NOT a direct/unequivocal indicator of the installation method that was used. It may be found even when the process described in PEP 660 is employed.

would it be possible to add direct_url.json to the egg-info directory?

The direct_url.json file is an installer's thing. It is not covered by the setuptools codebase/scope; instead, pip is the tool that produces it.

zjp commented 5 months ago

In my team's case I've found that as soon as I install a package in editable mode, I get duplicates of every package in my site-packages folder from importlib.metadata.distributions() even if I install that package the normal way again. We're tracking the issue here.