pypa / setuptools

Official project repository for the Setuptools build system
https://pypi.org/project/setuptools/
MIT License
2.5k stars 1.19k forks source link

[BUG] performance after switching to nspektr / importlib_metadata as cratered #3418

Closed benoit-pierre closed 2 years ago

benoit-pierre commented 2 years ago

setuptools version

setuptools>=60.8.0

Python version

3.10.5

OS

Arch Linux

Additional environment information

No response

Description

It's slow...

Expected behavior

It's not slow.

How to Reproduce

▸ python -m venv venv
▸ . ./venv/bin/activate
▸ python --version
Python 3.10.5
▸ pip install -U pip wheel setuptools
Requirement already satisfied: pip in ./venv/lib/python3.10/site-packages (22.0.4)
Collecting pip
  Using cached pip-22.1.2-py3-none-any.whl (2.1 MB)
Collecting wheel
  Using cached wheel-0.37.1-py2.py3-none-any.whl (35 kB)
Collecting setuptools
  Using cached setuptools-62.6.0-py3-none-any.whl (1.2 MB)
Installing collected packages: wheel, setuptools, pip
  Attempting uninstall: pip
    Found existing installation: pip 22.0.4
    Uninstalling pip-22.0.4:
      Successfully uninstalled pip-22.0.4
Successfully installed pip-22.1.2 setuptools-62.6.0 wheel-0.37.1
▸ wget -q https://files.pythonhosted.org/packages/71/39/171f1c67cd00715f190ba0b100d606d440a28c93c7714febeca8b79af85e/six-1.16.0.tar.gz
▸ tar xf six-1.16.0.tar.gz
▸ cd six-1.16.0
▸ time python setup.py -q bdist_wheel
python setup.py -q bdist_wheel  3.71s user 0.05s system 98% cpu 3.809 total
▸ pip install 'setuptools<60.10.0'
Collecting setuptools<60.10.0
  Downloading setuptools-60.9.3-py3-none-any.whl (1.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 17.1 MB/s eta 0:00:00
Installing collected packages: setuptools
  Attempting uninstall: setuptools
    Found existing installation: setuptools 62.6.0
    Uninstalling setuptools-62.6.0:
      Successfully uninstalled setuptools-62.6.0
Successfully installed setuptools-60.9.3
▸ time python setup.py -q bdist_wheel
python setup.py -q bdist_wheel  1.74s user 0.13s system 98% cpu 1.888 total
▸ pip install 'setuptools<60.8.0'
▸ time python setup.py -q bdist_wheel
python setup.py -q bdist_wheel  0.49s user 0.03s system 99% cpu 0.524 total

Output

abravalheri commented 2 years ago

Hi @benoit-pierre could this be related to https://github.com/pypa/setuptools/pull/3395?

benoit-pierre commented 2 years ago

It's the switch to importlib_metadata / nspektr, my experience trying to switch from pkg_resources, apart from all the (to be expected) issues due to loosing the accumulated experience of pkg_resources, is that performance is bad.

benoit-pierre commented 2 years ago

Not to mention more work compared to using pkg_resources.

abravalheri commented 2 years ago

In #3395, I identified that dist._install_dependencies (which is the function that calls nspektr) is the second biggest time spent when building packages. I am proposing in that PR a simple check that can avoid runing nspektr at all.

I am not very sure about importlib_metadata (it should be faster than pkg_resources right?).

benoit-pierre commented 2 years ago

I am not very sure about importlib_metadata (it should be faster than pkg_resources right?).

Only if you have a very limited number of calls to the API. And because importlib_metadata does not do half the stuff that's provided by pkg_resources, it's more work and slower... In pkg_resources, because most functions will hit the working set, all the work is already done, so it's slower for example if you don't need metadata access to most distributions, but if you do, then the working set will act as a cache, and it's faster.

benoit-pierre commented 2 years ago

Honestly, I don't even understand why _install_dependencies was added: I thought setuptools was out of the business of installing build requirements (fetch_build_egg is deprecated).

abravalheri commented 2 years ago

Thank you very much @benoit-pierre for the explanation. I think the majority of places importlib.metadata is currently being used (at least directly) in setuptools is to access the entry-points, so it should not cause that much of a problem...

Honestly, I don't even understand why _install_dependencies was added: I thought setuptools was out of the business of installing build requirements (fetch_build_egg is deprecated).

I guess backwards compatibility would be the answer here. But I think we can fairly skip it in most of the cases which would provide a significant performance boost (no need to analyse the working set...) 😅

benoit-pierre commented 2 years ago

But that's my point, it does not look like backward compatibility, more like a new feature. Assuming PEP 518 is in charge of ensuring the build requirements are installed, why is there a need to check each entry points dependencies? Even if PEP 518 support was not called, isn't the (deprecated) code path that handles setup_requires (calling fetch_build_egg) in charge of making sure that all requirements are indeed present?

abravalheri commented 2 years ago

pip (and by extension build) does not honour the "extras" part of the entry-points (module:obj [extras]).

When setup_requires was a thing, it would make sure that the extras was honoured when calling the entry-points (including distutils.commands). This would be a fundamental difference between setup_requires and PEP 517, and what motivates the effort for backward compatibility (although I am not 100% sure).

benoit-pierre commented 2 years ago

OK, so the old code would use ep.require(installer=self.fetch_build_egg). That's from 2005 (v0.6a0!), just before adding support for setup_requires in v0.6a1. So again, I'm not even sure it's needed.

abravalheri commented 2 years ago

I agree with you. I think it is a fair assumption that entry-points nowadays are not defining extras.

benoit-pierre commented 2 years ago

pip (and by extension build) does not honour the "extras" part of the entry-points (module:obj [extras]).

Because that's not pip's job. If you want to use an entry point that depend on an extra, you should make sure you're specifying the extra at install.

So in that case, you would make sure the corresponding setup_requires requirement has the necessary extra.

benoit-pierre commented 2 years ago

I agree with you. I think it is a fair assumption that entry-points nowadays are not defining extras.

That's not what I'm saying.

abravalheri commented 2 years ago

Sorry, let me correct myself: "I also think that entry-points nowadays are not defining extras".

Your line of thought makes absolute sense. However, we cannot deny that old setuptools code would fix missing extras, so indeed we can think about the dist._install_dependencies as backwards compatibility.

Do I think we can get away without it? Absolutely.

benoit-pierre commented 2 years ago

I use entry points with extras, there's nothing wrong with that.

I think the check for an entrypoint missing dependency is not strictly necessary, and can be dropped: it's the package developer responsibility to ensure that setup_requires is consistent.

It looks like we don't have a test for this behavior.

benoit-pierre commented 2 years ago

Here's a test:

 setuptools/tests/test_easy_install.py | 66 +++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git i/setuptools/tests/test_easy_install.py w/setuptools/tests/test_easy_install.py
index 246d634f2..286af292d 100644
--- i/setuptools/tests/test_easy_install.py
+++ w/setuptools/tests/test_easy_install.py
@@ -890,6 +890,72 @@ class TestSetupRequires:
                 monkeypatch.setenv(str('PIP_TIMEOUT'), str('0'))
                 run_setup(test_setup_py, [str('--version')])

+    def test_setup_requires_with_distutils_command_dep(self, monkeypatch):
+
+        with contexts.save_pkg_resources_state():
+            with contexts.tempdir() as temp_dir:
+                # Create source distribution for `extra_dep`.
+                make_sdist(os.path.join(temp_dir, 'extra_dep-1.0.tar.gz'), [
+                    ('setup.py',
+                     DALS("""
+                          import setuptools
+                          setuptools.setup(
+                              name='extra_dep',
+                              version='1.0',
+                              py_modules=['extra_dep'],
+                          )
+                          """)),
+                    ('setup.cfg', ''),
+                    ('extra_dep.py', ''),
+                ])
+                # Create source tree for `epdep`.
+                dep_pkg = os.path.join(temp_dir, 'epdep')
+                os.mkdir(dep_pkg)
+                path.build({
+                    'setup.py':
+                    DALS("""
+                          import setuptools
+                          setuptools.setup(
+                              name='dep', version='2.0',
+                              py_modules=['epcmd'],
+                              extras_require={'extra': ['extra_dep']},
+                              entry_points='''
+                                           [distutils.commands]
+                                           epcmd = epcmd:epcmd [extra]
+                                           ''',
+                          )
+                         """),
+                    'setup.cfg': '',
+                    'epcmd.py': DALS("""
+                                     from distutils.command.build_py import build_py
+
+                                     import extra_dep
+
+                                     class epcmd(build_py):
+                                         pass
+                                     """),
+                }, prefix=dep_pkg)
+                # "Install" dep.
+                run_setup(
+                    os.path.join(dep_pkg, 'setup.py'), [str('dist_info')])
+                working_set.add_entry(dep_pkg)
+                # Create source tree for test package.
+                test_pkg = os.path.join(temp_dir, 'test_pkg')
+                test_setup_py = os.path.join(test_pkg, 'setup.py')
+                os.mkdir(test_pkg)
+                with open(test_setup_py, 'w') as fp:
+                    fp.write(DALS(
+                        '''
+                        from setuptools import installer, setup
+                        setup(setup_requires='dep')
+                        '''))
+                # Check...
+                monkeypatch.setenv(str('PIP_FIND_LINKS'), str(temp_dir))
+                monkeypatch.setenv(str('PIP_NO_INDEX'), str('1'))
+                monkeypatch.setenv(str('PIP_RETRIES'), str('0'))
+                monkeypatch.setenv(str('PIP_TIMEOUT'), str('0'))
+                run_setup(test_setup_py, ['epcmd'])
+

 def make_trivial_sdist(dist_path, distname, version):
     """

Which does not pass in v62.6.0 because the code forgets to activate the egg:

 setuptools/dist.py | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git i/setuptools/dist.py w/setuptools/dist.py
index c1ad30080..79905ed89 100644
--- i/setuptools/dist.py
+++ w/setuptools/dist.py
@@ -926,9 +926,12 @@ class Distribution(_Distribution):
         Given an entry point, ensure that any declared extras for
         its distribution are installed.
         """
+        if not ep.extras:
+            return
         for req in nspektr.missing(ep):
             # fetch_build_egg expects pkg_resources.Requirement
-            self.fetch_build_egg(pkg_resources.Requirement(str(req)))
+            dist = self.fetch_build_egg(pkg_resources.Requirement(str(req)))
+            sys.path.insert(0, dist.location)

     def get_egg_cache_dir(self):
         egg_cache_dir = os.path.join(os.curdir, '.eggs')

Going back to v60.9.3, the code is also buggy:

 setuptools/dist.py | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git i/setuptools/dist.py w/setuptools/dist.py
index e825785e2..3a02a61cc 100644
--- i/setuptools/dist.py
+++ w/setuptools/dist.py
@@ -876,23 +876,28 @@ class Distribution(_Distribution):
         Given an entry point, ensure that any declared extras for
         its distribution are installed.
         """
+        if not ep.extras:
+            return
         reqs = {
             req
             for req in map(requirements.Requirement, always_iterable(ep.dist.requires))
-            for extra in ep.extras
-            if extra in req.extras
+            if req.marker is not None and any(
+                req.marker.evaluate({'extra': extra})
+                for extra in ep.extras
+            )
         }
         missing = itertools.filterfalse(self._is_installed, reqs)
         for req in missing:
             # fetch_build_egg expects pkg_resources.Requirement
-            self.fetch_build_egg(pkg_resources.Requirement(str(req)))
+            dist = self.fetch_build_egg(pkg_resources.Requirement(str(req)))
+            sys.path.insert(0, dist.location)

     def _is_installed(self, req):
         try:
             dist = metadata.distribution(req.name)
         except metadata.PackageNotFoundError:
             return False
-        found_ver = packaging.version.Version(dist.version())
+        found_ver = packaging.version.Version(dist.version)
         return found_ver in req.specifier

     def get_egg_cache_dir(self):

v60.7.1 works as expected. But again, I think the onus should be on the package developer to ensure setup_requires is consistent. The example should be amended:

-                        setup(setup_requires='dep')
+                        setup(setup_requires='dep[extra]')

(which works as expected in v60.7.1, v60.9.3, and v62.6.0)

And _install_requires' code and corresponding calls can just be removed.

abravalheri commented 2 years ago

Thank you very much @benoit-pierre for the deep investigation. Since _install_requires does not seem to be having any effect, it could be removed right now... (if someone was relying on the feature, we would have noticed the issues right?).

benoit-pierre commented 2 years ago

Thank you very much @benoit-pierre for the deep investigation. Since _install_requires does not seem to be having any effect, it could be removed right now... (if someone was relying on the feature, we would have noticed the issues right?).

Agreed, PR here: #3421.

jaraco commented 2 years ago

I've previously been mostly unconcerned about runtime performance in Setuptools as most use-cases run in human time, so correctness and compatibility beats performance in most cases. But I agree that any degradation of more than 1s is noticeable and probably impactful in some cases.

Thanks everyone for the investigation. I haven't delved deep into it, and I'm happy to move forward with simplifying the codebase.