pypa / setuptools

Official project repository for the Setuptools build system
https://pypi.org/project/setuptools/
MIT License

Download slowness #4259

Closed nishantvarma closed 2 months ago

nishantvarma commented 2 months ago

Edit: Same issue is discussed in https://github.com/buildout/buildout/issues/573. Seems like https://github.com/pypa/setuptools/pull/2108 resolves this issue. Keeping it closed as the current build time is manageable.

What's the problem this feature will solve?

Buildout takes a long time to download an egg even when an exact version (foo==1.0) is pinned. This can be reproduced with:

File: test.cfg

    [buildout]
    parts = app
    index = ${pkgserver:fullurl}

    [app]
    recipe = zc.recipe.egg
    eggs = pkg

    [versions]
    pkg = 1.0.0

    [pkgserver]
    fullurl = http://${:username}:${:password}@${:hostname}
    hostname = packages.org
    username = admin
    password = pass

Command:

    $ buildout -c test.cfg

This is really noticeable when there are 1000+ eggs in the package server.

I narrowed this issue down (?) to these lines:

https://github.com/pypa/setuptools/blob/v44.0.0/setuptools/package_index.py#L486-L499

https://github.com/pypa/setuptools/blob/v44.0.0/setuptools/package_index.py#L367-L369

We seem to process every item on the package server. Is that necessary when the exact version is known (the most common use case in production)?

Describe the solution you'd like

Can we add a filter to remove irrelevant versions? It seems to speed up the download, but I am not sure whether it can be improved or whether there are better solutions:

    def filter_links(self, name=None):
        """Keep only links that match an exactly pinned version (foo==1.0)."""
        from fnmatch import fnmatch

        specs = self.requirement.specs
        # Only a single "==" pin is handled; anything else is left untouched.
        if len(specs) != 1 or len(specs[0]) != 2 or specs[0][0] != "==":
            return None
        pkg = self.requirement.name
        ver = specs[0][1]

        def matches(filename):
            # sdist, wheel, or egg file name carrying the pinned version.
            # The leading hyphen keeps 1.0.0 from also matching 11.0.0.
            return (
                filename.endswith("-%s.tar.gz" % ver)
                or fnmatch(filename, "*-%s-*.whl" % ver)
                or fnmatch(filename, "*-%s-*.egg" % ver)
            )

        if name:
            # Called with a single link name: report whether it is relevant.
            return matches(name)
        # Called without a name: prune the cached package pages in place.
        self.package_pages[pkg] = {
            key: value
            for key, value in self.package_pages[pkg].items()
            if matches(key)
        }

This has to be used in the functions I mentioned above.
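
To make the idea easier to try outside setuptools, here is a minimal standalone sketch of the same filter. It is not the setuptools API: the helper names `exact_pin`, `matches_version`, and `filter_links`, as well as the sample file names, are hypothetical and only illustrate the matching rules. It assumes specs are given as a list of `(operator, version)` pairs and that sdists are named `name-version.tar.gz`.

```python
from fnmatch import fnmatch


def exact_pin(specs):
    """Return the version if specs is a single ("==", version) pair, else None."""
    if len(specs) == 1 and len(specs[0]) == 2 and specs[0][0] == "==":
        return specs[0][1]
    return None


def matches_version(filename, version):
    """True if filename looks like an sdist, wheel, or egg of this version."""
    return (
        filename.endswith("-%s.tar.gz" % version)
        or fnmatch(filename, "*-%s-*.whl" % version)
        or fnmatch(filename, "*-%s-*.egg" % version)
    )


def filter_links(links, specs):
    """Drop links for other versions when pinned; otherwise keep everything."""
    version = exact_pin(specs)
    if version is None:
        return list(links)
    return [link for link in links if matches_version(link, version)]


# Hypothetical index listing, for illustration only.
links = [
    "pkg-0.9.0.tar.gz",
    "pkg-1.0.0.tar.gz",
    "pkg-1.0.0-py3-none-any.whl",
    "pkg-11.0.0.tar.gz",
    "pkg-1.0.0-py2.7.egg",
]
pinned = filter_links(links, [("==", "1.0.0")])
unpinned = filter_links(links, [(">=", "1.0.0")])
```

With an exact pin, only the three `1.0.0` files remain; with any other specifier the list is returned unchanged, so the normal version resolution still sees every candidate.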

Alternative Solutions

No response

Additional context

Important: I tried this in v44.0, where it takes approximately 5 hours to download an egg when there are 1k+ packages on the server. It's much faster in the current version; however, I still think it's a good feature to have.
