pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License
9.36k stars 2.98k forks source link

Optimise logic for displaying packages at the end of `pip install` #12791

Closed morotti closed 4 days ago

morotti commented 6 days ago

PERF: 5% faster pip install. algorithmic bug at the end of pip install in the code to display installed packages. O(n^2) to enumerate installed packages due to accidental loop in loop.

get_distribution(package_name) does a loop over all installed packages. that is quite surprising and unexpected. it should not be used inside a loop.

I rewrote the loop to iterate with iter_all_distributions(). that's what was called under the hood.

by the way, this code to display installed packages is more complex than it ought to be. I think it was intentional to display the package name exactly as it was provided by the user.

on a pip install run that takes 11 seconds: the loop takes 0.735960 seconds on main branch. the loop takes 0.064672 seconds with this fix.

pradyunsg commented 6 days ago

Wheee! Nice find @morotti!

Could you add a news fragment (as described in the section linked from the details of the failure)? Specifically, it should be 12791.bugfix.rst, not bugfix.12791.rst.

(the approval above is subject to CI being happy, which it isn't yet due to the issue with the missing news fragment!)

ichard26 commented 6 days ago

I will note that most of the overhead of get_distribution on Python 3.11+ will be eliminated once https://github.com/pypa/pip/pull/12656 is landed. This makes sense either way as an algorithmic optimization though! Also, I strongly suspect that the percentage performance increase varies based on how many packages are already installed beforehand. I'd avoid mentioning a specific figure in the NEWS entry IMO.

morotti commented 6 days ago

Hello, I think I fixed everything but there is a hook left failing "- hook id: end-of-file-fixer" rejecting the news file. How is the news file supposed to be formatted?

pfmoore commented 6 days ago

Hello, I think I fixed everything but there is a hook left failing "- hook id: end-of-file-fixer" rejecting the news file.

From what I recall, that means you don't have a newline at the end of the file.

morotti commented 6 days ago

I've done a bit more testing with various types and amount of packages to install.

I am seeing more than 25% of the runtime of "pip install" is taken to do the final print statement when you reach a few hundreds packages to install. Obviously it's increasing O(n^2) with the number of packages.

It's amazing, I haven't seen a bug like this for years. That might genuinely make pip the third slowest package manager ever written, right behind easy_install that had a similar O(n^2) bug to process package with distinct versions and the windows update system in windows XP that was O(n^2) with the number of updates. :)

image

pradyunsg commented 6 days ago

I've retitled this PR to something that more clearly reflects the change made, rather than the effect of the change.

@morotti I'd like to ask you to read https://cbea.ms/git-commit/ and rephrase the commit message to follow the guidance provided there.

(for other maintainers: when merging this, please either squash merge this with the commit message modified to follow the typical format or rewrite the commit message via a push yourself before merging this!)

morotti commented 6 days ago

I've read the blog article and adjusted the commit message. I hope that suits you.

I notice the windows tests are failing, for reasons unrelated to this commit, and there is a format check still complaining about the news, I tried to adjust the formatting but I'm sorry to say I don't get it.

Can I offer to any maintainer to make the commit and news as you expect and merge? I really don't mind having or not having the commit under my account. I would just like for fixes to be merged and make the world a faster place. It is quite challenging to onboard to a new project and learn the hundreds of CI and small rules to complete a trivial PR.

pfmoore commented 6 days ago

As I said previously, you simply need to add a newline at the end of the news file. I'm afraid I don't know enough git to fix this PR in place, so it'll be simpler if you do it, sorry.

morotti commented 5 days ago

All green now, this can be merged.

ichard26 commented 4 days ago

Oh, and thank you for the PR! It's nice to see the low-hanging fruit for optimising pip be addressed :slightly_smiling_face:

ichard26 commented 4 days ago

It's amazing, I haven't seen a bug like this for years. That might genuinely make pip the third slowest package manager ever written, right behind easy_install that had a similar O(n^2) bug to process package with distinct versions and the windows update system in windows XP that was O(n^2) with the number of updates. :)

I'm sure you'll be amused to hear that the old logic was also quadratic and was similarly linearised ~6 years ago https://github.com/pypa/pip/commit/cbc21a5957d1be21d1ecb12af28c5d78a01843b1 :)

morotti commented 4 days ago

All adjusted as per the last comments. should be good to merge.

I'm not sure if you squash-merge or if you expect me to squash in the branch?