pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License

Install packages in parallel #12742

Open notatallshaw opened 4 months ago

notatallshaw commented 4 months ago

What's the problem this feature will solve?

This is to improve the performance of pip.

For example, in the large install shown in https://github.com/pypa/pip/issues/12613#issuecomment-2099108269, installing takes over 8% of the time even with resolving, downloading, and building sdists included. As resolving becomes faster, downloads run in parallel, and (hopefully) more packages ship wheels instead of sdists, installing will become a larger part of the total time.

Describe the solution you'd like

After the resolve, downloads, and sdist builds have completed, the installs could run in parallel.
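A minimal sketch of the idea, not pip's actual install code: `install_wheel` and `install_all` are hypothetical names, and "installing" here is reduced to unpacking each archive into a shared target directory with a thread pool. It assumes the wheels do not write to overlapping paths.

```python
import zipfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def install_wheel(wheel_path: Path, target_dir: Path) -> str:
    """Hypothetical stand-in for a real install step: just unpack the archive."""
    with zipfile.ZipFile(wheel_path) as wheel:
        wheel.extractall(target_dir)
    return wheel_path.name


def install_all(wheel_paths: list[Path], target_dir: Path, workers: int = 4) -> list[str]:
    """Unpack every wheel concurrently; results come back in input order."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() drains the iterator, so an exception in any worker is raised here.
        return list(pool.map(lambda path: install_wheel(path, target_dir), wheel_paths))
```

A real implementation would also need to handle scripts, `RECORD` updates, bytecode compilation, and rollback on failure, none of which is shown here.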

Alternative Solutions

Keep as is.

Additional context

This would obviously require a PR from someone. I think there would need to be a full complement of tests covering parallel installs of different kinds of packages (e.g. making sure multiple editables can install at the same time, editables alongside regular installs, etc.).

uv has already implemented this successfully. Following their issue tracker, this has been the last problematic part of making things parallel/concurrent.


ichard26 commented 4 months ago

See also https://github.com/pypa/pip/issues/8187#issuecomment-857251595.

notatallshaw commented 4 months ago

> See also #8187 (comment).

Thanks, I hadn't seen that before. I'll have a good read through and see whether this is a straight-up duplicate, and whether anything can be done to get the existing work landed in pip.

pfmoore commented 4 months ago

As the author of the linked comment, I'll add that the key new development is that uv has implemented parallel installs. It would be interesting to know how they designed things. It's quite possible that pip could learn some useful lessons.

I've not looked at how uv implements this at all, so the following is pure speculation, but if I had to guess, I'd imagine they have the following things in their favour:

  1. They may well have designed from the start for parallel tasks. One concern I have for pip is getting reporting right, for instance, because we have[^1] some stateful code that handles getting indentation correct, which might be broken by multiple threads.
  2. Rust has better thread safety than Python, so there's likely a class of issues that uv simply can't encounter (at least, not by accident).
  3. To be blunt, they may just not have worried about pathological cases. For example, installing two wheels in parallel, which both contain the same filename but with different content, is a potential race condition (writing the file itself and RECORD). But it's unlikely in practice, so maybe uv ignored the possibility. Pip has a larger user base, and a longer history of dealing with weird errors, so we may well simply be (for better or worse) more paranoid over things like this.
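To make the pathological case in point 3 concrete, here is a rough sketch of a pre-flight check that could run before any parallel unpacking. `find_conflicting_files` is a hypothetical helper, not something pip or uv is known to provide; it flags archive paths that appear in more than one wheel with differing CRCs.

```python
import zipfile
from collections import defaultdict
from pathlib import Path


def find_conflicting_files(wheel_paths: list[Path]) -> dict[str, list[str]]:
    """Map each path shipped by several wheels with different content to those wheels."""
    owners: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for wheel_path in wheel_paths:
        with zipfile.ZipFile(wheel_path) as wheel:
            for info in wheel.infolist():
                if info.is_dir():
                    continue  # directories can be shared safely
                owners[info.filename].append((wheel_path.name, info.CRC))
    conflicts = {}
    for filename, entries in owners.items():
        # A conflict needs at least two wheels whose copies of the file differ.
        if len({crc for _, crc in entries}) > 1:
            conflicts[filename] = [name for name, _ in entries]
    return conflicts
```

Wheels that show up in the result could be installed serially while the rest go through a thread pool. Whether that trade-off is worth the extra scan is a design question this sketch does not answer.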

[^1]: Or at least we used to, I haven't looked at that code since we started using rich...

denx20 commented 2 months ago

Just curious, are there any updates on this/any active work being done?

morotti commented 2 months ago

I made a PR with a proof of concept https://github.com/pypa/pip/pull/12816

The parallel installation is trivial to do (except if you want to handle the case of 2 packages trying to overwrite the same file, outside of Linux).

The gains are very small because of the global interpreter lock, unless you're installing on a very slow file system such as a $HOME on a network drive.
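As a toy illustration of why a slow file system changes the picture: threads that are blocked waiting on I/O release the GIL, so the waits overlap. The sketch below fakes a slow write with `time.sleep` (which also releases the GIL); `slow_write`, `timed`, and `compare` are made-up names, and the 50 ms latency is an arbitrary assumption, not a measurement from the PR.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_write(_: int, latency: float = 0.05) -> None:
    """Stand-in for a write to a slow file system; sleeping releases the GIL."""
    time.sleep(latency)


def timed(run) -> float:
    """Return the wall-clock seconds taken by calling run()."""
    start = time.perf_counter()
    run()
    return time.perf_counter() - start


def compare(files: int = 8) -> tuple[float, float]:
    """Time the same simulated writes serially and on one thread per file."""
    serial = timed(lambda: [slow_write(i) for i in range(files)])
    with ThreadPoolExecutor(max_workers=files) as pool:
        threaded = timed(lambda: list(pool.map(slow_write, range(files))))
    return serial, threaded
```

On a fast local disk the time spent waiting is small compared with the Python-level work (which holds the GIL), so the same threading buys much less.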