notatallshaw opened this issue 5 months ago
This scenario is artificially constructed to best profile the installer code by removing the need to download, build sdists, or resolve:
I ran with and without `--dry-run` to see the timing difference:
Dry Run: 32s
Regular Install: 144s
I profiled with and without `--dry-run` to see the profile difference:
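For anyone who wants to reproduce this kind of analysis, here is a minimal sketch of a profiling helper (my own illustration, not pip's tooling; `find_hotspots` is a hypothetical name). It runs a callable under `cProfile` and reports the top cumulative-time entries; for pip itself you would profile its entry point with `sys.argv` set appropriately.

```python
import cProfile
import io
import pstats


def find_hotspots(func, *args, top=10):
    """Profile func(*args) and return a report of the top cumulative hotspots."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    # Render the stats into a string instead of stdout so callers can log it.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return stream.getvalue()


if __name__ == "__main__":
    # Trivial workload as a demo; substitute the pip entry point to profile an install.
    print(find_hotspots(sorted, range(100_000)))
```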
There are some clear hotspots here; if no one else gets to it, I will take a look when I have time to see if there are easy ways to reduce them.
The `get_dist_name()` hot spot should be vastly improved by https://github.com/pypa/pip/pull/12656 FWIW. I scheduled the PR for 24.2 as it feels a bit risky to ship in 24.1 final. Please say something if anyone feels differently.
I see no issues with the UI proposal, but I'd want parallel installs to be a separate feature. I can imagine pathological cases where things could break when installing in parallel, and while the experience of uv is encouraging (as is the fact that normal cases are clearly safe), my instinct is that every pathological case is being exercised by some user of pip, somewhere. So we should isolate the risk here by making it a separate feature.
> The `get_dist_name()` hot spot should be vastly improved by #12656 FWIW. I scheduled the PR for 24.2 as it feels a bit risky to ship in 24.1 final. Please say something if anyone feels differently.
Great, I'll reprofile with this PR. I personally wasn't imagining any of these ideas would land for 24.1.
> I'd want parallel installs to be a separate feature
Agreed, I'll make a separate issue for that.
Honestly, for the other ideas I feel like I could make PRs that safely improve pip, but I'm unsure about parallel installs. At a minimum, it would need a careful look at what tests currently exist for multiple installs, and potentially expanding them into a good matrix of different possibilities.
> Log a message that pip is starting to install packages
Btw, I was looking at this recently because I noticed pip doesn't tell you it's installing packages. The specific scenario I was seeing was the following:
On step two this produces the following behavior:
The real-world situation where this happens is installing large machine-learning packages, particularly because you install a bunch of packages from the PyTorch index and then a bunch of packages from PyPI.
I think there are a couple of possible solutions:
I will take a look at PRs when I have a chance.
Caching the result of `utils.compatibility_tags.get_supported()` in the resolver factory should be another easy win[^1] (~3%, or 4s, in the example above).
I'll submit a PR when I get the chance.
[^1]: I strongly suspect that `get_supported()` is only "slow" (as in, 1-5ms) on Linux due to the large number of supported tags per system.
While taking a look at https://github.com/pypa/pip/pull/12601, I was curious how easy it would be to add an installation progress bar. The progress bar was pretty trivial to add by extending the pre-existing progress logic... However, it did not play nicely with the logging stack, so any intervening logs would break the progress bar. To fix this, I had to redo how rich is initialized in the logging stack, which took a bit :slightly_smiling_face:
Anyway, here's a demo:
Screencast from 2024-07-15 22-31-35.webm
What do you think @notatallshaw?
[^1]: Ideally, the presentation logic would simply disable the progress bar outright when writing to a non-TTY, but that's a future thing to think about.
Hmm, it would definitely look less rough if I left-justified the package name. Here's another demo where the package name is justified to the longest name length seen so far (as doing it properly feels like going against the API contract of pip's progress bars).
Screencast from 2024-07-15 23-06-07.webm
It does kinda look weird. Perhaps after the bar?
Screencast from 2024-07-15 23-11-40.webm
I think this looks the best out of all of them :)
I'll note that your mental model of how pip installs packages is wrong. Each uninstall occurs "on demand", right before its replacement package is about to be installed (i.e. the uninstalls and installs are interleaved), so an uninstallation progress bar doesn't really make sense.
Ah, I see; my confusion comes from how pip's current logging displays what is happening. It logs all the uninstalls, then logs the packages it has installed, and sometimes there can be a significant delay between the last uninstall message and the install message, which gives that impression.
I also chose to include the package currently being installed in the progress bar. Yes, in most situations, the per-package installation time is so low that most packages are never shown to the user (like in the demo), but there are exceptions. If we're installing some massive package, it'd be nice to let the user know we're stuck on $package. I don't feel strongly about this though so I'm fine dropping it.
I agree. If you want to try a large-package installation where individual packages will be noticeable, you can do: `pip install torch torchvision torchaudio`
> I think this looks the best out of all of them :)
Yes, I think anything left of the progress bar should be fixed-width, and ideally not updating at all. At least in left-to-right English, having the left-hand side update makes me feel like I need to keep rereading the whole line, whereas the right-hand side updating just means I need to glance at the right-hand side to check for updates.
Once you have a PR I'm happy to throw some difficult scenarios against it.
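To make the layout discussion concrete, here is a minimal stdlib-only sketch (not pip's implementation, which uses rich) of the agreed design: fixed-width text to the left of the bar, and the changing package name to the right of it.

```python
import sys
import time


def render(done, total, current, width=20):
    """One progress line: fixed text left of the bar, current package right."""
    filled = width * done // total
    bar = "━" * filled + " " * (width - filled)
    return f"Installing |{bar}| {done}/{total} {current}"


packages = ["numpy", "torch", "requests"]
for i, name in enumerate(packages):
    # \r redraws the line in place; ljust clears leftovers from longer names.
    sys.stdout.write("\r" + render(i, len(packages), name).ljust(60))
    sys.stdout.flush()
    time.sleep(0.05)  # stand-in for the actual install work
sys.stdout.write("\r" + render(len(packages), len(packages), "done").ljust(60) + "\n")
```

Because everything left of the final `{current}` field is fixed-width, only the right-hand side of the line ever changes between redraws.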
Okay, since I opened this issue there's been a lot of improvement to install performance of a lot of wheels, here is my synthetic test:
On pip 24.1.2:
real 2m23.338s user 2m8.488s sys 0m12.523s
On pip main (effectively 24.2 right now):
real 1m23.565s user 1m11.482s sys 0m9.681s
In this synthetic example, ~50% of the time is now spent on O(n²) (or worse) issues in resolution and ~50% is spent doing wheel-specific work. It feels like both have algorithmic or caching opportunities. When I get a chance I will take a look.
In my synthetic test I notice ~30% of the time is spent on `compile_file`. The standard library's `compile_dir` will create a process pool when it can to speed things up; I wonder if it makes sense to use `compile_dir` on each root package directory installed, and then verify the .pyc files are created as expected?
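A sketch of that idea using only the standard library (an illustration, not pip's code; `compile_package_dir` is a hypothetical helper): `compileall.compile_dir` accepts a `workers` argument, where `0` means use `os.cpu_count()` worker processes, and the resulting `.pyc` files can be checked afterwards.

```python
import compileall
import pathlib


def compile_package_dir(pkg_dir, workers=0):
    """Byte-compile every .py file under pkg_dir, then verify the .pyc files.

    workers=0 asks compileall to use os.cpu_count() worker processes.
    """
    if not compileall.compile_dir(pkg_dir, quiet=1, workers=workers):
        return False
    root = pathlib.Path(pkg_dir)
    for py in root.rglob("*.py"):
        cache_dir = py.parent / "__pycache__"
        # Cached bytecode is named like mod.cpython-312.pyc
        if not any(cache_dir.glob(py.stem + ".*.pyc")):
            return False
    return True
```

One open question with this approach is error reporting: `compile_dir` aggregates results across files, whereas pip's per-file `compile_file` loop can attribute a failure to a specific wheel.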
FYI, I believe most other installers "optimize" this step by not compiling by default.
As discussed in https://github.com/pypa/pip/issues/12920, one helpful UX improvement would be to make it clear in pip's output that Python is compiling byte code when compiling is enabled, e.g. "Installing and Compiling".
> Hmm, it would definitely look less rough if I left-justified the package name. Here's another demo, but the package name is justified to the longest name length seen so far (as doing it properly feels like going against the API contract of pip's progress bars).
>
> Screencast.from.2024-07-15.23-06-07.webm It does kinda look weird. Perhaps after the bar?
>
> Screencast.from.2024-07-15.23-11-40.webm I think this looks the best out of all of them :)
@ichard26 This looks pretty great. Did you ever create a PR? I would like to help if I can.
### What's the problem this feature will solve?
At the moment, when the final install step starts, pip gives no output about what it is doing. In some real-world cases (e.g. large PyTorch or Airflow installations) this step can take over 30 seconds on fast machines, so minutes on slow machines. The user is left wondering if anything is happening.
### Describe the solution you'd like
I would like to see the following improvements:
### Alternative Solutions
I think at a bare minimum there should be a log message that lets the user know what's happening.
### Additional context
uv runs installs in parallel, and following their issue tracker it does not appear to be problematic. To do this, a CLI option to control the maximum number of parallel installs would need to be added, the same as in the existing PR for parallel downloads.