pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License
9.55k stars 3.04k forks source link

Speedup startup time #4768

Open pradyunsg opened 7 years ago

pradyunsg commented 7 years ago

There's a lot to gain from speeding up pip's startup time.

For one, pip takes around 600ms to just print the completion text, which is laggy. (as mentioned in #4755). Further, faster startup time might help with the test-suite situation too.

floam commented 7 years ago

I did notice one can save about 80ms by using the option to disable the version check - it'd probably make sense not to do that at all for completions by default. Still, a lot remains to be improved.

dstufft commented 7 years ago

I wonder how much of this time is taken up by importing stuff.

pradyunsg commented 7 years ago
$ time .tox/py36/bin/python -c "from pip._internal import main"

A rudimentary test of running the above 5 times gives me an average of 0.377s.

benoit-pierre commented 7 years ago

Importing pip._vendor.pkg_resources is what's taking most of the time on my machine. They are a few changes in setuptools 36.4 and above that will help a little: https://github.com/pypa/setuptools/blob/master/CHANGES.rst#v3640

pradyunsg commented 7 years ago

master is vendoring setuptools 36.4.0 currently...

https://github.com/pypa/pip/blob/4092707df38b62485776677707be93b49a51eb22/src/pip/_vendor/vendor.txt#L21

benoit-pierre commented 7 years ago

Yes, and it's faster than pip 9.0, but there are some further changes in setuptools 36.5 that might help: https://github.com/pypa/setuptools/blob/master/CHANGES.rst#v3650.

benoit-pierre commented 7 years ago
pradyunsg commented 7 years ago

Oh, nice. That means the next round of vendor updates would bring some speedup. :)

I, personally, am waiting on a new distlib release before giving the vendored libraries another round of updates.

pradyunsg commented 7 years ago

I fired up the profiler and ran pip completion --fish. Here's what I got (all percentages in terms of total time):

pradyunsg commented 7 years ago

PS: Need better profiling tools.

dstufft commented 7 years ago

Lazy importing will probably solve some of those.

brettcannon commented 7 years ago

There are plans to (hopefully) make lazy importing easy to switch on for CLI apps like pip in Python 3.7. There is also now a -X importtime argument to CPython 3.7 as well as dtrace/systemtap support to help track where import time is going to help profile this sort of thing.

boxed commented 7 years ago

@benoit-pierre

master+updated pkg_resources: python -c "from pip._vendor import pkg_resources" 0.19s user 0.00s system 99% cpu 0.192 total

What does "updated pkg_resources" mean? On my machine the initial import is still quite slow even though I have setuptools 38.1.0, so your comment seems very interesting to me! :P

benoit-pierre commented 7 years ago

@boxed: it means with pip's vendored version of setuptools updated.

boxed commented 7 years ago

Aha. I tried copying over pkg_resources from my main install over the one inside pip/_vendor, but I didn't see any difference in speed :/

pradyunsg commented 6 years ago

Using CPython 3.7.0's -X importtime.

import time: self [us] | cumulative | imported package
[snip]
import time:       654 |     506881 | pip._internal
[snip]
lorencarvalho commented 6 years ago

Just in case y'all are unaware, pkg_resources is tracking the slowness in https://github.com/pypa/setuptools/issues/510, I didn't see it linked in this issue yet.

CSDUMMI commented 5 years ago

Could you not move the imports from the top of the file to the function, that needs it?

boxed commented 5 years ago

@csdummi I have a PR that does this. It helps somewhat.

CSDUMMI commented 5 years ago

Could I have a link?

boxed commented 5 years ago

https://github.com/pypa/pip/pull/6346

cjerdonek commented 5 years ago

Here is a PR that improves the import situation for the vcs imports: https://github.com/pypa/pip/pull/6545 It removes the pip._internal.vcs imports from pip/internal/__init__.py. This will make it easy to remove vcs imports from the common case, if desired, which can be done in a subsequent commit.

CSDUMMI commented 5 years ago

Is an external CLI framework an option?

pradyunsg commented 5 years ago

@CSDUMMI What do you mean by external CLI framework?

boxed commented 5 years ago

@csdummi I'd say no, not at this time at least. But I do have some ideas for that at https://github.com/boxed/p but it's way too early for that.

CSDUMMI commented 5 years ago

@pradyunsg I did such a thing a few weeks ago: https://github.com/CSDUMMI/cli_framework, this is not optimized of course. But I think of something like this: https://oclif.io/

CSDUMMI commented 5 years ago

@boxed Why is it not yet appropriate?

boxed commented 5 years ago

@CSDUMMI I think both oclif and your code have the wrong idea, where git has the right idea (for this purpose at least). OCLIF is the more egregious to me when they say "it's open!" but then it's JS only :( My project is a general dispatcher/configger thingie that is language agnostic.

I don't think it's appropriate yet because there is no existing solid such system. We should absolutely build one, but we should build it separately and use it to wrap pip, pytest, venv, etc to make a nice python development workflow and when it's ready we can convince people to use it.

CSDUMMI commented 5 years ago

And how do you think should we build such a thing? What is the right idea git has?

CSDUMMI commented 5 years ago

@boxed I have to say your idea in p is very good, but your code needs more comments and documentation in general.

boxed commented 5 years ago

It needs to be used and polished most of all I think. But yea docs is also super important.

pradyunsg commented 5 years ago

@boxed @CSDUMMI Could you please move this conversation elsewhere? It's unrelated to this issue's topic.

CSDUMMI commented 5 years ago

But what is wrong with my project? What is the right idea git has?

CSDUMMI commented 5 years ago

@pradyunsg I think so too. That's why I want to know why we shouldn't move the implementation of the CLI optimization to another place, too!

boxed commented 5 years ago

Let's move the discussion to https://github.com/boxed/p/issues/4

cjerdonek commented 5 years ago

FYI, PR #6694 ("Only import a Command class when needed") was recently merged, which helps with this.

cjerdonek commented 5 years ago

I posted PR #6835 to help with this.

cjerdonek commented 5 years ago

I just posted PR #6843 to continue the work in PR #6835. The PR trims unneeded imports by making it so that commands not requiring downloading / PackageFinder will no longer import that machinery.

asottile commented 2 years ago

I noticed a pretty significant slowdown in the latest released version -- I've tracked it down to here -- might be worth bumping pyparsing once that gets resolved: https://github.com/pyparsing/pyparsing/issues/362

bluetech commented 7 months ago

I also looked into pip startup time a bit. First thing I noticed is tenacity's import of asyncio but seems like @ichard26 already took care of it (thanks!).

Another one I noticed is chardet import. requests supports either chardet or charset_normalizer. From a quick experiment I did replacing the chardet vendor import with non-vendored charset_normalizer import, I get ~27ms for chardet vs. ~7ms for charset_normalizer, using -X importtime on my admittedly ~10 years old laptop.

If there is interest I can try to prepare a PR to replace the charset vendor with charset_normalizer vendor.

pfmoore commented 7 months ago

Note that we can only vendor pure Python libraries. charset_normalizer would at the very least be tricky to vendor because we'd need to find a way to make our vendoring tools ignore the platform-specific wheels. Also, are your benchmarks using the pure Python version? If not, they would need to be re-done to be meaningful.

I don't have any strong opinions on whether we should switch, I'm just noting these points as things to consider if we do.

bluetech commented 7 months ago

Hmm the charset_normalizer github repo tagline says "in pure python" but I guess it's not :) I just tried again with charset_normalizer-3.3.2-py3-none-any.whl and still get ~7ms so the pure python version looks good as well.

notatallshaw commented 7 months ago

charset_normalizer was added as an optional dependency to requests when the apache-airflow team found the license for chardet wasn't suitable for them.

The developer did a lot to make it more acceptable to the requests maintainers, such as significantly reducing the amount of dependencies. My understanding is charset_normalizer only has binaries based on compiling pure Python code with mypyc, and isn't shipped by default.

If the developer is still as accommodating, I'd imagine pip would benefit in performance and ease of maintenance with charset_normalizer, but the first thing I would do is check with the developer that they are happy being vendored by pip.

bluetech commented 7 months ago

OK, submitted PR #12638.

BTW, another big import-time hit is packaging -> pyparsing package. I see that packaging has already replaced pyparsing with a hand-rolled parser, and there is a PR #12300 to update pip to use it. I checked the import time with #12300 and it's indeed a nice improvement.

With pyparsing, asyncio and chardet gone it will be a decent improvement to startup time. The remaining big ones are rich and requests/urllib3 but I don't think there is much to do about these for pip install.