pex-tool / pex

A tool for generating .pex (Python EXecutable) files, lock files and venvs.
https://docs.pex-tool.org/
Apache License 2.0

make use of pip install --report JSON output #2210

Open cosmicexplorer opened 1 year ago

cosmicexplorer commented 1 year ago

After quite a long saga (pypa/pip#53), pip install now accepts a --report=<out.json> option (see pypa/pip#10771). Combined with --ignore-installed and --dry-run, it produces a resolve report well suited to tools like pex. Further changes are in flight to make this metadata-only resolve significantly faster by avoiding downloads entirely (see pypa/pip#12186), and there are plans to make it almost instantaneous by caching metadata lookups (pypa/pip#12184). With the --use-feature=fast-deps option, these improvements also apply to resolves against wheels in a --find-links index or a PyPI-like index that hasn't yet implemented PEP 658 (PyPI itself has only just now enabled it).
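As a rough illustration of the flag combination described above, here is a minimal sketch that assembles such a pip command line. The helper name build_report_command and the report.json path are hypothetical, chosen only for this example; the flags themselves are the ones named in this issue.

```python
import sys
from typing import List


def build_report_command(requirements: List[str], report_path: str) -> List[str]:
    """Build a pip command that resolves requirements without installing them.

    Hypothetical helper for illustration; not part of pex or pip.
    """
    return [
        sys.executable, "-m", "pip", "install",
        "--dry-run",              # resolve only; install nothing
        "--ignore-installed",     # resolve as if nothing were already installed
        "--report", report_path,  # write the JSON resolve report to this path
        *requirements,
    ]


cmd = build_report_command(["pillow>=7.0.0", "bleach>=2.1.0"], "report.json")
print(" ".join(cmd[1:]))
```

The resulting list could then be handed to subprocess.run(cmd, check=True), after which report.json would be read back with json.load().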

One use case where this shines is lock file creation. I made a prototype incorporating a few of the in-progress changes mentioned above; it exposes a function pex.resolver.resolve_new() that executes pip install --report but otherwise takes the same arguments as resolve(): https://github.com/pantsbuild/pex/compare/main...cosmicexplorer:pip-json-resolve?expand=1. Even without any of the work from pypa/pip#12184, this roughly halves the time pex spends within pip when creating a lockfile:

# Not the same output as a lock file, but PEX would process this json instead of needing to parse pip output.
> time python3.8 -c 'from pex.resolver import resolve_new; import json; print(json.dumps(list(resolve_new(requirements=["numpy>=1.19.5", "keras==2.4.3", "mtcnn", "pillow>=7.0.0", "bleach>=2.1.0", "tensorflow-gpu==2.5.3"]))[0]))'
...
real    0m8.769s
user    0m5.858s
sys     0m1.224s
> time python3.8 -m pex.cli lock create --resolver-version=pip-2020-resolver 'numpy>=1.19.5' 'keras==2.4.3' 'mtcnn' 'pillow>=7.0.0' 'bleach>=2.1.0' 'tensorflow-gpu==2.5.3'
...
real    0m15.923s
user    0m11.784s
sys     0m1.643s

Executing pex with sufficient verbosity confirms that >15 seconds of that pex process is spent within pip. In the uncached case, we still do better, at 26s for resolve_new() in the prototype branch vs 43s for pex lock create on main.

While looking to incorporate these changes, I found that pex3 lock create currently scans the output of pip download to extract hashes and download locations, both of which are already contained in the current --report json output. I didn't want to spend the time replacing that yet, but I suspect leaning on the metadata-only resolve json will make the implementation of pex3 lock easier to follow.
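To make the point about hashes and download locations concrete, here is a minimal sketch of pulling them out of a report. It assumes the report layout pip documents for its installation report (an "install" list whose entries carry "metadata" and "download_info"); the ResolvedDist type, the parse_report helper, and the sample data (including the placeholder URL and hash) are made up for illustration and are not what the prototype branch does.

```python
from typing import Any, Dict, List, NamedTuple, Optional


class ResolvedDist(NamedTuple):
    name: str
    version: str
    url: str
    sha256: Optional[str]


def parse_report(report: Dict[str, Any]) -> List[ResolvedDist]:
    """Extract name, version, download URL and sha256 from a pip report dict."""
    dists = []
    for item in report.get("install", []):
        metadata = item["metadata"]
        download_info = item["download_info"]
        archive_info = download_info.get("archive_info", {})
        # Prefer the "hashes" mapping; fall back to the older "hash" string.
        sha256 = archive_info.get("hashes", {}).get("sha256")
        if sha256 is None:
            legacy = archive_info.get("hash", "")
            if legacy.startswith("sha256="):
                sha256 = legacy[len("sha256="):]
        dists.append(
            ResolvedDist(
                metadata["name"], metadata["version"], download_info["url"], sha256
            )
        )
    return dists


# Fabricated sample input; a real report has many more fields.
SAMPLE_REPORT = {
    "version": "1",
    "install": [
        {
            "metadata": {"name": "example-dist", "version": "1.0"},
            "download_info": {
                "url": "https://files.example.invalid/example_dist-1.0-py3-none-any.whl",
                "archive_info": {"hashes": {"sha256": "0" * 64}},
            },
        }
    ],
}

dists = parse_report(SAMPLE_REPORT)
print(dists[0].name, dists[0].version)
```

Entries without archive_info (for example VCS requirements) come back with sha256 set to None, so a caller would need to handle that case separately.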

Remaining tasks (for the prototype branch at https://github.com/cosmicexplorer/pex/tree/pip-json-resolve):

cosmicexplorer commented 1 year ago

Over Slack (https://pantsbuild.slack.com/archives/C087V4P1T/p1691343949034449), @jsirois urged me to look at the prior investigation by @thejcannon, discussed at https://pantsbuild.slack.com/archives/C087V4P1T/p1688051841183419 and https://github.com/pantsbuild/pex/issues/2044#issuecomment-1622245760. @jsirois also raised the possibility of using resolvelib directly instead of invoking pip at all. Taking full advantage of that would require reimplementing PEP 658 and lazy wheel/fast-deps support in pex, but it would also make it easier for pex (and therefore pants) to apply the pip resolution algorithm incrementally to support pex's use cases. He identified universal lockfiles as the key reason to avoid using pip install --report, as he suspected they would present the most difficulty for the resolve report.

cosmicexplorer commented 1 year ago

On the resolvelib point, pip maintainers advised me (see https://github.com/pypa/pip/issues/12184#issuecomment-1653655313) to approach the metadata lookup caching sketched out in pypa/pip#12184 as a plugin to resolvelib, or as some other mechanism that other users of resolvelib could also employ.

cosmicexplorer commented 1 year ago

@thejcannon's prior branch testing this is at https://github.com/thejcannon/pex/tree/jcannon/pip-report.

thejcannon commented 1 year ago

In my testing, the only large red flag was that VCS reqs in PEX are hashed via their downloaded zip. pip's report doesn't do that (but does embed the relevant commit in the metadata).
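For reference, a minimal sketch of reading the embedded commit for a VCS requirement from a report entry. It assumes the entry's download_info follows the direct URL layout (a "vcs_info" mapping with "vcs" and "commit_id"); the vcs_pin helper and the sample entry are hypothetical, and this does not reproduce PEX's zip-based hashing.

```python
from typing import Any, Dict, Optional


def vcs_pin(item: Dict[str, Any]) -> Optional[str]:
    """Return a 'vcs+url@commit' string for a VCS entry, or None otherwise."""
    download_info = item["download_info"]
    vcs_info = download_info.get("vcs_info")
    if vcs_info is None:
        return None  # an archive or local directory, not a VCS checkout
    return "{}+{}@{}".format(
        vcs_info["vcs"], download_info["url"], vcs_info["commit_id"]
    )


# Fabricated sample entry; the URL and commit id are placeholders.
SAMPLE_VCS_ITEM = {
    "metadata": {"name": "example-dist", "version": "1.0"},
    "download_info": {
        "url": "https://git.example.invalid/example/dist.git",
        "vcs_info": {"vcs": "git", "commit_id": "a" * 40},
    },
}

pin = vcs_pin(SAMPLE_VCS_ITEM)
print(pin)
```

A pinned commit identifies the source tree but is not a content hash of a downloaded archive, so it would not be a drop-in replacement for the existing fingerprint.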