Open cosmicexplorer opened 1 year ago
Over Slack (https://pantsbuild.slack.com/archives/C087V4P1T/p1691343949034449), @jsirois urged me to look at the prior investigation by @thejcannon, with discussion at https://pantsbuild.slack.com/archives/C087V4P1T/p1688051841183419 and https://github.com/pantsbuild/pex/issues/2044#issuecomment-1622245760. In particular, @jsirois raised the possibility of making use of `resolvelib` directly as opposed to invoking pip at all, which would require reimplementing PEP 658 and lazy wheel/`fast-deps` support in pex to take full advantage of, but also makes it easier for pex (and therefore pants) to employ the pip resolution algorithm incrementally to support pex's use cases. In particular he identified the application to universal lockfiles as the key reason to avoid using `pip install --report`, as he suspected it would present the most difficulty for the resolve report.
In particular, I was advised by pip maintainers (see https://github.com/pypa/pip/issues/12184#issuecomment-1653655313) to approach the metadata lookup caching sketched out in pypa/pip#12184 as a plugin to `resolvelib`, or some other such mechanism that would also be employable by other users of `resolvelib`.
@thejcannon's prior branch testing this is at https://github.com/thejcannon/pex/tree/jcannon/pip-report.
In my testing, the only large red flag was that VCS reqs in PEX are hashed via their downloaded zip. `pip`'s report doesn't do that (but does embed the relevant commit in the metadata).
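To make the VCS difference concrete, here is a minimal sketch of reading a pin out of a single entry of the report's `install` list. It assumes the entry's `download_info` follows the PEP 610 direct URL layout (`vcs_info.commit_id` for VCS requirements, `archive_info.hashes` or the older `archive_info.hash` for archives). The helper name `pin_for_report_entry` and the sample values, including the commit id, are hypothetical placeholders and not part of pex or pip.

```python
from typing import Any, Dict, Optional


def pin_for_report_entry(entry: Dict[str, Any]) -> Optional[str]:
    """Return a pin for one `install` entry: a VCS commit or an archive hash.

    Hypothetical helper; assumes `download_info` uses the PEP 610 layout.
    """
    download_info = entry.get("download_info", {})

    # VCS requirements carry a commit id rather than a hash of a downloaded zip.
    vcs_info = download_info.get("vcs_info")
    if vcs_info is not None:
        return "{}+{}@{}".format(
            vcs_info["vcs"], download_info["url"], vcs_info["commit_id"]
        )

    # Archives (wheels, sdists) carry hashes instead.
    archive_info = download_info.get("archive_info", {})
    hashes = archive_info.get("hashes", {})
    if "sha256" in hashes:
        return "sha256:" + hashes["sha256"]

    # Older layout: a single "<algorithm>=<digest>" string.
    legacy_hash = archive_info.get("hash")
    if legacy_hash and "=" in legacy_hash:
        algorithm, digest = legacy_hash.split("=", 1)
        return "{}:{}".format(algorithm, digest)

    return None


# Placeholder entries for illustration only.
vcs_entry = {
    "download_info": {
        "url": "https://github.com/pantsbuild/pex",
        "vcs_info": {"vcs": "git", "commit_id": "0123abcd"},
    }
}
wheel_entry = {
    "download_info": {
        "url": "https://example.invalid/pkg-1.0-py3-none-any.whl",
        "archive_info": {"hashes": {"sha256": "deadbeef"}},
    }
}
```

A lock built from the report would therefore have to pin VCS requirements by commit, or download and hash the archive itself to keep the current PEX behavior.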
After quite a long saga (pypa/pip#53), pip has the `--report=<out.json>` option to `pip install` (see pypa/pip#10771). This can be combined with `--ignore-installed` and `--dry-run` to produce a resolve report specifically for the uses of tools like pex. There are some further changes in flight to make this metadata-only resolve significantly faster by avoiding any downloads at all (see pypa/pip#12186), and plans to get it down to almost instantaneous by caching metadata lookups (pypa/pip#12184). With the `--use-feature=fast-deps` option, these improvements also apply to resolves against wheels in a `--find-links` index or a pypi-like index that hasn't yet implemented PEP 658 (pypi itself has only just now enabled it).

One use case where this shines is lock file creation. A prototype I made incorporating a few of the mentioned in-progress changes exposes a function `pex.resolver.resolve_new()` to execute `pip install --report`, but with otherwise the same arguments as `resolve()`: https://github.com/pantsbuild/pex/compare/main...cosmicexplorer:pip-json-resolve?expand=1. Without any of the work from pypa/pip#12184, this halves the time pex spends within pip when creating a lockfile.

Executing pex with sufficient verbosity confirms that >15 seconds of that pex process is spent within pip. In the uncached case, we still do better, at 26s for `resolve_new()` in the prototype branch vs 43s for `pex lock create` on main.

While looking to incorporate these changes, I found that `pex3 lock create` currently scans the output of `pip download` to extract hashes and download locations, which are contained in the current `--report` json output. I didn't want to spend the time replacing that yet, but I suspect leaning on the metadata-only resolve json will make the implementation of `pex3 lock` easier to follow.

Remaining tasks (for the prototype branch at https://github.com/cosmicexplorer/pex/tree/pip-json-resolve):
- Update `PipVersionValue` to select pip versions that support `--report`.
- Rename `resolve_new()` to something like `metadata_only_resolve()`.
- Make `lock {create,update}` consume `metadata_only_resolve()`.
- Update `PipVersionValue` to keep up to speed with performance improvements, otherwise defaulting to the current implementation which scans output logs when the latest pip version does not support `--report`.
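
As a rough illustration of the `pip install --report` flow and the version gate described above, the following sketch builds the dry-run command, checks whether a given pip version can be expected to support `--report`, and pulls names, versions, and download URLs out of the resulting JSON. It is not the prototype's code: the function names are hypothetical, the 22.2 cutoff is my understanding of when `--report` and `--dry-run` landed, and the version check assumes a plain numeric release string.

```python
import json
import subprocess
import sys
from pathlib import Path
from typing import Any, Dict, List, Sequence, Tuple

# Assumption: `--report` and `--dry-run` first shipped in pip 22.2.
MIN_REPORT_PIP = (22, 2)


def supports_report(pip_version: str) -> bool:
    """Return True if a numeric pip version like "23.2.1" should have `--report`."""
    major, minor = (int(part) for part in pip_version.split(".")[:2])
    return (major, minor) >= MIN_REPORT_PIP


def build_report_command(
    requirements: Sequence[str],
    report_path: Path,
    python: str = sys.executable,
) -> List[str]:
    """Build a metadata-only resolve command that installs nothing."""
    return [
        python, "-m", "pip", "install",
        "--dry-run",            # resolve only, do not install
        "--ignore-installed",   # resolve as if the environment were empty
        "--report", str(report_path),
        *requirements,
    ]


def summarize_report(report: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Extract (name, version, download URL) for each resolved distribution."""
    return [
        (
            item["metadata"]["name"],
            item["metadata"]["version"],
            item["download_info"]["url"],
        )
        for item in report.get("install", [])
    ]


def dry_run_resolve(
    requirements: Sequence[str], report_path: Path
) -> List[Tuple[str, str, str]]:
    """Run pip (needs network access) and summarize the report it writes."""
    subprocess.run(build_report_command(requirements, report_path), check=True)
    return summarize_report(json.loads(report_path.read_text()))
```

A caller would check `supports_report()` against the selected pip version first and fall back to the existing log-scanning implementation when it returns `False`.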