*Closed: cosmicexplorer closed this 2 months ago.*
This sounds like a really nice idea. I haven't thought it through in any detail yet, but maybe this is something that could be implemented as a plugin/wrapper for resolvelib? As far as I know, resolvelib makes the same assumptions (candidates are immutable, and a resolve will not change unless the set of candidates involved changes), so in theory it should be possible to write a library that implements some sort of `CachingResolver` on top of resolvelib's `BaseResolver`.

This may be over-engineering things, but I'm imagining maintainability benefits from some sort of separation of concerns like this.
I absolutely think that this is something the resolution library can and should be owning, as the cacheability is intrinsically linked to the resolution logic. I also have a hunch that this might be a case where the relative simplicity of resolvelib makes it much easier to memoize, virtualize, and amortize subproblems of the resolve, the same way writing a parser by hand lets you slap on wonky context-sensitive stuff more easily than a parser generator.
I think one possible result might be: resolvelib defines an interface for a "local index" oracle that would allow it to speed up some operations, and users of the library can provide an implementation. For example, Signal's client library for message en/decryption defines "store" interfaces for the types of secure information it needs to access, and each FFI backend implements those stores so that their contents remain secure (https://github.com/signalapp/libsignal/blob/3b7f3173cc4431bc4c6e55f6182a37229b2db6fd/rust/protocol/src/storage/traits.rs#L45).
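As a rough illustration of that separation of concerns, the oracle could be an abstract store that the caller implements and hands to the resolver, much like libsignal's store traits. Everything below (class names, methods, key format) is hypothetical, sketched for this discussion rather than taken from resolvelib's actual API:

```python
# Hypothetical "local index" oracle interface a resolution library could
# accept from its caller. All names here are invented for illustration.
from abc import ABC, abstractmethod
from typing import Optional, Sequence


class LocalIndexStore(ABC):
    """Caller-provided store the resolver may consult before hitting the network."""

    @abstractmethod
    def get_dependencies(self, candidate_key: str) -> Optional[Sequence[str]]:
        """Return cached requirement strings for a candidate, or None on a miss."""

    @abstractmethod
    def put_dependencies(self, candidate_key: str, requirements: Sequence[str]) -> None:
        """Record requirements so future resolves can skip the network call."""


class InMemoryIndexStore(LocalIndexStore):
    """Trivial reference implementation; a real one might be sqlite-backed."""

    def __init__(self) -> None:
        self._deps: dict[str, Sequence[str]] = {}

    def get_dependencies(self, candidate_key):
        return self._deps.get(candidate_key)

    def put_dependencies(self, candidate_key, requirements):
        self._deps[candidate_key] = list(requirements)
```

The point of the ABC is exactly the libsignal pattern: the resolver only sees the interface, while each consumer decides where and how the contents live.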
> This may be over-engineering things, but I'm imagining maintainability benefits from some sort of separation of concerns like this.
I mentioned in the issue that spack has the same problem, and I'm absolutely hoping we can use this opportunity to make some generalizable tool instead of hoarding all the performance wins like so many projects end up doing.
In the prior work, I was able to distinguish two types of indices or caching. `fast-deps` makes this faster, but we're still doing at least two network calls for each install candidate, and that i/o needs to be completely pipelined out of the main resolve loop if at all possible.

Thanks again for mentioning resolvelib, @pfmoore: I think its approach makes identifying these cache points much easier. From https://github.com/sarugaku/resolvelib/blob/main/src/resolvelib/providers.py, I can immediately see that `get_dependencies()` could be well-served by a key-value store or "lookup" index, while `identify()` and `is_satisfied_by()` by definition should be quick enough to compute pairwise and can be ignored. However, `get_preference()` and `find_matches()` will both depend on the current state of the resolve.

I think it would make sense to set aside the fun part (how to amortize the state-dependent resolve logic of `get_preference()` and `find_matches()`) for now and first make a proof of concept for a key-value store to cache `get_dependencies()`, which can be implemented without reference to any specific resolver. The next part will get interesting.
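A minimal sketch of that proof of concept, assuming only that the wrapped provider exposes resolvelib-style `identify()` and `get_dependencies()` methods; the wrapper class and its cache-key choice are my invention, not resolvelib API:

```python
# Proof-of-concept sketch: wrap any provider object and memoize
# get_dependencies() per candidate key. For simplicity, identify() supplies
# the cache key here; a durable cache would need a stronger per-artifact key.
class CachingProviderWrapper:
    def __init__(self, inner, cache=None):
        self._inner = inner
        # Any mapping-like store works: a dict, an sqlite-backed table, etc.
        self._cache = {} if cache is None else cache

    def get_dependencies(self, candidate):
        key = self._inner.identify(candidate)
        if key not in self._cache:
            self._cache[key] = list(self._inner.get_dependencies(candidate))
        return self._cache[key]

    # Stateful operations (get_preference, find_matches, ...) are forwarded
    # untouched; they are the "fun part" deliberately left out of this cache.
    def __getattr__(self, name):
        return getattr(self._inner, name)
```

Since the wrapper only assumes a mapping for storage, it indeed needs no reference to any specific resolver.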
Only thing I'm wondering at the moment is whether the cache should be per-artifact instead of per-package-version, since technically each wheel of the same version can have different dependencies, and there are known packages that use this "feature". These are always an edge case, but making the cache per-artifact shouldn't add much overhead in most cases, so I don't see why that'd be a bad idea anyway. A cache for `get_dependencies` should be pretty easy to implement (at $work I actually made a similar thing, except on top of pip and less fine-grained since our scope is narrower).
"per-artifact" is exactly the right way to describe it! I'll use that terminology.
> since technically each wheel of the same version can have different dependencies, and there are known packages that use this "feature"
Thanks so much for pointing this out! We will also need to be especially careful about documenting the assumptions we make when we begin looking at how to amortize sections of the stateful resolve process, not just caching stateless network requests. I do really appreciate how the definition of an idempotent identity for each candidate is covered by the `AbstractProvider#identify()` operation in resolvelib, though.
Not sure I'm following correctly: `identify` doesn't really provide a unique identity to each candidate; it's more for grouping things together (which candidates correspond to which requirement) so the resolver knows what it got from the provider. In Python this is just the package name (optionally with the extras attached) and does not really effectively identify a candidate in a general sense. I don't think we can change this either, since the identities of a requirement and the candidates found with it are supposed to be identical for the resolver to work.
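A schematic example of that grouping behavior, using stand-in objects rather than pip's real requirement/candidate classes:

```python
# identify() groups; it does not uniquely identify. A requirement and every
# candidate found for it map to the same key, so two candidates of different
# versions are indistinguishable by this key alone.
from types import SimpleNamespace

def identify(obj):
    # Key is the (normalized) package name plus any extras, as described above.
    return (obj.name.lower(), frozenset(getattr(obj, "extras", ()) or ()))

requirement = SimpleNamespace(name="Requests", extras=frozenset())
candidate_a = SimpleNamespace(name="requests", extras=frozenset(), version="2.31.0")
candidate_b = SimpleNamespace(name="requests", extras=frozenset(), version="2.30.0")

# All three share one key; the two distinct versions are not told apart.
assert identify(requirement) == identify(candidate_a) == identify(candidate_b)
```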
Thanks so much for explaining that! I assumed something totally different from the name.
I do not believe that arrangement in resolvelib will hinder this investigation.
...so I've been getting some pretty ridiculous results. My current branch is at https://github.com/cosmicexplorer/pip/tree/integration-branch, and it incorporates #12186, #12197, #12193, #12208, and then a couple other changes to perform the types of caching discussed above. In particular, it introduces a cache for looking up metadata from a `Link` (this part is very small), then another (more complex) cache for parsing links from index or find-links pages:
```
> time python3.8 -m pip install --dry-run --ignore-installed --report test.json --use-feature=fast-deps 'numpy>=1.19.5' 'keras==2.4.3' 'mtcnn' 'pillow>=7.0.0' 'bleach>=2.1.0' 'tensorflow-gpu==2.5.3'
WARNING: pip is using lazily downloaded wheels using HTTP range requests to obtain dependency information. This experimental feature is enabled through --use-feature=fast-deps and it is not ready for production.
Collecting numpy>=1.19.5
Collecting keras==2.4.3
Collecting mtcnn
Collecting pillow>=7.0.0
Collecting bleach>=2.1.0
Collecting tensorflow-gpu==2.5.3
Collecting scipy>=0.14 (from keras==2.4.3)
Collecting pyyaml (from keras==2.4.3)
Collecting h5py (from keras==2.4.3)
Collecting numpy>=1.19.5
Collecting absl-py~=0.10 (from tensorflow-gpu==2.5.3)
Collecting astunparse~=1.6.3 (from tensorflow-gpu==2.5.3)
Collecting flatbuffers~=1.12.0 (from tensorflow-gpu==2.5.3)
Collecting google-pasta~=0.2 (from tensorflow-gpu==2.5.3)
Collecting h5py (from keras==2.4.3)
Collecting keras-preprocessing~=1.1.2 (from tensorflow-gpu==2.5.3)
Collecting opt-einsum~=3.3.0 (from tensorflow-gpu==2.5.3)
Collecting protobuf>=3.9.2 (from tensorflow-gpu==2.5.3)
Collecting six~=1.15.0 (from tensorflow-gpu==2.5.3)
Collecting termcolor~=1.1.0 (from tensorflow-gpu==2.5.3)
Collecting typing-extensions~=3.7.4 (from tensorflow-gpu==2.5.3)
Collecting wheel~=0.35 (from tensorflow-gpu==2.5.3)
Collecting wrapt~=1.12.1 (from tensorflow-gpu==2.5.3)
Collecting gast==0.4.0 (from tensorflow-gpu==2.5.3)
Collecting tensorboard~=2.5 (from tensorflow-gpu==2.5.3)
Collecting tensorflow-estimator<2.6.0,>=2.5.0 (from tensorflow-gpu==2.5.3)
Collecting keras-nightly~=2.5.0.dev (from tensorflow-gpu==2.5.3)
Collecting grpcio~=1.34.0 (from tensorflow-gpu==2.5.3)
Collecting opencv-python>=4.1.0 (from mtcnn)
Collecting webencodings (from bleach>=2.1.0)
INFO: pip is looking at multiple versions of tensorboard to determine which version is compatible with other requirements. This could take a while.
Collecting tensorboard~=2.5 (from tensorflow-gpu==2.5.3)
Collecting google-auth<3,>=1.6.3 (from tensorboard~=2.5->tensorflow-gpu==2.5.3)
Collecting google-auth-oauthlib<0.5,>=0.4.1 (from tensorboard~=2.5->tensorflow-gpu==2.5.3)
Collecting markdown>=2.6.8 (from tensorboard~=2.5->tensorflow-gpu==2.5.3)
Collecting protobuf>=3.9.2 (from tensorflow-gpu==2.5.3)
Collecting requests<3,>=2.21.0 (from tensorboard~=2.5->tensorflow-gpu==2.5.3)
Collecting setuptools>=41.0.0 (from tensorboard~=2.5->tensorflow-gpu==2.5.3)
Collecting tensorboard-data-server<0.7.0,>=0.6.0 (from tensorboard~=2.5->tensorflow-gpu==2.5.3)
Collecting tensorboard-plugin-wit>=1.6.0 (from tensorboard~=2.5->tensorflow-gpu==2.5.3)
Collecting werkzeug>=1.0.1 (from tensorboard~=2.5->tensorflow-gpu==2.5.3)
Collecting cachetools<6.0,>=2.0.0 (from google-auth<3,>=1.6.3->tensorboard~=2.5->tensorflow-gpu==2.5.3)
Collecting pyasn1-modules>=0.2.1 (from google-auth<3,>=1.6.3->tensorboard~=2.5->tensorflow-gpu==2.5.3)
Collecting rsa<5,>=3.1.4 (from google-auth<3,>=1.6.3->tensorboard~=2.5->tensorflow-gpu==2.5.3)
Collecting urllib3<2.0 (from google-auth<3,>=1.6.3->tensorboard~=2.5->tensorflow-gpu==2.5.3)
Collecting requests-oauthlib>=0.7.0 (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard~=2.5->tensorflow-gpu==2.5.3)
Collecting importlib-metadata>=4.4 (from markdown>=2.6.8->tensorboard~=2.5->tensorflow-gpu==2.5.3)
Collecting charset-normalizer<4,>=2 (from requests<3,>=2.21.0->tensorboard~=2.5->tensorflow-gpu==2.5.3)
Collecting idna<4,>=2.5 (from requests<3,>=2.21.0->tensorboard~=2.5->tensorflow-gpu==2.5.3)
Collecting certifi>=2017.4.17 (from requests<3,>=2.21.0->tensorboard~=2.5->tensorflow-gpu==2.5.3)
Collecting MarkupSafe>=2.1.1 (from werkzeug>=1.0.1->tensorboard~=2.5->tensorflow-gpu==2.5.3)
Collecting zipp>=0.5 (from importlib-metadata>=4.4->markdown>=2.6.8->tensorboard~=2.5->tensorflow-gpu==2.5.3)
Collecting pyasn1<0.6.0,>=0.4.6 (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard~=2.5->tensorflow-gpu==2.5.3)
Collecting oauthlib>=3.0.0 (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard~=2.5->tensorflow-gpu==2.5.3)
Would install Keras-2.4.3 Keras-Preprocessing-1.1.2 Markdown-3.4.4 MarkupSafe-2.1.3 Pillow-10.0.0 PyYAML-6.0.1 Werkzeug-2.3.6 absl-py-0.15.0 astunparse-1.6.3 bleach-6.0.0 cachetools-5.3.1 certifi-2023.7.22 charset-normalizer-3.2.0 flatbuffers-1.12 gast-0.4.0 google-auth-2.22.0 google-auth-oauthlib-0.4.6 google-pasta-0.2.0 grpcio-1.34.1 h5py-3.1.0 idna-3.4 importlib-metadata-6.8.0 keras-nightly-2.5.0.dev2021032900 mtcnn-0.1.1 numpy-1.19.5 oauthlib-3.2.2 opencv-python-4.8.0.76 opt-einsum-3.3.0 protobuf-3.20.3 pyasn1-0.5.0 pyasn1-modules-0.3.0 requests-2.31.0 requests-oauthlib-1.3.1 rsa-4.9 scipy-1.10.1 setuptools-68.0.0 six-1.15.0 tensorboard-2.11.2 tensorboard-data-server-0.6.1 tensorboard-plugin-wit-1.8.1 tensorflow-estimator-2.5.0 tensorflow-gpu-2.5.3 termcolor-1.1.0 typing-extensions-3.7.4.3 urllib3-1.26.16 webencodings-0.5.1 wheel-0.41.1 wrapt-1.12.1 zipp-3.16.2

real 0m4.335s
user 0m2.868s
sys 0m0.143s
```
The link metadata caching can be seen in https://github.com/pypa/pip/compare/main...cosmicexplorer:link-metadata-cache?expand=1; it's quite small. The work to cache link parsing is a bit larger and can be seen at https://github.com/cosmicexplorer/pip/compare/link-metadata-cache..link-parsing-cache?expand=1. I plan to make PRs of both once my other work is in.
For a sense of scale, the link metadata caching brings that command down from >50 seconds to 12-14 seconds, and caching link parsing from index pages takes that down to 4-6 seconds. I suspect we'd want to put these both behind the same feature flag, but probably add them in separate changes.
Ok, the work from the above has been split off into #12256 and #12257 (EDIT: and #12258!)!
This issue is now superseded by #12921, which has specific answers to all of the above discussion!!
**What's the problem this feature will solve?**
Splitting out this comment (https://github.com/pypa/pip/pull/11478#issuecomment-1653555553) into its own issue.
There remains one further avenue to investigate from #7819: amortizing/memoizing certain subproblems across separate invocations of the resolver. While I was able to identify one compartmentalized improvement in #7729, I still think pip (and all other package managers right now) spends most of its time recomputing the same constraints against each other over and over again for most hot-path use cases. For example, after identifying the subset of wheels that are compatible with a given interpreter, that information can be recorded and cached indefinitely by the local pip client, letting future resolves jump immediately to the subset of compatible wheels.
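As a sketch of that example, the check below memoizes per-wheel-filename compatibility answers against a fixed set of supported tags. In pip that set would come from `packaging.tags`; the helper names and the simplified filename parsing here are my own, and both the artifact and the interpreter are immutable, so the memoized answers never go stale:

```python
# Cache wheel-filename compatibility decisions so repeated resolves can skip
# straight to the compatible subset. Parsing is deliberately simplified.
from functools import lru_cache


def wheel_tags(filename):
    # name-version[-build]-pytag-abitag-plattag.whl; each tag field may hold
    # several dot-separated values (e.g. "py2.py3").
    stem = filename[: -len(".whl")]
    py, abi, plat = stem.split("-")[-3:]
    return {
        (p, a, pl)
        for p in py.split(".")
        for a in abi.split(".")
        for pl in plat.split(".")
    }


def make_compat_checker(supported_tags):
    supported = frozenset(supported_tags)

    @lru_cache(maxsize=None)
    def is_compatible(filename):
        # A wheel is installable if any of its tag triples is supported.
        return not wheel_tags(filename).isdisjoint(supported)

    return is_compatible
```

Persisting the memoized answers (rather than keeping them in-process, as `lru_cache` does) is what would let this survive across pip invocations.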
**Describe the solution you'd like**
My vision is for pip to eventually perform most resolves entirely locally, only calling out to pypi when it's unable to find a satisfying set and it determines that a newer release of some package might be able to do so. I think this should be feasible without needing to store anything close to the magnitude of all the package contents on pypi, because cargo does that right now, and has some useful comments on the approach (https://github.com/rust-lang/cargo/blob/263d24baca913d7ace29e2f59a3254e47272fd6b/src/cargo/sources/registry/index.rs#L25-L30).

As detailed in that file, cargo essentially just serializes a list of the dependency relationships to json in order to publish their index, which is exactly how I implemented the persistent dependency resolution cache in #7819 that multiplied the effect of the `fast-deps` technique alone (https://github.com/cosmicexplorer/pip/blob/40cec94281aa9165c689dd4bb69f953e2fd5fb06/src/pip/_internal/resolution/legacy/resolver.py#L171-L304).

According to this hypothesis, while `fast-deps` (and now PEP 658) gives pip the dependency relationships needed to iterate the resolver over a single candidate without actually downloading the many-megabyte zip file, pip still currently has to pay the cost of network calls to move the resolver forward to evaluate each candidate, and if the candidate is discarded it has to do that again until the resolver finds a satisfying dependency set. In that same file, cargo describes generating an index over another index to amortize the graph traversal performed during a cargo resolve while also being able to incorporate any updates to the index that have occurred since the last resolve. There's nothing special cargo is doing here with rust that we can't do in python, as they note.

**Alternative Solutions**
In pantsbuild/pex#2175 my goal for pex was to reduce i/o and computation from compressing very large amounts of third-party code we do not control. I figured out a hack to reduce that effort, but as @jsirois points out, it's often more robust and maintainable in the long term to develop standard protocols designed for the task at hand than to rely on clever hacks to optimize existing ones. In that case, while I found it was possible to cache traditional `--layout zipapp` pex files with a clever hack, I learned that the `--layout packed` format had been designed from the ground up to enable that cacheability. Analogously, @uranusjr spent the time to develop PEP 658 to avoid the need for us to use clever hacks like `fast-deps` to get the information pip requires. In both cases, we were able to leverage features of the zip file format to put off standardization for a while, but we're in a better place now that more thoughtful infrastructure has been laid down.

spack currently performs an analogous process, deserializing and processing all of the dependency constraints annotated in spack package recipes upon each spack invocation (cc @tgamblin @alalazo). In contrast to pip, spack's main package repository is maintained as part of the spack codebase itself, so it can always expect the luxury of loading everything from the local filesystem. Still, it demonstrates that amortizing dependency graph traversal while incorporating updates from the source of truth is an issue intrinsic to package resolution, as opposed to the problem solved by `fast-deps`, which was a temporary workaround for slowly-evolving packaging standards.

**Additional context**
Pip shares Cargo's assumption that each individual release of any package is immutable, so the dependency relationships that `fast-deps`/PEP 658 obtain from each wheel can therefore be cached and reused indefinitely after they are first computed. My hypothesis is that we can improve resolve performance by serializing the mapping `package==version#checksum => [dependency requirements]` that pip extracts from pypi during a normal resolve into a local database of some form, to be reused whenever pip evaluates that installation candidate in future resolves (`pip install --report`); we can turn this into more and more of a compute-bound rather than IO-bound problem.
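A cargo-style serialization of that mapping could be as simple as append-only JSON lines keyed on the immutable candidate; the schema below is invented for illustration, in the spirit of cargo's registry index format:

```python
# Hypothetical on-disk format for the local dependency database: one JSON
# object per line, mapping an immutable candidate key (name, version,
# checksum) to its requirement strings. Immutability means entries are
# write-once and never invalidated.
import json


def dump_entry(name, version, checksum, requirements):
    """Serialize one candidate's dependency relationships to a JSON line."""
    return json.dumps({
        "key": f"{name}=={version}#{checksum}",
        # Sorted for deterministic output, so repeated dumps are identical.
        "requires": sorted(requirements),
    })


def load_db(lines):
    """Rebuild the in-memory lookup table from serialized JSON lines."""
    db = {}
    for line in lines:
        entry = json.loads(line)
        db[entry["key"]] = entry["requires"]
    return db
```

A resolver consulting this table answers `get_dependencies()` for any previously-seen candidate without touching the network, which is the compute-bound endpoint the hypothesis aims at.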