Eh2406 closed this 6 months ago.
Let me know if my understanding is correct.
This PR is a memory/compute trade-off.
Instead of only keeping the latest assignments intersection, we keep a record of the intersection at each stage by storing it in dated_derivations.
This avoids recomputing intersections when backtracking, since each dated derivation also stores the intersection as of that point in time.
I suppose AssignmentsIntersection::Derivations could now use a reference "&" to the latest dated derivation intersection instead of its own copy?
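A minimal sketch of the trade-off described above. The names (DatedDerivation, PackageAssignments) echo pubgrub's internals, but the types are stand-ins: a "term" here is just a set of allowed versions, not pubgrub's actual Term type. Each dated derivation records the running intersection at the moment it was pushed, so backtracking only truncates the Vec and reads the last surviving entry instead of re-intersecting from scratch:

```rust
use std::collections::BTreeSet;

#[derive(Clone, Debug, PartialEq)]
struct Term(BTreeSet<u32>); // stand-in for a version range

impl Term {
    fn intersection(&self, other: &Term) -> Term {
        Term(self.0.intersection(&other.0).cloned().collect())
    }
}

struct DatedDerivation {
    decision_level: usize,
    /// Running intersection of all terms up to and including this one.
    accumulated_intersection: Term,
}

#[derive(Default)]
struct PackageAssignments {
    dated_derivations: Vec<DatedDerivation>,
}

impl PackageAssignments {
    fn push(&mut self, decision_level: usize, term: Term) {
        // Extra memory: one stored intersection per derivation...
        let accumulated_intersection = match self.dated_derivations.last() {
            Some(prev) => prev.accumulated_intersection.intersection(&term),
            None => term,
        };
        self.dated_derivations.push(DatedDerivation {
            decision_level,
            accumulated_intersection,
        });
    }

    /// ...saved compute: backtracking drops newer derivations, and the
    /// latest surviving entry already holds the correct intersection.
    fn backtrack(&mut self, level: usize) {
        self.dated_derivations.retain(|d| d.decision_level <= level);
    }

    fn current_intersection(&self) -> Option<&Term> {
        self.dated_derivations
            .last()
            .map(|d| &d.accumulated_intersection)
    }
}

fn main() {
    let t = |vs: &[u32]| Term(vs.iter().cloned().collect());
    let mut pa = PackageAssignments::default();
    pa.push(0, t(&[1, 2, 3, 4]));
    pa.push(1, t(&[2, 3, 4]));
    pa.push(2, t(&[3]));
    assert_eq!(pa.current_intersection(), Some(&t(&[3])));
    // Backtracking to level 1 restores the earlier intersection for free.
    pa.backtrack(1);
    assert_eq!(pa.current_intersection(), Some(&t(&[2, 3, 4])));
}
```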
Great work, that makes the two bad cases I test a good bit faster:
main: f64c499f83e5798a583f9850fac28d2e7bf1dfbf
branch: 8d5cfbb28ab82bca51328e0224147c0860925115

```
tf-models-nightly main
real    0m5,509s
user    0m5,228s
sys     0m0,274s

tf-models-nightly branch
real    0m2,905s
user    0m2,630s
sys     0m0,272s

bio_embeddings main
real    2m7,274s
user    2m5,410s
sys     0m1,396s

bio_embeddings branch
real    1m42,388s
user    1m40,671s
sys     0m1,533s
```
@mpizenberg, yes, your understanding is correct.

> I suppose AssignmentsIntersection::Derivations could now use a reference "&" to the latest dated derivation intersection instead of its own copy?
Theoretically yes. But then PackageAssignments becomes self-referential. I experimented with not storing anything in AssignmentsIntersection::Derivations and retrieving the intersection with pa.dated_derivations.last().unwrap(), but the code was too convoluted for me to find the bug I had introduced. We could share the memory with an Rc, but given it can only be used in two places, it's probably the same cost as a clone. So I decided any larger change should be done in a follow-up.
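To illustrate the Rc alternative mentioned above (a hypothetical sketch, not what the PR does; the Term and DatedDerivation shapes are stand-ins): the latest dated derivation and AssignmentsIntersection::Derivations would share one heap allocation, with Rc::clone bumping a reference count instead of copying the data. With only two owners, the savings over a plain clone are marginal:

```rust
use std::rc::Rc;

#[derive(Debug, PartialEq)]
struct Term(Vec<u32>); // stand-in for a version range

struct DatedDerivation {
    accumulated_intersection: Rc<Term>,
}

#[allow(dead_code)]
enum AssignmentsIntersection {
    Decision(u32),
    Derivations(Rc<Term>),
}

fn main() {
    let shared = Rc::new(Term(vec![2, 3, 4]));
    let latest = DatedDerivation {
        accumulated_intersection: Rc::clone(&shared), // refcount bump, no copy
    };
    let assignments = AssignmentsIntersection::Derivations(Rc::clone(&shared));
    // All three handles point at the same allocation.
    assert_eq!(Rc::strong_count(&shared), 3);
    if let AssignmentsIntersection::Derivations(term) = &assignments {
        assert!(Rc::ptr_eq(term, &latest.accumulated_intersection));
    }
}
```

The cost is that dropping or truncating dated_derivations no longer frees the data while the other handle is alive, which is part of why a plain clone is a reasonable choice here.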
I fixed the comment and removed some redundancy. This should be ready for a final review.
I also generated a Ron file inspired by the example I've been using for testing: slow_135_0_u16_NumberVersion.txt (change the extension from txt to ron to use it). Its runtime gets about 50% faster after this PR. How realistic is it? 🤷
This eliminates the intersection calls made when backtracking. Unfortunately, the improvement is within the noise on all of my standard benchmarks. However, it does cause dramatic speedups for https://github.com/pubgrub-rs/pubgrub/issues/135#issuecomment-1780127284 (I really need to make a Ron file of this so that I can add it to my normal benchmarking suite.)
Based on the flame graphs we reviewed in Office Hours today, I suspect this will make a solid difference for @konstin's workloads. It also unlocks more impactful simplifications in satisfier search.