I saw a reddit post about cargo's resolver and its performance and it reminded me of this. I decided to give this a go, and I just now realized that you are the person who posted that :)
Anyway, I decided to give it a try and it seems to work fine. You can take a look at it here.
I also did a very basic benchmark using rustc as an example and this is what I got:
| Benchmark | Time (ms) |
|---|---|
| cargo master branch (4389f620) | 3325 |
| with rpds | 3423 |
| with rpds but using `Rc` instead of `Arc` | 3172 |
This was measured with `cd rust/src && CARGO_PROFILE=1 cargo generate-lockfile -v | grep resolving`.
As you can see, in this basic benchmark, the performance of rpds is worse than before, but when using `Rc` instead of `Arc` the performance is better. It can very well be the case that for a more complex dependency graph the normal rpds (with `Arc`) is faster, but it doesn't happen in this case.

It is, however, planned for rpds to support both `Rc` and `Arc` in the future (see #7). There is also some potential gain from another performance improvement, described in #9.
I would like to perform a more complex benchmark, which I saw you did (by looking at your reddit post). Can you tell me what your methodology was?
Thanks for looking into this! That diff looks manageable to me. There are 2 benchmarks. The first is a "large but straightforward" test, usually servo or rustc. This one can get faster or slower and that is all ok, but it does need to be checked, as it is really easy to make it 10x slower, which is unacceptable. The other test is how fast it can get to 10M ticks for a graph that it cannot resolve. Unfortunately, with https://github.com/rust-lang/cargo/pull/5213 we are fresh out of examples of things that do not terminate. One alternative is to comment out the `if let Some(known_related_bad_deps) =` part so that https://github.com/rust-lang/cargo/issues/4810#issuecomment-357553286 is back to not terminating.
Also I wonder if the `Rc<Vec<Summary>>` can be replaced with a persistent {vec, queue, set, something}.
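Roughly the kind of change I am imagining, as a sketch only (this assumes rpds's `Vector`; `Summary` here is a stand-in, not cargo's real type):

```rust
// Sketch: why a persistent vector may beat a shared `Rc<Vec<_>>`.
// With `Rc<Vec<_>>`, any modification of a shared list copies the whole Vec;
// a persistent vector shares most of its structure between versions.
use std::rc::Rc;
use rpds::Vector;

#[derive(Clone)]
struct Summary {
    name: String, // stand-in for cargo's real Summary fields
}

fn push_rc(list: &Rc<Vec<Summary>>, s: Summary) -> Rc<Vec<Summary>> {
    // O(n): the whole Vec is copied because it is shared.
    let mut new = (**list).clone();
    new.push(s);
    Rc::new(new)
}

fn push_persistent(list: &Vector<Summary>, s: Summary) -> Vector<Summary> {
    // O(log n): the new vector shares most nodes with `list`.
    list.push_back(s)
}

fn main() {
    let rc_list = Rc::new(vec![Summary { name: "a".into() }]);
    let _rc_list2 = push_rc(&rc_list, Summary { name: "b".into() });

    let p_list = Vector::new().push_back(Summary { name: "a".into() });
    let _p_list2 = push_persistent(&p_list, Summary { name: "b".into() });
}
```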
Also I usually got more targeted timing numbers with code like:
```rust
// Every million ticks, print the elapsed time, the tick rate, and the sizes
// of the main resolver data structures.
if ticks % 1000000 == 0 {
    let e = start.elapsed();
    println!(
        "{} ticks, {}s, {:.3} ticks/ms, {}, {}, {}",
        ticks,
        e.as_secs(),
        ticks as f32 / (e.as_secs() as f32 * 1000.0),
        cx.activations.iter().map(|x| x.1.len()).sum::<usize>(),
        past_conflicting_activations.iter().map(|x| x.1.len()).sum::<usize>(),
        remaining_deps.len(),
    );
}
```
added after the `!printed && ticks % 1000 == 0` check.
Thank you for your tips!
I did some further benchmarking and I'm going to document it here. First thing I need to say is that my previous benchmark is meaningless because it included network I/O and it was a single run.
I added a new profiler to measure only the main resolver loop. I used this script to run the benchmark. I used 20 iterations.
| Benchmarking servo | Median Time (ms) | Average Time (ms) |
|---|---|---|
| cargo master branch (4389f620) | 1870 | 2640 |
| with rpds | 1695 | 2805 |
| with rpds but using `Rc` instead of `Arc` | 1339 | 3221 |
These results are discouraging, but I followed @Eh2406's suggestion: I commented out the `if let Some(known_related_bad_deps) =` part and added his code to output some stats. I then tested this with the `Cargo.toml` shown in https://github.com/rust-lang/cargo/issues/4810#issuecomment-357553286. This is what I got:
| Benchmark | Time @ 20 Mticks (s) |
|---|---|
| cargo master branch (4389f620) | 61 |
| with rpds | 60 |
| with rpds but using `Rc` instead of `Arc` | 54 |
So, there was only a small improvement here with `Rc` (pending #7).
I'm not ready to give up on this, since there is at least one performance-related improvement that might make this worthwhile (#14). I will re-evaluate this when #14 lands.
That is actually all very encouraging. As some background, before https://github.com/rust-lang/cargo/pull/5147 `Activations::clone` was most of any profile of the resolver; after it, it is just a part. So I am not surprised that this is no longer a "night vs day" change. Also, since then each `tick` does more work, so times are not comparable between now and then.
For the servo benchmark, that +28% in median time is great; the -22% in average time is concerning, but the big question for the servo case is 10x regressions, so 3.2 s vs 2.6 s is probably ok.
For the other case that is a +11% improvement. That is nothing to sneeze at, given all the work that has gone into optimizing the resolver.
Sounds like it is worth talking to the cargo team (about adding the dependency) when #7 adds a way to use `Rc`.
Definitely worth continuing to explore some questions. How much difference does #14 make? Are there more `Rc<things>` that should use rpds structures? Are there things that get cloned into a `BacktrackFrame` that should use rpds structures?
> How much difference does #14 make?
In general it is hard to say, because it will depend on the amount of sharing in the data structures. There is one place where it is definitely going to be significantly faster (not sure if this piece of code runs a lot of times).
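To give an idea of what I mean by sharing (this just illustrates how rpds's persistent maps behave in general; it is not specific to #14):

```rust
// After deriving a new version, only the path to the changed node is rebuilt;
// everything else is shared with the old map, which stays unchanged.
use rpds::HashTrieMap;

fn main() {
    let old: HashTrieMap<&str, u32> = HashTrieMap::new()
        .insert("serde", 1)
        .insert("log", 1);

    let new = old.insert("rand", 1); // `old` and `new` share most nodes

    assert!(old.get("rand").is_none()); // the old version is untouched
    assert_eq!(new.get("rand"), Some(&1));
}
```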
> Are there more `Rc<things>` that should use rpds structures? Are there things that get cloned into a `BacktrackFrame` that should use rpds structures?
From a quick look it seems so! There is `RemainingCandidates::conflicting_prev_active` and `BacktrackFrame::conflicting_activations`. Tomorrow I will check if there are any speed improvements from making these maps persistent.
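Roughly what I have in mind, as a sketch only (these are not cargo's actual definitions; the key/value types are just illustrative):

```rust
// Sketch: if the map stored in a backtrack frame is persistent, cloning it
// when pushing a frame is O(1) and shares all nodes with the current map.
use std::rc::Rc;
use rpds::HashTrieMap;

type PackageId = Rc<str>;     // stand-in for cargo's interned package id
type ConflictReason = String; // stand-in

struct BacktrackFrame {
    conflicting_activations: HashTrieMap<PackageId, ConflictReason>,
}

fn push_frame(current: &HashTrieMap<PackageId, ConflictReason>) -> BacktrackFrame {
    BacktrackFrame {
        // Cheap: just a pointer copy plus a reference-count bump.
        conflicting_activations: current.clone(),
    }
}

fn main() {
    let current: HashTrieMap<PackageId, ConflictReason> =
        HashTrieMap::new().insert(Rc::from("foo 1.0.0"), "conflicts with bar".to_string());
    let frame = push_frame(&current);
    assert_eq!(frame.conflicting_activations.size(), 1);
}
```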
I am excited by this progress. Unfortunately I will not have time to look at my computer this weekend.
Since the keys and values stored in the hash maps are `Arc`s (`PackageId`/`HashTrieSet`) or small/cheaply cloneable (`InternedString`), I wouldn't be surprised if https://github.com/orium/rpds/issues/7#issuecomment-377475048 would help (skip boxing keys and values in another `Arc` inside rpds). Depending on how much time is spent doing hash lookups, the extra pointer chasing may end up fairly significant, and avoiding the allocations would also be nice.
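To illustrate the double-indirection concern (this is not rpds's actual internals, just the shape of the problem if keys/values that are already cheap shared handles get boxed in another `Arc`):

```rust
// If the map wraps every entry in its own Arc, and the value is itself an
// Arc-like handle, each lookup follows two pointers and each insert pays for
// an extra allocation.
use std::sync::Arc;

type PackageId = Arc<str>; // already a cheap, shared handle

// What an entry effectively looks like if the map boxes it again:
type BoxedEntry = Arc<(PackageId, Arc<Vec<u32>>)>;

fn main() {
    let id: PackageId = Arc::from("serde 1.0.0");
    let entry: BoxedEntry = Arc::new((id.clone(), Arc::new(vec![1, 2, 3])));

    // Reaching the Vec goes outer Arc -> tuple -> inner Arc -> Vec:
    // two dereferences plus the extra allocation for the outer Arc.
    assert_eq!(entry.1.len(), 3);
}
```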
Ok, here is how I see it from reading this thread. @orium has demonstrated that the diff to an MVP of `resolver-with-persistent-maps` is not big. That MVP with a solution to #7 (using `Rc`) gets a positive change.
Once we have the MVP, there is low-hanging fruit: https://github.com/orium/rpds/issues/7#issuecomment-377475048 and #14 on the rpds side, and moving more structures on the cargo side.
I'm working on #14, in particular the `HashTrieMap`. I think we will be able to see how this affects cargo's performance soon.
Closing since cargo is now using `im`.
Hi,
I was reviewing the code in cargo's resolver when I came across this line: "We should switch to persistent hash maps if we can..." and thought: didn't I just see a persistent hash map crate?
So if/when this crate is ready, perhaps a PR there?
Just my 2c.