I saw a reddit post about cargo's resolver and its performance and it reminded me of this. I decided to give this a go, and I just now realized that you are the person who posted that :)
Anyway, I decided to give it a try and it seems to work fine. You can take a look at it here.
I also did a very basic benchmark using rustc as an example and this is what I got:
| Benchmark | Time (ms) |
|---|---|
| cargo master branch (4389f620) | 3325 |
| with rpds | 3423 |
| with rpds but using `Rc` instead of `Arc` | 3172 |
This was measured with `cd rust/src && CARGO_PROFILE=1 cargo generate-lockfile -v | grep resolving`.
As you can see, in this basic benchmark, the performance of rpds is worse than before, but when using `Rc` instead of `Arc` the performance is better. It can very well be the case that for a more complex dependency graph the normal rpds (with `Arc`) is faster, but it doesn't happen in this case.

It is, however, planned for rpds to support both `Rc` and `Arc` in the future (see #7). There is also some potential gain from another performance improvement, described in #9.
I would like to perform a more complex benchmark, which I saw you did (by looking at your reddit post). Can you tell me what your methodology was?
Thanks for looking into this! That diff looks manageable to me. There are 2 benchmarks. The first is a "large but straightforward" test, usually servo or rustc. This one can get faster or slower and that is all ok, but it does need to be checked, as it is really easy to make it 10x slower, which is unacceptable. The other test is how fast it can get to 10M ticks for a graph that it cannot resolve. Unfortunately, with https://github.com/rust-lang/cargo/pull/5213 we are fresh out of examples of things that do not terminate. One alternative is to comment out the `if let Some(known_related_bad_deps) =` part so that https://github.com/rust-lang/cargo/issues/4810#issuecomment-357553286 is back to not terminating.
Also I wonder if the `Rc<Vec<Summary>>` can be replaced with a persistent {vec, queue, set, something}.
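Roughly the kind of change I am imagining, as a sketch only (this assumes rpds's `Vector`; `Summary` here is a stand-in, not cargo's real type):

```rust
// Sketch: why a persistent vector may beat a shared `Rc<Vec<_>>`.
// With `Rc<Vec<_>>`, any modification of a shared list copies the whole Vec;
// a persistent vector shares most of its structure between versions.
use std::rc::Rc;
use rpds::Vector;

#[derive(Clone)]
struct Summary {
    name: String, // stand-in for cargo's real Summary fields
}

fn push_rc(list: &Rc<Vec<Summary>>, s: Summary) -> Rc<Vec<Summary>> {
    // O(n): the whole Vec is copied because it is shared.
    let mut new = (**list).clone();
    new.push(s);
    Rc::new(new)
}

fn push_persistent(list: &Vector<Summary>, s: Summary) -> Vector<Summary> {
    // O(log n): the new vector shares most nodes with `list`.
    list.push_back(s)
}

fn main() {
    let rc_list = Rc::new(vec![Summary { name: "a".into() }]);
    let _rc_list2 = push_rc(&rc_list, Summary { name: "b".into() });

    let p_list = Vector::new().push_back(Summary { name: "a".into() });
    let _p_list2 = push_persistent(&p_list, Summary { name: "b".into() });
}
```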
Also I usually got more targeted timing numbers with code like:
```rust
// Every million ticks, print the elapsed time, the tick rate, and the sizes
// of the main resolver data structures.
if ticks % 1000000 == 0 {
    let e = start.elapsed();
    println!(
        "{} ticks, {}s, {:.3} ticks/ms, {}, {}, {}",
        ticks,
        e.as_secs(),
        ticks as f32 / (e.as_secs() as f32 * 1000.0),
        cx.activations.iter().map(|x| x.1.len()).sum::<usize>(),
        past_conflicting_activations.iter().map(|x| x.1.len()).sum::<usize>(),
        remaining_deps.len(),
    );
}
```
added after the `!printed && ticks % 1000 == 0` check.
Thank you for your tips!
I did some further benchmarking and I'm going to document it here. First thing I need to say is that my previous benchmark is meaningless because it included network I/O and it was a single run.
I added a new profiler to measure only the main resolver loop. I used this script to run the benchmark. I used 20 iterations.
| Benchmarking servo | Median Time (ms) | Average Time (ms) |
|---|---|---|
| cargo master branch (4389f620) | 1870 | 2640 |
| with rpds | 1695 | 2805 |
| with rpds but using `Rc` instead of `Arc` | 1339 | 3221 |
These results are discouraging, but I followed @Eh2406's suggestion: I commented out the `if let Some(known_related_bad_deps) =` part and added his code to output some stats. I then tested this with the `Cargo.toml` shown in https://github.com/rust-lang/cargo/issues/4810#issuecomment-357553286. This is what I got:
| Benchmark | Time @ 20 Mticks (s) |
|---|---|
| cargo master branch (4389f620) | 61 |
| with rpds | 60 |
| with rpds but using `Rc` instead of `Arc` | 54 |
So, there was only a small improvement here with `Rc` (pending #7).
I'm not ready to give up on this, since there is at least one performance-related improvement that might make this worthwhile (#14). I will re-evaluate this when #14 lands.
That is actually all very encouraging. As some background, before https://github.com/rust-lang/cargo/pull/5147 `Activations::clone` was most of any profile of the resolver; after it, it is just a part. So I am not surprised that this is no longer a "night vs day" change. Also, since then each `tick` does more work, so times are not comparable between now and then.
For the servo benchmark, that +28% in median time is great; the -22% in average time is concerning, but the big question for the servo case is 10x regressions, so 3.2 s vs 2.6 s is probably ok.
For the other case that is a +11% improvement. That is nothing to sneeze at, given all the work that has gone into optimizing the resolver.
Sounds like it is worth talking to the cargo team (about adding the dependency) when #7 adds a way to use `Rc`.
Definitely worth continuing to explore some questions. How much difference does #14 make? Are there more `Rc<things>` that should use rpds structures? Are there things that get cloned into a `BacktrackFrame` that should use rpds structures?
> How much difference does #14 make?
In general it is hard to say, because it will depend on the amount of sharing in the data structures. There is one place where it is definitely going to be significantly faster (not sure if this piece of code runs a lot of times).
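To give an idea of what I mean by sharing (this just illustrates how rpds's persistent maps behave in general; it is not specific to #14):

```rust
// After deriving a new version, only the path to the changed node is rebuilt;
// everything else is shared with the old map, which stays unchanged.
use rpds::HashTrieMap;

fn main() {
    let old: HashTrieMap<&str, u32> = HashTrieMap::new()
        .insert("serde", 1)
        .insert("log", 1);

    let new = old.insert("rand", 1); // `old` and `new` share most nodes

    assert!(old.get("rand").is_none()); // the old version is untouched
    assert_eq!(new.get("rand"), Some(&1));
}
```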
> Are there more `Rc<things>` that should use rpds structures? Are there things that get cloned into a `BacktrackFrame` that should use rpds structures?
From a quick look it seems so! There is `RemainingCandidates::conflicting_prev_active` and `BacktrackFrame::conflicting_activations`. Tomorrow I will check if there are any speed improvements from making these maps persistent.
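Roughly what I have in mind, as a sketch only (these are not cargo's actual definitions; the key/value types are just illustrative):

```rust
// Sketch: if the map stored in a backtrack frame is persistent, cloning it
// when pushing a frame is O(1) and shares all nodes with the current map.
use std::rc::Rc;
use rpds::HashTrieMap;

type PackageId = Rc<str>;     // stand-in for cargo's interned package id
type ConflictReason = String; // stand-in

struct BacktrackFrame {
    conflicting_activations: HashTrieMap<PackageId, ConflictReason>,
}

fn push_frame(current: &HashTrieMap<PackageId, ConflictReason>) -> BacktrackFrame {
    BacktrackFrame {
        // Cheap: just a pointer copy plus a reference-count bump.
        conflicting_activations: current.clone(),
    }
}

fn main() {
    let current: HashTrieMap<PackageId, ConflictReason> =
        HashTrieMap::new().insert(Rc::from("foo 1.0.0"), "conflicts with bar".to_string());
    let frame = push_frame(&current);
    assert_eq!(frame.conflicting_activations.size(), 1);
}
```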
I am excited by this progress. Unfortunately I will not have time to look at my computer this weekend.
Since the keys and values stored in the hash maps are `Arc`s (`PackageId`/`HashTrieSet`) or small/cheaply cloneable (`InternedString`), I wouldn't be surprised if https://github.com/orium/rpds/issues/7#issuecomment-377475048 would help (skip boxing keys and values in another `Arc` inside rpds). Depending on how much time is spent doing hash lookups, the extra pointer chasing may end up fairly significant, and avoiding the allocations would also be nice.
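To illustrate the double-indirection concern (this is not rpds's actual internals, just the shape of the problem if keys/values that are already cheap shared handles get boxed in another `Arc`):

```rust
// If the map wraps every entry in its own Arc, and the value is itself an
// Arc-like handle, each lookup follows two pointers and each insert pays for
// an extra allocation.
use std::sync::Arc;

type PackageId = Arc<str>; // already a cheap, shared handle

// What an entry effectively looks like if the map boxes it again:
type BoxedEntry = Arc<(PackageId, Arc<Vec<u32>>)>;

fn main() {
    let id: PackageId = Arc::from("serde 1.0.0");
    let entry: BoxedEntry = Arc::new((id.clone(), Arc::new(vec![1, 2, 3])));

    // Reaching the Vec goes outer Arc -> tuple -> inner Arc -> Vec:
    // two dereferences plus the extra allocation for the outer Arc.
    assert_eq!(entry.1.len(), 3);
}
```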
Ok, here is how I see it from reading this thread. @orium has demonstrated that the diff to an MVP of `resolver-with-persistent-maps` is not big. That MVP with a solution to #7 (using `Rc`) gets a positive change.
Once we have the MVP, there is low-hanging fruit: https://github.com/orium/rpds/issues/7#issuecomment-377475048 and #14 on the rpds side, and moving more structures on the cargo side.
I'm working on #14, in particular the `HashTrieMap`. I think we will be able to see how this affects cargo's performance soon.
Closing since cargo is now using `im`.
Hi,
I was reviewing the code in cargo's resolver when I came across this line: "We should switch to persistent hash maps if we can..." and thought: didn't I just see a persistent hash map crate?
So if/when this crate is ready, perhaps a PR there?
Just my 2c.