Avoid allocating before we have checked if it is in the cache + don't rehash subtree

simonvandel commented 4 years ago

We now check if the node/token is in the cache. If it is not in the cache, then we allocate. In addition, avoid hashing the complete sub-tree.

This speeds up analysis-stats on rust-analyser by ~5% on my local machine.

Benchmark ➜ rust-analyzer-base git:(master) hyperfine --warmup 1 --min-runs=3 '/home/simon/Documents/rust-analyzer/target/release/rust-analyzer analysis-stats .' '/home/simon/Documents/rust-analyzer-base/target/release/rust-analyzer analysis-stats .' Benchmark #1: /home/simon/Documents/rust-analyzer/target/release/rust-analyzer analysis-stats . Time (mean ± σ): 46.027 s ± 0.419 s [User: 45.120 s, System: 0.785 s] Range (min … max): 45.551 s … 46.343 s 3 runs

Benchmark #2: /home/simon/Documents/rust-analyzer-base/target/release/rust-analyzer analysis-stats . Time (mean ± σ): 48.445 s ± 0.457 s [User: 47.657 s, System: 0.715 s] Range (min … max): 47.978 s … 48.892 s 3 runs

Summary '/home/simon/Documents/rust-analyzer/target/release/rust-analyzer analysis-stats .' ran 1.05 ± 0.01 times faster than '/home/simon/Documents/rust-analyzer-base/target/release/rust-analyzer analysis-stats .'

CAD97 commented 4 years ago

@simonvandel tangentially related: would you mind testing #58 to see what impact it has? I'm working on an experimental(ish) rewrite of rowan's green trees and don't have the experience to actually measure the impact properly here on Windows.

simonvandel commented 4 years ago

@CAD97 for benchmarking I simply ran hyperfine with a release build with and without the changes applied. It’s probably easier if you just run that locally yourself.

simonvandel commented 4 years ago

The newest revision avoids the api change, simplifies the hashing and does not hash the subtree. ~8% improvement locally. @CAD97, @matklad this is ready for review

simonvandel commented 4 years ago

Hi @CAD97 The latest commit should resolve the hash collision handling. Would you like to take a look again? With the latest changes, the performance dropped a bit, but is still 5% faster than without this PR. I'll update the description with the new results.

I propose landing this in the current state. As a follow-up PR, I would like to experiment with eliminating the need for the SmallVec collection. Essentially I would like to compute the hash of kind + children pointers by just iterating through the input children iterator. That probably requires using the raw_entry api to avoid the hashmap key needing to be owned. Is it possible to iterate the children iterator twice? One time for building the hash, and then potentially a second time in case the node did not exist in the cache, and therefore needs to be allocated.

CAD97 commented 4 years ago

As I mentioned on zulip, I'm concerned now that this as currently implemented will noticeably lower the number of cache hits, as two nodes are only considered deduplicatable by the current test if all children were deduplicated (and thus are the same identity).

Obviously, skipping the heap allocation in the common case is beneficial, as this patch is still positive. But before we do anything other than that one small step, I really would like to see some measurement on heuristics for what nodes are cachable as discussed on zulip.

CAD97 commented 4 years ago

Is it possible to iterate the children iterator twice?

Not while we take [Into]Iterator.

(Apologies: I'm going to use sorbus's names for things below because I'm looking at sorbus right now and don't remember exactly how sorbus's builder names map back into rowan's.)

The tricky part is that a key part of how TreeBuilder functions is by calling Builder::node with an argument of type Drain. The only other argument type I see being used for this API is Vec (and maybe eventually [_; N] once it has a by-value into_iter).

The thing is, all of these types offer a fn as_slice(&self) -> &[T], which would be very useful for the "iter once by ref then again by val" that we would really like to do here to avoid a temporary collection... but no generic way to talk about that functionality.

Off I go to try to make a crate to define an iterator trait for "iterables that can be iterated non-destructively by-ref and then destructively by-val". It won't be perfect because I think the optimal definition wants GAT, but I think I might be able to make something work.

EDIT: wasn't able to figure anything out, and vec::Drain::as_slice is unstable anyway.

CAD97 commented 4 years ago

Over at sorbus I implemented a dirty (nightly-only) version of zero-allocation cache hits. I also shared our use case on the tracking issue and asked about potentially stabilizing it.

simonvandel commented 4 years ago

Hmm, I don't think we can land this as an improvement before we can reuse the allocation of the children iterator passed into NodeCache::node. I tried removing the check for <= 3 children, but that regresses performance. Presumably because of the need to allocate the children all the time into a vector.

rust-analyzer / rowan

Avoid allocating before we have checked if it is in the cache + don't rehash subtree #59