Closed alexcrichton closed 9 months ago
For incremental compilation, I was thinking about implementing a simple hash table that can be memory-mapped. DefPathTable would be another candidate for that.
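A minimal sketch of what a memory-mappable hash table could look like, assuming fixed-size u64 keys (for example, an already-computed hash) and u32 values. The byte layout and the build, lookup, read_u64 and read_u32 helpers are hypothetical, not rustc's actual on-disk format. Because the table is one flat byte buffer with no pointers, lookup can run directly on the bytes of a mapped file without a decoding step.

```rust
// Hypothetical on-disk layout (an assumption, not rustc's format):
//   [capacity: u64 LE], then `capacity` slots of 16 bytes each:
//   [key: u64 LE][value: u32 LE][occupied flag: u32 LE]
const HEADER: usize = 8;
const SLOT: usize = 16;

fn read_u64(buf: &[u8], off: usize) -> u64 {
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(&buf[off..off + 8]);
    u64::from_le_bytes(bytes)
}

fn read_u32(buf: &[u8], off: usize) -> u32 {
    let mut bytes = [0u8; 4];
    bytes.copy_from_slice(&buf[off..off + 4]);
    u32::from_le_bytes(bytes)
}

/// Serializes unique (key, value) pairs into an open-addressing table
/// with linear probing. Capacity is a power of two larger than the entry
/// count, so every probe sequence eventually reaches an empty slot.
fn build(entries: &[(u64, u32)]) -> Vec<u8> {
    let capacity = (entries.len() * 2).next_power_of_two();
    let mut buf = vec![0u8; HEADER + capacity * SLOT];
    buf[..HEADER].copy_from_slice(&(capacity as u64).to_le_bytes());
    for &(key, value) in entries {
        // Keys are assumed to be hashes already, so the low bits pick the slot.
        let mut i = (key as usize) & (capacity - 1);
        while read_u32(&buf, HEADER + i * SLOT + 12) != 0 {
            i = (i + 1) & (capacity - 1);
        }
        let off = HEADER + i * SLOT;
        buf[off..off + 8].copy_from_slice(&key.to_le_bytes());
        buf[off + 8..off + 12].copy_from_slice(&value.to_le_bytes());
        buf[off + 12..off + 16].copy_from_slice(&1u32.to_le_bytes());
    }
    buf
}

/// Looks up `key` directly in the bytes (e.g. a slice of a memory-mapped
/// file); nothing has to be deserialized first.
fn lookup(table: &[u8], key: u64) -> Option<u32> {
    let capacity = read_u64(table, 0) as usize;
    let mut i = (key as usize) & (capacity - 1);
    loop {
        let off = HEADER + i * SLOT;
        if read_u32(table, off + 12) == 0 {
            return None; // empty slot: the key is not in the table
        }
        if read_u64(table, off) == key {
            return Some(read_u32(table, off + 8));
        }
        i = (i + 1) & (capacity - 1);
    }
}
```

Here a Vec<u8> stands in for the mapped bytes. The design only works because keys and values are fixed-size plain data; a key type that embeds session-local interned symbols cannot be stored this way as-is.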
@michaelwoerister, could you give @alexcrichton a bit more info on that, so that he could tackle it?
Each crate we load contains a DefPathTable, and at the moment we load it eagerly. This can be quite a big piece of data, and decoding it involves building a reverse-lookup hash map from some of the loaded data.
I see now that having a memory-mapped version of that reverse lookup table would be kind of tricky, since the keys (DefKey) contain interned ast::Symbols. It would be possible to come up with something, but it would be quite some work, I guess.
As a starting point, one could try to decode the DefPathTables more lazily, e.g. decode the whole thing on first access, or populate the reverse lookup table lazily.
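The second option (populating the reverse lookup table lazily) can be sketched with std::cell::OnceCell. This is a minimal illustration under assumptions: DefKey here is a simplified, hypothetical stand-in that stores a String name instead of an interned symbol, and DefPathTable, def_key, index_of and reverse_builds are illustrative names, not rustc's API. The forward direction (index to key) is available immediately; the key-to-index map is only built if a reverse lookup is actually requested.

```rust
use std::cell::{Cell, OnceCell};
use std::collections::HashMap;

/// Simplified stand-in for a DefKey (hypothetical shape): the parent's
/// index plus a name. The real DefKey stores an interned symbol instead.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct DefKey {
    parent: Option<u32>,
    name: String,
}

/// Hypothetical table: index -> key is available right away, while the
/// key -> index map is only built the first time someone asks for it.
struct DefPathTable {
    keys: Vec<DefKey>,
    reverse: OnceCell<HashMap<DefKey, u32>>,
    reverse_builds: Cell<u32>, // how often the map was built (for the demo)
}

impl DefPathTable {
    fn new(keys: Vec<DefKey>) -> Self {
        DefPathTable { keys, reverse: OnceCell::new(), reverse_builds: Cell::new(0) }
    }

    fn def_key(&self, index: u32) -> &DefKey {
        &self.keys[index as usize]
    }

    /// Reverse lookup; the cost of building the map is paid on the first call only.
    fn index_of(&self, key: &DefKey) -> Option<u32> {
        let map = self.reverse.get_or_init(|| {
            self.reverse_builds.set(self.reverse_builds.get() + 1);
            self.keys
                .iter()
                .enumerate()
                .map(|(i, k)| (k.clone(), i as u32))
                .collect()
        });
        map.get(key).copied()
    }
}
```

A session that never calls index_of never pays for the map; one that does pays exactly once.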
Hm, looking at the code again, it seems like we actually don't need the reverse lookup map anymore :D
When I switched the dep-graph to using stable DefPathHashes instead of actual DefPaths, I removed the last user of Definitions::retrace_path(), which in turn is the only user of the table.
I'll make a PR removing the table...
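To illustrate why interned symbols are awkward in persisted keys, and why hashing the path contents avoids the problem, here is a minimal sketch. The Interner type and stable_path_hash function are hypothetical. rustc computes DefPathHash with its own stable hasher; FNV-1a is swapped in here only to keep the example short and dependency-free. Symbol indices depend on the order in which strings were interned, so they can differ between sessions, while a hash over the strings themselves does not.

```rust
use std::collections::HashMap;

/// Hypothetical string interner: indices depend on interning order,
/// so they are only meaningful within one compiler session.
#[derive(Default)]
struct Interner {
    map: HashMap<String, u32>,
    strings: Vec<String>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> u32 {
        if let Some(&sym) = self.map.get(s) {
            return sym;
        }
        let sym = self.strings.len() as u32;
        self.strings.push(s.to_string());
        self.map.insert(s.to_string(), sym);
        sym
    }

    fn get(&self, sym: u32) -> &str {
        &self.strings[sym as usize]
    }
}

/// FNV-1a (64-bit) over the path's *string contents*, with a 0xff separator
/// byte after each component (0xff never occurs in UTF-8), so that
/// ["ab", "c"] and ["a", "bc"] feed different bytes to the hash.
fn stable_path_hash(interner: &Interner, path: &[u32]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &sym in path {
        for &b in interner.get(sym).as_bytes().iter().chain(std::iter::once(&0xffu8)) {
            h ^= b as u64;
            h = h.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
        }
    }
    h
}
```

Because the hash depends only on the path's contents, it can be written out in one session and compared in the next, which is the property a persisted dep-graph key needs.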
I opened https://github.com/rust-lang/rust/pull/43361 and would be interested in the impact this has here.
Testing locally, it looks like #43361 shaves about 20ms off the compile time of an empty file, thanks @michaelwoerister! After that PR, the longest timings are:
time: 0.050; rss: 91MB translation
time: 0.017; rss: 59MB expansion
time: 0.002; rss: 100MB LLVM passes
time: 0.001; rss: 100MB linking
time: 0.001; rss: 38MB parsing
Compiling an empty crate takes a little under 20ms today. This is slightly better than clang (v14, the version on my system) running clang -c t.c on an empty C file, and around 2x slower than gcc (version 12) on my system.
$ hyperfine --warmup 3 -N "$HOME/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/rustc --crate-type=lib empty.rs" "clang -c t.c" "gcc -c t.c"
Benchmark 1: /home/mark/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/rustc --crate-type=lib empty.rs
Time (mean ± σ): 16.6 ms ± 1.9 ms [User: 7.4 ms, System: 9.7 ms]
Range (min … max): 11.6 ms … 20.6 ms 145 runs
Benchmark 2: clang -c t.c
Time (mean ± σ): 19.6 ms ± 1.8 ms [User: 8.6 ms, System: 10.9 ms]
Range (min … max): 15.8 ms … 22.6 ms 138 runs
Benchmark 3: gcc -c t.c
Time (mean ± σ): 8.7 ms ± 1.3 ms [User: 5.9 ms, System: 2.7 ms]
Range (min … max): 5.3 ms … 11.4 ms 336 runs
Summary
gcc -c t.c ran
1.91 ± 0.37 times faster than /home/mark/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/rustc --crate-type=lib empty.rs
2.26 ± 0.40 times faster than clang -c t.c
The majority of the difference seems to be explained by the number of pages faulted in (possibly indirectly, as a proxy for memory used, etc.): perf stat -e faults reports ~1489 faults for gcc and ~2963 for rustc on this benchmark. I don't think there's much we can do to change that without significant work. Given that we're on par with clang and in the same ballpark as gcc, I'm going to go ahead and close.
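The "~2x" figures can be checked from the numbers above. The sketch below computes the ratio of two mean timings with standard first-order propagation of uncertainty, assuming independent runs; that this matches hyperfine's own formula is an assumption, and since it is fed the rounded means it reproduces the summary (1.91 ± 0.37 and 2.26 ± 0.40) only to within rounding. The function name ratio_with_error is hypothetical.

```rust
/// Ratio r = a / b of two means, with first-order error propagation
/// assuming independent measurements:
///   sigma_r = r * sqrt((sa / a)^2 + (sb / b)^2)
fn ratio_with_error(a: f64, sa: f64, b: f64, sb: f64) -> (f64, f64) {
    let r = a / b;
    let sigma = r * ((sa / a).powi(2) + (sb / b).powi(2)).sqrt();
    (r, sigma)
}
```

With rustc at 16.6 ± 1.9 ms and gcc at 8.7 ± 1.3 ms this gives roughly 1.91 ± 0.36, and clang against gcc gives roughly 2.25 ± 0.40. The page-fault ratio, 2963 / 1489 ≈ 1.99, is of the same size as the timing ratio, which is consistent with the faults explaining most of the gap.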
Compiling an empty file into an rlib takes about 200 milliseconds locally, which is a pretty significant chunk of time! The various passes look like:
Notably:
I believe the expansion timings are all related to: