rust-lang / rust

Significant amount of time to compile an empty file #43300

Closed alexcrichton closed 9 months ago

alexcrichton commented 7 years ago

Compiling an empty file into an rlib takes about 200 milliseconds locally, which is a pretty significant chunk of time! The individual passes look like:

$ touch foo.rs
$ rustc +nightly foo.rs --crate-type lib -Z perf-stats -Z time-passes
time: 0.000; rss: 50MB  parsing
time: 0.000; rss: 50MB  recursion limit
time: 0.000; rss: 50MB  crate injection
time: 0.000; rss: 50MB  plugin loading
time: 0.000; rss: 50MB  plugin registration
time: 0.033; rss: 75MB  expansion
time: 0.000; rss: 75MB  maybe building test harness
time: 0.000; rss: 75MB  maybe creating a macro crate
time: 0.000; rss: 75MB  creating allocators
time: 0.000; rss: 75MB  checking for inline asm in case the target doesn't support it
time: 0.000; rss: 75MB  early lint checks
time: 0.000; rss: 75MB  AST validation
time: 0.000; rss: 78MB  name resolution
time: 0.000; rss: 78MB  complete gated feature checking
time: 0.000; rss: 78MB  lowering ast -> hir
time: 0.000; rss: 78MB  indexing hir
time: 0.000; rss: 78MB  attribute checking
time: 0.000; rss: 78MB  language item collection
time: 0.000; rss: 78MB  lifetime resolution
time: 0.000; rss: 78MB  looking for entry point
time: 0.000; rss: 78MB  looking for plugin registrar
time: 0.000; rss: 78MB  loop checking
time: 0.000; rss: 78MB  static item recursion checking
time: 0.000; rss: 78MB  compute_incremental_hashes_map
time: 0.000; rss: 78MB  load_dep_graph
time: 0.000; rss: 78MB  stability index
time: 0.000; rss: 78MB  stability checking
time: 0.000; rss: 78MB  type collecting
time: 0.000; rss: 78MB  impl wf inference
time: 0.000; rss: 78MB  coherence checking
time: 0.000; rss: 78MB  variance testing
time: 0.000; rss: 78MB  wf checking
time: 0.000; rss: 78MB  item-types checking
time: 0.000; rss: 78MB  item-bodies checking
time: 0.000; rss: 78MB  const checking
time: 0.000; rss: 78MB  privacy checking
time: 0.000; rss: 78MB  intrinsic checking
time: 0.000; rss: 78MB  effect checking
time: 0.000; rss: 78MB  match checking
time: 0.000; rss: 78MB  liveness checking
time: 0.000; rss: 78MB  borrow checking
time: 0.000; rss: 78MB  reachability checking
time: 0.000; rss: 78MB  death checking
time: 0.000; rss: 78MB  unused lib feature checking
time: 0.000; rss: 81MB  lint checking
time: 0.000; rss: 81MB  resolving dependency formats
  time: 0.000; rss: 81MB    write metadata
  time: 0.000; rss: 81MB    translation item collection
  time: 0.000; rss: 81MB    codegen unit partitioning
  time: 0.000; rss: 108MB   internalize symbols
time: 0.064; rss: 108MB translation
time: 0.000; rss: 108MB assert dep graph
time: 0.000; rss: 108MB serialize dep graph
  time: 0.000; rss: 108MB   llvm function passes [1]
  time: 0.000; rss: 108MB   llvm module passes [1]
  time: 0.001; rss: 109MB   codegen passes [1]
  time: 0.001; rss: 109MB   codegen passes [0]
time: 0.007; rss: 110MB LLVM passes
time: 0.000; rss: 110MB serialize work products
time: 0.001; rss: 110MB linking
Total time spent computing SVHs:               0.000
Total time spent computing incr. comp. hashes: 0.000
Total number of incr. comp. hashes computed:   4
Total number of bytes hashed for incr. comp.:  87
Average bytes hashed per incr. comp. HIR node: 21
Total time spent computing symbol hashes:      0.013
Total time spent decoding DefPath tables:      0.028

Notably:

time: 0.033; rss: 75MB  expansion
time: 0.064; rss: 108MB translation
time: 0.007; rss: 110MB LLVM passes

I believe the expansion timings are all related to:

Total time spent computing symbol hashes:      0.013
Total time spent decoding DefPath tables:      0.028

michaelwoerister commented 7 years ago

For incremental compilation I was thinking about implementing a simple hash-table that can be memory-mapped. DefPathTable would be another candidate for that.

aturon commented 7 years ago

@michaelwoerister Is that something you could give @alexcrichton a bit more info on and he could tackle?

michaelwoerister commented 7 years ago

Each crate we are loading contains a DefPathTable and we are loading it eagerly at the moment. This can be quite a big piece of data and decoding it involves a reverse-lookup hash map from some of the loaded data.

I see now that having a mmapped version of that reverse lookup table would be kind of tricky, since the keys (DefKey) contain interned ast::Symbols. It would be possible to come up with something, but it would be quite some work, I guess.

As a starting point, one could try to decode the DefPathTables more lazily, e.g. decode the whole thing on first access or lazily populate the reverse lookup table.
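
The lazy-decoding idea can be sketched roughly as follows. This is only a minimal illustration, not rustc's actual code: LazyDefPathTable, the String-based DefKey, the u32 DefIndex, and the newline-separated "encoding" are all hypothetical stand-ins chosen to keep the example short. The point is that the raw bytes are stored up front, while both the decoded keys and the reverse lookup map are built only when something first asks for them.

```rust
use std::cell::OnceCell;
use std::collections::HashMap;

// Hypothetical stand-ins: rustc's real DefKey and DefIndex are richer types.
type DefKey = String;
type DefIndex = u32;

/// Keeps the encoded bytes around and decodes them only on first use.
struct LazyDefPathTable {
    encoded: Vec<u8>,
    keys: OnceCell<Vec<DefKey>>,
    reverse: OnceCell<HashMap<DefKey, DefIndex>>,
}

impl LazyDefPathTable {
    /// Loading a crate only stores the bytes; nothing is decoded here.
    fn new(encoded: Vec<u8>) -> Self {
        Self {
            encoded,
            keys: OnceCell::new(),
            reverse: OnceCell::new(),
        }
    }

    /// Decodes the whole table on first access.
    /// Toy format (an assumption): one key per line.
    fn keys(&self) -> &[DefKey] {
        self.keys.get_or_init(|| {
            String::from_utf8_lossy(&self.encoded)
                .lines()
                .map(str::to_owned)
                .collect()
        })
    }

    /// Builds the reverse lookup map on the first reverse lookup only.
    fn reverse_lookup(&self, key: &str) -> Option<DefIndex> {
        let map = self.reverse.get_or_init(|| {
            self.keys()
                .iter()
                .enumerate()
                .map(|(index, k)| (k.clone(), index as DefIndex))
                .collect()
        });
        map.get(key).copied()
    }

    /// True once the keys have been decoded.
    fn is_decoded(&self) -> bool {
        self.keys.get().is_some()
    }
}
```

With this shape, a crate whose table is never queried pays only for holding the bytes; calling reverse_lookup on a fresh table triggers both the decode and the map construction exactly once.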

michaelwoerister commented 7 years ago

Hm, looking at the code again, it seems like we actually don't need the reverse lookup map anymore :D

When I switched the dep-graph to using stable DefPathHashes instead of actual DefPaths, I removed the last user of Definitions::retrace_path(), which in turn is the only user of the table.

I'll make a PR removing the table...

michaelwoerister commented 7 years ago

I opened https://github.com/rust-lang/rust/pull/43361 and would be interested in the impact this has here.

alexcrichton commented 7 years ago

@michaelwoerister In local testing, it looks like #43361 shaves about 20ms off the compile time of an empty file, thanks! After that PR the longest timings are:


time: 0.050; rss: 91MB  translation
time: 0.017; rss: 59MB  expansion
time: 0.002; rss: 100MB LLVM passes
time: 0.001; rss: 100MB linking
time: 0.001; rss: 38MB  parsing
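
A ranking like the one above can be produced mechanically from the -Z time-passes output. The following is a small sketch, assuming the line shape shown in this thread (time: SECONDS; rss: NMB NAME); the PassTiming struct and the parse_line and slowest helpers are hypothetical names, and the output format of -Z time-passes is unstable and may differ between compiler versions.

```rust
/// One parsed line such as `time: 0.033; rss: 75MB  expansion`.
#[derive(Debug, PartialEq)]
struct PassTiming {
    seconds: f64,
    rss_mb: u32,
    name: String,
}

/// Parses a single time-passes line; returns None for anything else
/// (for example the "Total time spent ..." perf-stats lines).
fn parse_line(line: &str) -> Option<PassTiming> {
    let rest = line.trim().strip_prefix("time:")?;
    let (secs, rest) = rest.split_once(';')?;
    let rest = rest.trim().strip_prefix("rss:")?;
    // The first "MB" belongs to the rss field; the pass name follows it.
    let (rss, name) = rest.trim_start().split_once("MB")?;
    Some(PassTiming {
        seconds: secs.trim().parse().ok()?,
        rss_mb: rss.trim().parse().ok()?,
        name: name.trim().to_owned(),
    })
}

/// Returns the `n` slowest passes, slowest first.
/// Indented sub-passes are treated like any other line.
fn slowest(output: &str, n: usize) -> Vec<PassTiming> {
    let mut passes: Vec<PassTiming> = output.lines().filter_map(parse_line).collect();
    // Stable sort: passes with equal times keep their original order.
    passes.sort_by(|a, b| b.seconds.total_cmp(&a.seconds));
    passes.truncate(n);
    passes
}
```

Feeding the full output from the first comment into slowest with n = 3 would be expected to surface translation, expansion, and LLVM passes, matching the lines called out by hand.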

Mark-Simulacrum commented 9 months ago

Compiling an empty crate takes a little under 20ms today. This is a little better than clang (v14, which is what I have on my system) running clang -c t.c on an empty C file, and around 2x slower than gcc (version 12) on my system.

$ hyperfine --warmup 3 -N "$HOME/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/rustc --crate-type=lib empty.rs" "clang -c t.c" "gcc -c t.c"
Benchmark 1: /home/mark/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/rustc --crate-type=lib empty.rs
  Time (mean ± σ):      16.6 ms ±   1.9 ms    [User: 7.4 ms, System: 9.7 ms]
  Range (min … max):    11.6 ms …  20.6 ms    145 runs

Benchmark 2: clang -c t.c
  Time (mean ± σ):      19.6 ms ±   1.8 ms    [User: 8.6 ms, System: 10.9 ms]
  Range (min … max):    15.8 ms …  22.6 ms    138 runs

Benchmark 3: gcc -c t.c
  Time (mean ± σ):       8.7 ms ±   1.3 ms    [User: 5.9 ms, System: 2.7 ms]
  Range (min … max):     5.3 ms …  11.4 ms    336 runs

Summary
  gcc -c t.c ran
    1.91 ± 0.37 times faster than /home/mark/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/rustc --crate-type=lib empty.rs
    2.26 ± 0.40 times faster than clang -c t.c

The majority of the difference seems to be explained by the number of pages faulted in (possibly indirectly, as a proxy for memory used, etc.): perf stat -e faults reports that gcc takes ~1489 faults and rustc takes ~2963 faults on this benchmark. I don't think there's much we can do to modify that without significant work. Given that we're on par with clang and within the same ballpark as gcc, I'm going to go ahead and close.
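
As a quick sanity check of the page-fault explanation, the two ratios can be compared directly. This sketch only redoes the arithmetic on the figures quoted above (mean times from the hyperfine run and the approximate fault counts from perf stat); the ratio and summary helpers are hypothetical names introduced for illustration.

```rust
/// How many times larger `a` is than `b`.
fn ratio(a: f64, b: f64) -> f64 {
    a / b
}

/// Returns (time ratio, page-fault ratio) for rustc relative to gcc,
/// using the numbers quoted in the comment above.
fn summary() -> (f64, f64) {
    let (rustc_ms, gcc_ms) = (16.6, 8.7); // hyperfine mean times
    let (rustc_faults, gcc_faults) = (2963.0, 1489.0); // perf stat -e faults
    (ratio(rustc_ms, gcc_ms), ratio(rustc_faults, gcc_faults))
}
```

The time ratio comes out at about 1.91 and the fault ratio at about 1.99, so the two track each other closely, which is consistent with page faults accounting for most of the gap between rustc and gcc here.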