rust-lang / rust

`rustc --version` is slow even without the rustup wrapper #121631

Open konstin opened 8 months ago

konstin commented 8 months ago

Problem Description

Running rustc --version without the rustup wrapper takes about 11ms on my Linux machine (see https://github.com/rust-lang/rustup/issues/2626 for the rustup side of this).

This is an issue for uv, as we've been asked to include the output of rustc --version in our user agent when making requests to the Python Package Index, so that the Python ecosystem gets usage stats. A minimal resolution with a single network request (a revalidation request) takes ~100ms on my machine, so 20ms extra before the first network request is noticeable. I'd also be happy to read the default rustc version from another location, provided that it also works with alternative installation methods.
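For context, here is a minimal sketch of what consuming that output might look like. The "rustc/<version>" fragment format, the function names, and the 5-second timeout are all hypothetical choices for illustration, not how uv actually builds its user agent. The regular expression is based only on the output shape shown in this issue (rustc 1.76.0 (07dca489a 2024-02-04)).

```python
import re
import subprocess
from typing import Optional

# Matches output such as "rustc 1.76.0 (07dca489a 2024-02-04)".
# Treating the trailing "(commit date)" part as optional is an assumption.
_RUSTC_VERSION_RE = re.compile(
    r"^rustc (?P<version>\S+)"
    r"(?: \((?P<commit>[0-9a-f]+) (?P<date>\d{4}-\d{2}-\d{2})\))?"
)


def parse_rustc_version(output: str) -> Optional[str]:
    """Return the version field from `rustc --version` output, or None."""
    match = _RUSTC_VERSION_RE.match(output.strip())
    return match.group("version") if match else None


def read_rustc_version(rustc: str = "rustc") -> Optional[str]:
    """Spawn `rustc --version`; this process spawn is the cost discussed here."""
    try:
        result = subprocess.run(
            [rustc, "--version"],
            capture_output=True,
            text=True,
            timeout=5,  # arbitrary illustrative timeout
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Binary missing, not executable, or too slow: report "unknown".
        return None
    return parse_rustc_version(result.stdout)


def user_agent_fragment(output: str) -> str:
    """Build a hypothetical "rustc/<version>" fragment; empty if unparseable."""
    version = parse_rustc_version(output)
    return f"rustc/{version}" if version else ""
```

Passing the absolute path of the toolchain binary to read_rustc_version instead of the bare name "rustc" would skip the rustup wrapper, which is the faster of the two cases measured below.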

Benchmarks

The benchmark runs from my user home on Ubuntu, and I've included rustc with and without rustup, python without a shim, and node with and without the volta shim for comparison. Tested with rustc 1.76.0 (07dca489a 2024-02-04).

$ hyperfine --warmup 10 --shell=none "rustc --version" ".rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/rustc --version" "python --version" "node --version" ".volta/tools/image/node/18.18.2/bin/node --version"
Benchmark 1: rustc --version
  Time (mean ± σ):      19.9 ms ±   1.5 ms    [User: 14.8 ms, System: 5.0 ms]
  Range (min … max):    17.5 ms …  26.3 ms    157 runs

Benchmark 2: .rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/rustc --version
  Time (mean ± σ):      10.6 ms ±   3.0 ms    [User: 4.8 ms, System: 5.6 ms]
  Range (min … max):     4.4 ms …  17.6 ms    240 runs

Benchmark 3: python --version
  Time (mean ± σ):       1.4 ms ±   0.5 ms    [User: 0.9 ms, System: 0.4 ms]
  Range (min … max):     0.3 ms …   2.3 ms    1635 runs

Benchmark 4: node --version
  Time (mean ± σ):       9.7 ms ±   3.1 ms    [User: 3.9 ms, System: 5.8 ms]
  Range (min … max):     2.8 ms …  14.3 ms    229 runs

Benchmark 5: .volta/tools/image/node/18.18.2/bin/node --version
  Time (mean ± σ):       7.2 ms ±   2.2 ms    [User: 2.4 ms, System: 4.6 ms]
  Range (min … max):     1.7 ms …  12.0 ms    796 runs

Summary
  python --version ran
    5.28 ± 2.54 times faster than .volta/tools/image/node/18.18.2/bin/node --version
    7.14 ± 3.50 times faster than node --version
    7.77 ± 3.64 times faster than .rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/rustc --version
   14.62 ± 5.57 times faster than rustc --version
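As a rough cross-check of the summary, the "times faster" figures can be recomputed from the printed means and standard deviations. The sketch below assumes first-order error propagation for a ratio of independent quantities; whether hyperfine uses exactly this formula is an assumption here. The function name is hypothetical.

```python
import math
from typing import Tuple


def relative_speed(
    mean: float, stddev: float, ref_mean: float, ref_stddev: float
) -> Tuple[float, float]:
    """Return (ratio, uncertainty) for mean / ref_mean.

    Assumes the two measurements are independent and combines their
    relative errors in quadrature (first-order error propagation).
    """
    ratio = mean / ref_mean
    relative_error = math.sqrt(
        (stddev / mean) ** 2 + (ref_stddev / ref_mean) ** 2
    )
    return ratio, ratio * relative_error


# Rounded values from the output above:
# "rustc --version" (19.9 ± 1.5 ms) against "python --version" (1.4 ± 0.5 ms).
ratio, sigma = relative_speed(19.9, 1.5, 1.4, 0.5)

# Difference between the rustup wrapper and the direct toolchain binary.
wrapper_overhead_ms = 19.9 - 10.6
```

With these rounded inputs the ratio comes out at about 14.2 ± 5.2, close to the reported 14.62 ± 5.57. The gap is plausibly down to the printed means being rounded to 0.1 ms. The same numbers put the rustup wrapper's share at roughly 9.3 ms.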

On a low-end server and a shared server the contrast to python becomes even more stark:

$ hyperfine --warmup 10 --shell=none ".rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/rustc --version" "python3.11 --version"
Benchmark 1: .rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/rustc --version
  Time (mean ± σ):      20.8 ms ±   2.3 ms    [User: 7.8 ms, System: 12.7 ms]
  Range (min … max):    18.3 ms …  35.9 ms    136 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: python3.11 --version
  Time (mean ± σ):       1.8 ms ±   0.3 ms    [User: 1.1 ms, System: 0.6 ms]
  Range (min … max):     1.3 ms …   5.9 ms    1882 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  python3.11 --version ran
   11.36 ± 2.12 times faster than .rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/rustc --version

$ hyperfine --warmup 10 --shell=none ".rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/rustc --version" "/usr/bin/python3.11 --version"
Benchmark 1: .rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/rustc --version
  Time (mean ± σ):      34.3 ms ±   7.1 ms    [User: 12.1 ms, System: 21.2 ms]
  Range (min … max):    26.2 ms …  64.8 ms    80 runs

Benchmark 2: /usr/bin/python3.11 --version
  Time (mean ± σ):       6.6 ms ±   3.1 ms    [User: 1.5 ms, System: 4.7 ms]
  Range (min … max):     4.1 ms …  64.7 ms    476 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  /usr/bin/python3.11 --version ran
    5.20 ± 2.65 times faster than .rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/rustc --version

bjorn3 commented 8 months ago

For me without strace it takes about 6ms, with strace it takes about 10ms. The span from the execve call up to prlimit64(0, RLIMIT_STACK, ...) (which is still before the main function executes) takes 9ms. After that, a tiny amount of time is spent initializing jemalloc. The time between the rust main function being called and the process exiting is less than 1ms total.

Python is only a 6.6MB executable with basically no dylib dependencies. Rustc on the other hand has 263MB worth of dynamic libraries which it needs to load outside of libc. Even just calling mprotect on the mapped dynamic libraries takes 5ms already.
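To get a feel for the size comparison on another machine, one option is to sum the on-disk sizes of the shared libraries that ldd reports for a binary. The sketch below is only an approximation: file size is not the same as the amount mapped or touched at startup, the parser assumes the usual glibc ldd line format ("name => /path (0xaddr)"), and the function names are hypothetical.

```python
import os
import re
import subprocess
from typing import Dict, Iterable

# glibc's ldd prints lines like:
#   "\tlibfoo.so.1 => /usr/lib/libfoo.so.1 (0x00007f...)"
# Lines without a resolved absolute path (vdso, "not found") are skipped.
_LDD_LINE_RE = re.compile(
    r"^\s*(?P<name>\S+) => (?P<path>/\S+) \(0x[0-9a-f]+\)\s*$"
)


def parse_ldd(output: str) -> Dict[str, str]:
    """Map library name to resolved path for each resolved ldd line."""
    deps: Dict[str, str] = {}
    for line in output.splitlines():
        match = _LDD_LINE_RE.match(line)
        if match:
            deps[match.group("name")] = match.group("path")
    return deps


def total_size_mib(paths: Iterable[str]) -> float:
    """Sum on-disk file sizes in MiB, ignoring paths that cannot be read."""
    total = 0
    for path in paths:
        try:
            total += os.path.getsize(path)  # follows symlinks
        except OSError:
            pass
    return total / (1024 * 1024)


def shared_library_size_mib(binary: str) -> float:
    """Run ldd on `binary` and total the sizes of its resolved libraries."""
    result = subprocess.run(
        ["ldd", binary], capture_output=True, text=True, check=False
    )
    return total_size_mib(parse_ldd(result.stdout).values())
```

Calling shared_library_size_mib on the toolchain's rustc binary and on a python binary gives a quick side-by-side. Note that the total includes libc, so it is not directly comparable to the 263MB figure above, and only run ldd on binaries you trust.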

jyn514 commented 8 months ago

i wonder if it would be possible to dlopen LLVM at runtime so it can be delayed until codegen starts. then only the rustc_driver shared object has to be opened unconditionally (and maybe even that can be dlopen-ed if argument parsing moves to the rustc-main binary?)
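The deferral idea can be sketched in a few lines. This is only an illustration of the pattern in Python, not rustc's code: the class, the run function, and the library name are hypothetical, and ctypes.CDLL stands in for a dlopen call.

```python
import ctypes
from typing import Any, Callable, List


class LazyLibrary:
    """Defer loading a shared library until it is first needed."""

    def __init__(self, path: str, loader: Callable[[str], Any] = ctypes.CDLL):
        self._path = path
        self._loader = loader  # ctypes.CDLL calls dlopen on POSIX systems
        self._handle: Any = None
        self._loaded = False
        self.load_count = 0  # exposed so the deferral is observable

    def get(self) -> Any:
        """Load on first call; later calls reuse the same handle."""
        if not self._loaded:
            self._handle = self._loader(self._path)
            self._loaded = True
            self.load_count += 1
        return self._handle


def run(args: List[str], backend: LazyLibrary) -> str:
    """Hypothetical driver: only the codegen path touches the backend."""
    if "--version" in args:
        return "version only"  # the heavy library is never opened here
    backend.get()
    return "codegen"
```

With this shape, a --version invocation never pays for opening the backend, while a compile pays once on first use. The tradeoff is that the load cost moves into the middle of the run and symbol resolution has to go through the handle.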

bjorn3 commented 8 months ago

We used to dlopen librustc_codegen_llvm.so (to support separate LLVM versions for emscripten and for regular use, no longer necessary as emscripten now uses the upstream wasm backend rather than the asm.js fastcomp backend), but it was merged into librustc_driver.so for perf reasons.

joshtriplett commented 8 months ago

The performance wins that https://github.com/rust-lang/rust/pull/97154 would provide (if we could do that without breaking codegen backends) seem likely to help substantially with this. That might be worth revisiting.

Doineann commented 4 months ago

Maybe related to rustc --version doing more than it is supposed to do? https://github.com/rust-lang/rust/issues/127649

Kobzol commented 4 months ago

> Maybe related to rustc --version doing more than it is supposed to do? #127649

That is performed by the rustup wrapper, not by rustc directly, so that is not related to this issue.

Doineann commented 4 months ago

Yeah, I wasn't even really aware of the rustup wrapper behaving as a proxy in the first place.