rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

MSVC rustc is unnaturally slower than Linux rustc #66192

Open alexcrichton opened 4 years ago

alexcrichton commented 4 years ago

While it's generally assumed that build systems on Windows are slower than build systems on Linux, I'm seeing per-crate compile times that are up to nearly 2x slower on a Windows machine than on a Linux machine. These are personal machines I work on and they're not exactly equivalent, but I'm pretty surprised by the 2x difference and wanted to open an issue to see if we can get to the bottom of what's going on.

The specifications of the machines I have are:

I don't really know a ton about Intel CPUs, so I'm not actually sure whether it's expected for the i9 to be 2x faster than the i7. I wanted to write down some details though to see if others have thoughts. All Cargo commands were executed with -j4 to ensure that neither machine had an unfair parallelism advantage, and also to ideally isolate the effect of hyperthreads.

I started out by building https://github.com/cranestation/wasmtime/tree/ab3cd945bc2f4626a2fae8eabf6c7108973ce1a5, and the full -Ztimings graph I got was:

For the same project and the same compiler commit the Windows build is nearly 70% slower! I don't think that my CPUs have a 70% performance difference between them, and I don't have a perfect test environment for this, but 70% feels like a huge performance discrepancy between Linux and Windows.

Glancing at the slow-building crates (use the "min unit time" slider to see them more easily) I'm seeing that almost all crates are 2x slower on Windows than on Linux. This doesn't look like a "chalk it up to Windows being slow" issue, and this is where I started thinking that it was more likely to be a bug somewhere in rustc and/or LLVM.

Next up I wanted to try out -Z self-profile on a particular crate. One I wrote recently was the wast crate, which took 13.76s on Linux and 23.05s on Windows. I dug in a bit more building just that crate at https://github.com/alexcrichton/wat/tree/2288911124001d30de0a68e284db9ab010495536/crates/wast.
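As a quick sanity check on the numbers quoted above, here is a minimal sketch of the arithmetic. The function name `slowdown_percent` is made up for illustration; the two inputs are the wast build times reported in this comment.

```rust
/// Percentage by which `slow` exceeds `fast` (both in seconds).
/// `slowdown_percent` is a hypothetical helper, not part of any tool.
fn slowdown_percent(fast: f64, slow: f64) -> f64 {
    (slow / fast - 1.0) * 100.0
}

fn main() {
    // Build times for the wast crate as reported in the issue.
    let linux_secs = 13.76;
    let windows_secs = 23.05;

    println!("ratio: {:.2}x", windows_secs / linux_secs);
    println!(
        "Windows is {:.1}% slower",
        slowdown_percent(linux_secs, windows_secs)
    );
}
```

For this one crate that works out to roughly 67% slower, in line with the "nearly 70%" seen for the whole wasmtime build.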

Here sure enough, the command cargo +nightly build --release -p wast -j4 has a huge discrepancy:

Next up I tried -Z self-profile, and using measureme I ran summarize diff and got this output, notably:

+---------------------------------------------+---------------+------------+------------+--------------+-----------------------+
| Item                                        | Self Time     | Item count | Cache hits | Blocked time | Incremental load time |
+---------------------------------------------+---------------+------------+------------+--------------+-----------------------+
| LLVM_thin_lto_optimize                      | +3.86042516s  | +0         | +0         | +0ns         | +0ns                  |
+---------------------------------------------+---------------+------------+------------+--------------+-----------------------+
| LLVM_module_optimize_module_passes          | +3.152410865s | +0         | +0         | +0ns         | +0ns                  |
+---------------------------------------------+---------------+------------+------------+--------------+-----------------------+
| LLVM_module_codegen_emit_obj                | +1.783877999s | +0         | +0         | +0ns         | +0ns                  |
+---------------------------------------------+---------------+------------+------------+--------------+-----------------------+
| codegen_crate                               | +1.021669947s | +0         | +0         | +0ns         | +0ns                  |
+---------------------------------------------+---------------+------------+------------+--------------+-----------------------+
| LLVM_thin_lto_import                        | +245.950489ms | +0         | +0         | +0ns         | +0ns                  |
+---------------------------------------------+---------------+------------+------------+--------------+-----------------------+
| codegen_module                              | +220.253166ms | +0         | +0         | +0ns         | +0ns                  |
+---------------------------------------------+---------------+------------+------------+--------------+-----------------------+
| LLVM_module_optimize_function_passes        | +134.256719ms | +0         | +0         | +0ns         | +0ns                  |
+---------------------------------------------+---------------+------------+------------+--------------+-----------------------+
| LLVM_module_codegen_make_bitcode            | +111.530996ms | +0         | +0         | +0ns         | +0ns                  |
+---------------------------------------------+---------------+------------+------------+--------------+-----------------------+

For whatever reason, it appears that LLVM is massively slower on Windows than it is on Linux.
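To put a number on how much of the excerpt above is attributable to LLVM, the "Self Time" column can be summed per item prefix. This is only a sketch: `parse_delta_secs` and `sum_self_time` are hypothetical helpers written for this excerpt, not part of measureme, and the parser assumes the `+<number><unit>` formatting visible in the table.

```rust
/// Rows copied from the `summarize diff` excerpt (item and self-time columns only).
const TABLE: &str = "\
| LLVM_thin_lto_optimize               | +3.86042516s  |
| LLVM_module_optimize_module_passes   | +3.152410865s |
| LLVM_module_codegen_emit_obj         | +1.783877999s |
| codegen_crate                        | +1.021669947s |
| LLVM_thin_lto_import                 | +245.950489ms |
| codegen_module                       | +220.253166ms |
| LLVM_module_optimize_function_passes | +134.256719ms |
| LLVM_module_codegen_make_bitcode     | +111.530996ms |
";

/// Parse a delta such as "+3.86042516s" or "+245.950489ms" into seconds.
fn parse_delta_secs(text: &str) -> Option<f64> {
    let t = text.trim().trim_start_matches('+');
    // Check the two-letter suffixes first so "ms" is not mistaken for "s".
    let (number, scale) = if let Some(n) = t.strip_suffix("ms") {
        (n, 1e-3)
    } else if let Some(n) = t.strip_suffix("ns") {
        (n, 1e-9)
    } else if let Some(n) = t.strip_suffix('s') {
        (n, 1.0)
    } else {
        return None;
    };
    number.parse::<f64>().ok().map(|value| value * scale)
}

/// Sum the self-time deltas of all rows whose item name starts with `prefix`.
fn sum_self_time(table: &str, prefix: &str) -> f64 {
    table
        .lines()
        .filter(|line| line.starts_with('|'))
        .filter_map(|line| {
            // Cell 0 is the empty text before the leading '|', so skip it.
            let mut cells = line.split('|').map(str::trim).skip(1);
            let item = cells.next()?;
            let delta = cells.next()?;
            if item.starts_with(prefix) {
                parse_delta_secs(delta)
            } else {
                None
            }
        })
        .sum()
}

fn main() {
    println!("LLVM_* rows: +{:.3}s", sum_self_time(TABLE, "LLVM_"));
    println!("all rows:    +{:.3}s", sum_self_time(TABLE, ""));
}
```

On the rows shown, the `LLVM_*` items account for about 9.29s of the roughly 10.53s total self-time increase. Self time is summed across threads, so it is not directly comparable to wall-clock build time.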

It was at this point that I decided to write up the issue here and get this all down in a report. I suspect that this is either a build system problem with Windows or it's a compiler problem. We're using Clang on Linux but we're not using Clang on Windows yet, so it may be time to make the transition!

alexcrichton commented 4 years ago

Ok I've confirmed that our intention is to compile LLVM with clang-cl.exe, but due to bugs in CI configuration that actually isn't happening. I'll look to fix that!

mati865 commented 4 years ago

FWIW Clang can also be used for MinGW but it will require a few tweaks.

alexcrichton commented 4 years ago

Ok I've done a slightly more scientific test. I spun up two instances on AWS, one Ubuntu and one Windows. They're both using AMD EPYC 7571 CPUs, 4 cores (virtualized). Naturally AWS is very noisy, but the hope is to get a baseline measurement between Windows/Ubuntu which at least gets the differences in CPU out of the way.

Again compiling the wast crate I got ~13s on Ubuntu and ~18s on Windows, again a pretty large discrepancy. That was using rust-lang/rust@50f8aadd746ebc929a752e5ffb133936ee75c52f.

Using a Windows compiler produced from https://github.com/rust-lang/rust/pull/66194 I get ~17s, so while compiling with Clang instead of cl.exe is a modest improvement, it doesn't explain the remaining 4-ish seconds of compile time difference. The next thing to check is probably ThinLTO because we enable that on Linux, but we don't enable it anywhere else.

nagisa commented 4 years ago

@alexcrichton how feasible is it to measure user (spent doing work) and system (spent waiting on syscalls) time in seconds for Linux and Windows?

retep998 commented 4 years ago

GetProcessTimes provides both the kernel and user times so it is very feasible to do it on Windows at least.
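As a rough illustration of what to do with those values: GetProcessTimes reports kernel and user time as FILETIME structures, each a 64-bit count of 100-nanosecond intervals split into two 32-bit halves. The sketch below only covers the conversion and the user/kernel split; the `FileTime` struct and both function names are stand-ins written for this example, the Win32 call itself is not shown, and the sample values are invented rather than measured.

```rust
use std::time::Duration;

/// Stand-in for the Win32 FILETIME layout (two 32-bit halves of a
/// 64-bit count of 100 ns intervals). Hypothetical type for this sketch.
#[derive(Clone, Copy)]
struct FileTime {
    low: u32,
    high: u32,
}

/// Convert a FILETIME-style tick count into a `Duration`.
fn filetime_to_duration(ft: FileTime) -> Duration {
    let ticks = ((ft.high as u64) << 32) | ft.low as u64;
    // Each tick is 100 ns; saturate instead of overflowing on huge inputs.
    Duration::from_nanos(ticks.saturating_mul(100))
}

/// Share of total CPU time spent in user mode, in percent.
fn user_share_percent(kernel: Duration, user: Duration) -> f64 {
    let total = kernel.as_secs_f64() + user.as_secs_f64();
    if total == 0.0 {
        0.0
    } else {
        user.as_secs_f64() / total * 100.0
    }
}

fn main() {
    // Invented sample values (0.2 s kernel, 17.8 s user), not real measurements.
    let kernel = filetime_to_duration(FileTime { low: 2_000_000, high: 0 });
    let user = filetime_to_duration(FileTime { low: 178_000_000, high: 0 });

    println!("kernel: {:?}, user: {:?}", kernel, user);
    println!("user share: {:.3}%", user_share_percent(kernel, user));
}
```

A user share close to 100% would point at work done inside the compiler itself rather than time spent in the kernel.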

mati865 commented 4 years ago

Maybe shell32 is the reason here (or a similar lib). Even though Rust avoids it at all costs, LLVM still links it because it won't build without it.

ojeda commented 4 years ago

Another potential culprit is Windows Defender scanning the new files as they are produced. Try adding an exclusion for the entire build folder.

retep998 commented 4 years ago

The shell32 issue just caused process load times to increase, which is noticeable for short running processes. It would not affect the speed at which LLVM generates code.

alexcrichton commented 4 years ago

@nagisa 98.962% of the time is spent in user mode, so I don't think this is a kernel difference thing. @mati865 as mentioned, while possible, that's historically only been related to startup time. @ojeda given that rustc creates very few files, I suspect that is not the issue.

ollie27 commented 4 years ago

I wonder if the use of jemalloc on Linux but not MSVC could explain some of the difference?

elibroftw commented 11 months ago

Can someone take a look at this issue? Rust compilation on GitHub Actions takes significantly longer on Windows than on Linux and macOS.