Ok I've confirmed that our intention is to compile LLVM with `clang-cl.exe`, but due to bugs in CI configuration that actually isn't happening. I'll look to fix that!
FWIW Clang can also be used for MinGW, but it will require a few tweaks.
Ok I've done a slightly more scientific test. I spun up two instances on AWS, one Ubuntu and one Windows. They're both using AMD EPYC 7571 CPUs with 4 cores (virtualized). Naturally AWS is very noisy, but the hope is to get a baseline measurement between Windows/Ubuntu which at least gets the differences in CPU out of the way.
Compiling the `wast` crate again, I got ~13s on Ubuntu and ~18s on Windows, still a pretty large discrepancy. That was using rust-lang/rust@50f8aadd746ebc929a752e5ffb133936ee75c52f.
Using a Windows compiler produced from https://github.com/rust-lang/rust/pull/66194 I get ~17s, so while compiling with Clang instead of `cl.exe` is a modest improvement, it doesn't explain the remaining ~4 seconds of compile-time difference. The next thing to check is probably ThinLTO, because we enable that on Linux but don't enable it anywhere else.
@alexcrichton how feasible is it to measure user (spent doing work) and system (spent waiting on syscalls) time in seconds for Linux and Windows?
`GetProcessTimes` provides both the kernel and user times, so it is very feasible to do on Windows at least.
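For illustration, here's a minimal sketch of reading both numbers from Rust on either platform. This is my own example, not something used in the measurements above; it assumes the `libc` crate on Unix and the `windows-sys` crate (with the `Win32_Foundation` and `Win32_System_Threading` features) on Windows, and the `cpu_times` helper is a name I made up:

```rust
// Sketch: report the user/kernel CPU time consumed by the current process.
// To measure a compiler you'd query the spawned rustc instead
// (RUSAGE_CHILDREN on Unix, the child's process handle on Windows).

#[cfg(unix)]
fn cpu_times() -> (f64, f64) {
    let mut usage: libc::rusage = unsafe { std::mem::zeroed() };
    unsafe { libc::getrusage(libc::RUSAGE_SELF, &mut usage) };
    let secs = |tv: libc::timeval| tv.tv_sec as f64 + tv.tv_usec as f64 / 1e6;
    (secs(usage.ru_utime), secs(usage.ru_stime))
}

#[cfg(windows)]
fn cpu_times() -> (f64, f64) {
    use windows_sys::Win32::Foundation::FILETIME;
    use windows_sys::Win32::System::Threading::{GetCurrentProcess, GetProcessTimes};
    // A FILETIME counts 100-nanosecond ticks.
    let secs = |ft: &FILETIME| {
        (((ft.dwHighDateTime as u64) << 32) | ft.dwLowDateTime as u64) as f64 / 1e7
    };
    let mut creation: FILETIME = unsafe { std::mem::zeroed() };
    let mut exit: FILETIME = unsafe { std::mem::zeroed() };
    let mut kernel: FILETIME = unsafe { std::mem::zeroed() };
    let mut user: FILETIME = unsafe { std::mem::zeroed() };
    let ok = unsafe {
        GetProcessTimes(GetCurrentProcess(), &mut creation, &mut exit, &mut kernel, &mut user)
    };
    assert!(ok != 0, "GetProcessTimes failed");
    (secs(&user), secs(&kernel))
}

fn main() {
    // Burn a little CPU so the numbers are nonzero.
    let n: u64 = (0..10_000_000u64).sum();
    let (user, system) = cpu_times();
    println!("user: {user:.3}s, system: {system:.3}s (n = {n})");
}
```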
Maybe `shell32` (or a similar library) is the reason here. Even though Rust avoids linking it at all costs, LLVM still links it because it won't build without it.
Another potential culprit is Windows Defender scanning the new files as they are produced. Try adding an exclusion for the entire build folder (e.g. with Defender's `Add-MpPreference -ExclusionPath` PowerShell cmdlet).
The `shell32` issue just caused process load times to increase, which is noticeable for short-running processes. It would not affect the speed at which LLVM generates code.
@nagisa 98.962% of the time is spent in user mode, so I don't think this is a kernel difference thing. @mati865 as mentioned, while that's possible, it has historically only been related to startup time. @ojeda given that rustc creates very few files, I suspect that is not the issue.
I wonder if the use of jemalloc on Linux but not MSVC could explain some of the difference?
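For anyone who wants to poke at that theory, here's a minimal sketch of my own (not how rustc itself wires jemalloc up) that pins a test program to jemalloc via the `jemallocator` crate, so allocation-heavy runtime can be compared against the default allocator:

```rust
// Sketch: force jemalloc as the global allocator in a test program, to
// compare against the default (system) allocator on the same machine.
// Requires `jemallocator` as a dependency; note jemalloc historically
// hasn't built for MSVC targets, which is why rustc disables it there.
use jemallocator::Jemalloc;

#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;

fn main() {
    // Allocation churn, loosely in the spirit of what LLVM does constantly.
    let strings: Vec<String> = (0..1_000_000).map(|i| i.to_string()).collect();
    println!("allocated {} strings", strings.len());
}
```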
Can someone take a look at this issue? Rust compilation on GitHub Actions takes significantly longer on Windows than on Linux and macOS.
While it's generally assumed that build systems on Windows are slower than build systems on Linux, I'm seeing up to nearly 2x differences in per-crate compile times on a Windows machine vs a Linux machine. These are personal machines I work on and they're not exactly equivalent, but I'm pretty surprised by the 2x differences I'm seeing here and wanted to open an issue to see if we can investigate and get to the bottom of what's going on.
The specifications of the machines I have are:
I don't really know a ton about Intel CPUs, so I'm not actually sure whether the i9 being 2x faster than the i7 is expected. I wanted to write down some details, though, to see if others have thoughts. All Cargo commands were executed with `-j4` to ensure that neither machine had an unfair parallelism advantage, and also to ideally isolate the effect of hyperthreads.

I started out by building https://github.com/cranestation/wasmtime/tree/ab3cd945bc2f4626a2fae8eabf6c7108973ce1a5, and the full `-Ztimings` graph I got was:

For the same project and the same compiler commit the Windows build is nearly 70% slower! I don't think that my CPUs have a 70% performance difference between them, and I don't have a perfect test environment for this, but 70% feels like a huge performance discrepancy between Linux and Windows.
Glancing at the slow-building crates (use the "min unit time" slider to see them more easily), I'm seeing that almost all crates are 2x slower on Windows than on Linux. This doesn't look like a "chalk it up to Windows being slow" issue; this is where I started thinking it was more likely a bug somewhere in rustc and/or LLVM.
Next up I wanted to try out `-Z self-profile` on a particular crate. One I wrote recently was the `wast` crate, which took 13.76s on Linux and 23.05s on Windows. I dug in a bit more, building just that crate at https://github.com/alexcrichton/wat/tree/2288911124001d30de0a68e284db9ab010495536/crates/wast. Here, sure enough, the command `cargo +nightly build --release -p wast -j4` has a huge discrepancy:

Next up I tried `-Z self-profile`, and using `measureme` I ran `summarize diff` and got this output, notably:

For whatever reason, it appears that LLVM is massively slower on Windows than it is on Linux.
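For anyone reproducing the numbers above, here's a rough harness of my own (assuming rustup's `cargo` shim and a nightly toolchain) that times the same clean build while asking rustc for self-profile data:

```rust
// Sketch: time a clean release build of the `wast` crate, passing
// `-Z self-profile` so rustc emits the per-pass data that measureme's
// `summarize` tool can aggregate and diff across platforms.
use std::process::Command;
use std::time::Instant;

fn main() {
    // Clean first so each run measures a full build.
    let ok = Command::new("cargo")
        .args(["clean", "-p", "wast"])
        .status()
        .expect("failed to run cargo clean");
    assert!(ok.success());

    let start = Instant::now();
    let ok = Command::new("cargo")
        .args(["+nightly", "build", "--release", "-p", "wast", "-j4"])
        .env("RUSTFLAGS", "-Z self-profile")
        .status()
        .expect("failed to run cargo build");
    assert!(ok.success());
    println!("wall time: {:?}", start.elapsed());
}
```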
It was at this point that I decided to write up the issue here and get this all down in a report. I suspect that this is either a build-system problem on Windows or a compiler problem. We're using Clang on Linux but we're not using Clang on Windows yet, so it may be time to make the transition!