Closed Elabajaba closed 1 year ago
That is quite a bad result. I can't reproduce it locally on my Linux laptop (2 cores + HT, 4GB RAM), where it finished everything in 130s (after subtracting the time to update the registry and fetch crates), during which all cores were fully loaded the entire time. Wgpu took only 43s and naga 46s. I will need some more information to be able to pinpoint the issue. Could you please run `cargo clean` and then `RUSTFLAGS="-Zself-profile=/path/to/your/project/self_profile_results" dist/cargo-clif build --release`, and upload the files starting with `wgpu-` and `naga-` from the `self_profile_results` directory that this command generates in your project? The self-profile output shows exactly what rustc (and by extension cg_clif) spent how much time on. Note that you won't be able to upload the whole `self_profile_results` directory, as it is over 1GB, which is way past the GitHub upload limit. For me, wgpu and naga combined were only 50MB (or 100MB when including wgpu_core, wgpu_hal and wgpu_types too).
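(For anyone following along: the resulting `.mm_profdata` files can be inspected with the post-processing tools from the rust-lang/measureme repo. A sketch of that workflow, assuming the install commands below still match that repo's layout and that the tools keep their current CLIs:)

```shell
# Install the self-profile post-processing tools from the measureme repo.
cargo install --git https://github.com/rust-lang/measureme summarize
cargo install --git https://github.com/rust-lang/measureme crox

# Print a table of where rustc spent its time for one crate's profile.
summarize summarize self_profile_results/wgpu-0005296.mm_profdata

# Convert the profile to a Chrome trace (chrome_profiler.json),
# viewable in chrome://tracing.
crox self_profile_results/wgpu-0005296.mm_profdata
```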
(you mean `-Zself-profile` rather than `-Ztime-passes`?)
Yeah, fixed. Thanks!
I had to split it into 2 zip files since GitHub has a 25MB limit (and doesn't support anything better than .zip). self_profile_results-1.zip (naga + wgpu core) self_profile_results-2.zip (wgpu hal, wgpu types, and wgpu)
For wgpu there is a codegen unit (on thread 5) taking 44s to compile, spread out over a dozen functions, and another codegen unit (on thread 16) taking 2.2min, where compiling a single function completely dwarfs all the others.
For naga there is a single function taking 1.8min (on thread 4).
It would really help to know which functions are so slow to compile. I'll work on some code to show function names in the self-profile results tomorrow.
(Recording timings for the various passes of cranelift using generic_activity_with_arg_recorder would also be very useful in this case.)
Hopefully I did it right. It was quite a bit bigger so I had to split them up even more this time. naga-0023456.mm_profdata.zip wgpu_core-0011120.mm_profdata.zip wgpu_hal_types.zip wgpu-0005296.mm_profdata.zip
In case it helps, I was using the msvc artifact downloaded from https://github.com/bjorn3/rustc_codegen_cranelift/actions/runs/4065395449
I just tested on Linux (same computer, Fedora 37) and it's similar but less bad (using the linux artifact from the same actions run). In total it took 89.7s to run `cargo-clif build --example many_foxes --release`, with wgpu taking 71.0s (69.9s codegen) and naga taking 66.3s (63.4s codegen).
> In case it helps, I was using the msvc artifact downloaded from https://github.com/bjorn3/rustc_codegen_cranelift/actions/runs/4065395449
I just built my own from master (21bdff8bc015b557bfae85b968dc4c100be85caf, `./y.rs build` with no options, except `-Ctarget-cpu=native` in RUSTFLAGS), and it doesn't have this issue. (`cargo-clif build --example many_foxes --release` only took 24.5s)
> In case it helps, I was using the msvc artifact downloaded from https://github.com/bjorn3/rustc_codegen_cranelift/actions/runs/4065395449
Did my own build on Windows MSVC using the master branch, and it's also fine? (apart from linking being extremely slow if you don't use lld)
I also tried downloading a different MSVC CI artifact, and it also hangs for a similar amount of time.
I can also confirm that I tried to reproduce this yesterday with a local build and didn't experience any major slowdowns.
Except for the linker thing, which should be fixed once https://github.com/bytecodealliance/wasmtime/pull/5550 is merged.
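(Side note: a hypothetical workaround for the slow default MSVC linker in the meantime, not something tested in this thread, is pointing cargo at the `rust-lld` linker that ships with the Rust toolchain via `.cargo/config.toml`:)

```toml
# .cargo/config.toml — use the lld-based linker shipped with the toolchain
# instead of the default MSVC link.exe (assumption: rust-lld is available
# in your toolchain, which is the case for rustup installs)
[target.x86_64-pc-windows-msvc]
linker = "rust-lld"
```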
I'll try to reproduce it with CI artifacts.
Edit: Unfortunately I wasn't able to reproduce it with the CI artifacts, which is weird. Here's my cargo timing report
> (Recording timings for the various passes of cranelift using generic_activity_with_arg_recorder would also be very useful in this case.)

> Hopefully I did it right. It was quite a bit bigger so I had to split them up even more this time. naga-0023456.mm_profdata.zip wgpu_core-0011120.mm_profdata.zip wgpu_hal_types.zip wgpu-0005296.mm_profdata.zip
Unfortunately these files seem to be corrupt. Crox produces a near-empty JSON file. Summarize doesn't show anything.
https://github.com/bjorn3/rustc_codegen_cranelift/pull/1346 added more details to the self-profile output. Could you try `RUSTFLAGS="-Zself-profile=/path/to/your/project/self_profile_results -Zself-profile-events=default,function-args" dist/cargo-clif build --release` with the latest CI build? (https://github.com/bjorn3/rustc_codegen_cranelift/actions/runs/4086944682)
Well, those flags added a ton to the build times (7m25s instead of ~3m10s).
Thanks! I found the issue. To catch more bugs in CI, I made the build system enable debug assertions for cg_clif itself when building on CI. One effect of this is enabling the regalloc verifier, which can be very slow. Because the CI artifacts have debug assertions (and thus the regalloc verifier) enabled while local builds don't, you could only reproduce this issue with a cg_clif from CI and not with one you built yourself. I think I will need to create separate dist jobs for CI artifacts that don't enable debug assertions.
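(For context on how a dist build ends up with the verifier baked in: `debug_assertions` is a compile-time cfg, so anything gated on it is decided when cg_clif itself is built, not when it later compiles your crate. A minimal illustration of the pattern, not the actual cg_clif code:)

```rust
// Hypothetical sketch: gating an expensive check behind debug assertions.
fn verify_allocation(run_verifier: bool) -> &'static str {
    if run_verifier {
        // Imagine an expensive regalloc verification pass here.
        "verified"
    } else {
        "skipped"
    }
}

fn main() {
    // cfg!(debug_assertions) is true for dev profiles, and also for
    // release builds that set `debug-assertions = true` in Cargo.toml,
    // which is effectively what the CI dist job did.
    println!("{}", verify_allocation(cfg!(debug_assertions)));
}
```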
Fixed in #1347
Hi, I was testing this out to see if it would work for building Bevy on Windows, and debug builds worked reasonably well (a bit faster build time for a bit slower runtime on Windows+MSVC).
Then I tried a release build, to see if it was possible to get better runtime performance at similar build times to regular llvm rustc debug builds, and it had some issues. It ended up taking 192.4s (on a 5900x with 32GB RAM), during which it spent 165.8s building wgpu, with 164.2s of that being codegen, and 119.8s building naga (115.1s of which was codegen).
I confirmed that this wasn't some weird issue due to building Bevy by cloning and building the naga repo, and it took a similar amount of time.