Closed nnethercote closed 6 months ago
I've opened an issue for the LLVM part upstream: https://bugs.llvm.org/show_bug.cgi?id=37588
Thank you, @nikic! That's extremely helpful.
This seems to benefit a lot from ~#57315~ #57351
A non-incremental build of rustc with LTO enabled takes 20s for a debug build, while a rustc with the mentioned PR that was built in incremental mode takes "only" 9.4s. Building an LTO'd version of that rustc now...
LTO'd rustc with ~#57315~ #57351 takes 3.1s for me to produce a debug build.
@dotdash Are you sure that's the right PR? It's a Cargo update.
oops, typo'd, it's #57351
Thinking about it, given that the test case originally only took 4.5s, I suspect that the const_eval part might have been a regression that was introduced in the meantime.
Either my machine is that much of a beast, or this was simply fixed.
$ time rustc helloworld5000.rs
real 0m0.423s
user 0m0.384s
sys 0m0.039s
$ time rustc -Copt-level=3 helloworld5000.rs
real 0m1.724s
user 0m1.579s
sys 0m0.144s
`helloworld5000` is the name I've given to the benchmark that is `helloworld` with the `println!("Hello world");` repeated 5,000 times. It's an interesting stress test for the compiler.
On my machine, a debug build takes 4.5 seconds and an opt build takes 62(!) seconds.
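For anyone who wants to reproduce the benchmark, here is a minimal sketch of a generator for such a file. The function name `generate_source` and the exact layout of the generated program (a single `main` containing every `println!`) are assumptions for illustration; the original test file may have been produced differently.

```rust
/// Builds the source text of a `helloworld`-style program whose `main`
/// contains `n` copies of `println!("Hello world");`.
/// (Hypothetical helper, not part of rustc or any existing tool.)
fn generate_source(n: usize) -> String {
    let mut src = String::from("fn main() {\n");
    for _ in 0..n {
        src.push_str("    println!(\"Hello world\");\n");
    }
    src.push_str("}\n");
    src
}

fn main() {
    // Print the generated program to stdout so it can be redirected to a file.
    print!("{}", generate_source(5000));
}
```

Compile and run the generator, redirect its output to `helloworld5000.rs`, and then time `rustc helloworld5000.rs` with and without `-Copt-level=3`.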
In the debug build, execution time is dominated by `take_and_reset_data`. Cachegrind measures these instruction counts:
The `reset_unifications` call within `take_and_reset_data` is the expensive part. It all boils down to `set_all` within the `ena` crate:
and iterator code (called from `set_all`):
I did some measurement and found that, in the vast majority of cases, `reset_unifications` is a no-op -- it overwrites the unification table with the same values that it already has. I wonder if we could do better somehow. It's a shame we have to keep the unbound variables around rather than just clearing them like we do with the other data in `take_and_reset_data`. I know that this is an extreme example, but profiles indicate that `reset_unifications` is somewhat hot on more normal programs too. @nikomatsakis, any ideas?
In the opt builds, these are the top functions according to Cachegrind:
That's a lot of time in `PointerMayBeCaptured`.
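Coming back to the debug-build observation that `reset_unifications` is usually a no-op: one possible direction, sketched here on a toy table, is to track whether anything has been written since the last reset and skip the rewrite otherwise. Everything below (`ToyTable`, `Value`, `bind`, `reset`) is hypothetical and is not the `ena` API. It also assumes that resetting means setting every entry back to an unbound value, and it ignores union-find parents and snapshots, so treat it as an illustration of the idea rather than a proposed patch.

```rust
/// Hypothetical stand-in for the value stored per variable; not the `ena` API.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Unbound,
    Bound(u32),
}

/// Toy table that remembers whether any entry was written since the last reset.
struct ToyTable {
    values: Vec<Value>,
    /// True if some entry may differ from `Unbound` since the last reset.
    dirty: bool,
}

impl ToyTable {
    fn new(len: usize) -> Self {
        ToyTable { values: vec![Value::Unbound; len], dirty: false }
    }

    fn bind(&mut self, index: usize, value: u32) {
        self.values[index] = Value::Bound(value);
        self.dirty = true;
    }

    /// Resets every entry to `Unbound`.
    /// Returns `true` only if the table actually had to be rewritten.
    fn reset(&mut self) -> bool {
        if !self.dirty {
            // Nothing was bound since the last reset, so the loop is skipped.
            return false;
        }
        for v in &mut self.values {
            *v = Value::Unbound;
        }
        self.dirty = false;
        true
    }
}

fn main() {
    let mut table = ToyTable::new(5000);
    assert!(!table.reset()); // nothing bound: the rewrite is skipped
    table.bind(42, 7);
    assert!(table.reset()); // one binding: the full rewrite happens
    assert_eq!(table.values[42], Value::Unbound);
}
```

Whether a flag like this is sound for the real table depends on how it interacts with snapshots and rollbacks, which this sketch does not model.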