Closed briansmith closed 1 month ago
I did some comparisons on x86_64-unknown-linux-gnu
and opt-level=3
with a few different implementations:
is_rdrand_good
as #[cold]
: https://rust.godbolt.org/z/MnnavhYWTIt seems like (2) and (3) generate nearly identical code, while (3) is (to me) easier to read. So I think we should just mark is_rdrand_good
(and self_test
) as #[cold]
.
It seems like (2) and (3) generate nearly identical code, while (3) is (to me) easier to read. So I think we should just mark
is_rdrand_good
(andself_test
) as#[cold]
.
First, this change doesn't just affect the RDRAND case. It also affects Linux/Android because it affects the LazyBool
used to cache whether the getrandom
syscall is available.
Secondly, the large effect on the Linux/Android implementation is clearer when use_file
is refactored in the same way. I've updated this PR with that change.
Before this change, the compiler generates code that looks like this:
If the initialization code is small, this works fine. But, for (bad) reasons, is_rdrand_good is particularly huge. Thus, jumping over its inined code is wasteful because it puts bad pressure on the instruction cache.
With this change, the generated code looks like this:
I verified these claims by running:
This is also what other implementations (e.g. OnceCell) do.
While here, I made the analogous change to LazyPtr, and rewrote LazyPtr to the same form as LazyUsize. I didn't check the generated code for LazyPtr though.
(Why is
is_rdrand_good
huge? The compiler unrolls the 10 iteration retry loop, and then it unrolls the 8 iteration self-test loop, so the result isrdrand()
is inlined 80 times inside is_rdrand_good. This is something to address separately as it also affectsgetrandom_inner
itself.)