rust-random / getrandom

A small cross-platform library for retrieving random data from (operating) system source
Apache License 2.0
264 stars 166 forks source link

Help the compiler avoid inlining lazy init functions. #443

Closed briansmith closed 1 month ago

briansmith commented 1 month ago

Before this change, the compiler generates code that looks like this:

  if not initialized:
     goto init
do_work:
  do the actual work
  goto exit
init:
  inilned init()
  goto do_work
exit:
  ret

If the initialization code is small, this works fine. But, for (bad) reasons, is_rdrand_good is particularly huge. Thus, jumping over its inined code is wasteful because it puts bad pressure on the instruction cache.

With this change, the generated code looks like this:

  if not initialized:
     goto init
do_work:
  do the actual work
  goto exit
init:
  call init()
  goto do_work
exit:
  ret

I verified these claims by running:

$ cargo asm --rust getrandom_inner --lib --target=x86_64-fortanix-unknown-sgx

This is also what other implementations (e.g. OnceCell) do.

While here, I made the analogous change to LazyPtr, and rewrote LazyPtr to the same form as LazyUsize. I didn't check the generated code for LazyPtr though.

(Why is is_rdrand_good huge? The compiler unrolls the 10 iteration retry loop, and then it unrolls the 8 iteration self-test loop, so the result is rdrand() is inlined 80 times inside is_rdrand_good. This is something to address separately as it also affects getrandom_inner itself.)

josephlr commented 1 month ago

I did some comparisons on x86_64-unknown-linux-gnu and opt-level=3 with a few different implementations:

  1. Current implementation: https://rust.godbolt.org/z/3nWjdbW7j
  2. The implementation in this PR: https://rust.godbolt.org/z/384jec4ba
  3. Marking is_rdrand_good as #[cold]: https://rust.godbolt.org/z/MnnavhYWT

It seems like (2) and (3) generate nearly identical code, while (3) is (to me) easier to read. So I think we should just mark is_rdrand_good (and self_test) as #[cold].

briansmith commented 1 month ago

It seems like (2) and (3) generate nearly identical code, while (3) is (to me) easier to read. So I think we should just mark is_rdrand_good (and self_test) as #[cold].

First, this change doesn't just affect the RDRAND case. It also affects Linux/Android because it affects the LazyBool used to cache whether the getrandom syscall is available.

Secondly, the large effect on the Linux/Android implementation is clearer when use_file is refactored in the same way. I've updated this PR with that change.