nagisa / rust_rdrand

An implementation of random number generator based on rdrand instruction
https://docs.rs/rdrand/
ISC License
25 stars 9 forks source link

Output is (too) biased #16

Open briansmith opened 3 years ago

briansmith commented 3 years ago

Please see https://github.com/rust-random/getrandom/issues/228. I'd rather not going to repeat all the details here because I'd like to keep the conversation in one place. However, I also want you to be aware of this concern. Maybe I'm overlooking something.

nagisa commented 3 years ago

Yeah, this is unfortunate. My memory of this issue is that there's no good way to use CPU detection to filter out the CPUs that do the wrong thing (e.g. there were some bulldozers marketed as Ryzen 1000 series), but nowadays, I believe just considering anything before zen2 be broken would cover everything.

For systemd the x != 0 as a solution makes most sense mostly because it cares more about not blocking the boot process than it is about bias in the randomness. Everything else just copied that.

nagisa commented 3 years ago

Fixed in https://github.com/nagisa/rust_rdrand/commit/5ef19218ce5bd216fbb444f36624f9c6e192c2e1 I believe.

briansmith commented 3 years ago

Some comments on the fix:

nagisa commented 3 years ago

The comment says "On AMD processor families < 0x17." I don't have enough direct information on the matter but I think BoringSSL excludes some 0x17 CPUs

As my previous comment indicates, I remember some Ryzen CPUs being affected, but those were not actually Zen, but rather an older family (16h). 17h model 0x7?s are Matisse which led me to recall https://www.phoronix.com/scan.php?page=news_item&px=AMD-Releases-Linux-Zen2-Fix and then a much more widely covered “Destiny 2 does not work” problem.

I do understand why BoringSSL would decide to filter out these CPUs given this information, but its pretty disappointing that there is no more conclusive information. e.g. I'm not entirely sure if it wasn't just rdrand always reporting a failure to generate a random number (and the game/systemd trying to infinitely generate one regardless). At the very least a lack of a linux kernel workaround for this problem would indicate to me that it isn't serious at all… Not quite sure what the best way forward here is.


My reading of the other systemd issue (https://github.com/systemd/systemd/issues/18184) is that the issue is not actually there except for a motherboard with a beta release of firmware for it. I'm not sure how feasible it is for us to meaningfully safeguard against users using non-production firmware for security sensitive workloads. After all, anybody can pick open a microcode blob and do something weird to the rdrand related code therein if they really wanted to.


There is some duplicate code that seems unfortunate, e.g. the duplicate blocks for filtering out bad AMD processors. This makes the change harder to read than if the duplicate logic were factored out in some way.

I'll think about a better approach here.


Since the fix wasn't done using the GitHub PR workflow, it's difficult to give line-by-line commentary on the changes. (I used to make changes in a similar way and eventually I switched to the GitHub PR workflow for even my own changes, and I think it actually improved my workflow, although it seemed like a waste of time when I started.)

Point taken.