rust-random / rand

A Rust library for random number generation.
https://crates.io/crates/rand

JitterRng security #699

Closed tarcieri closed 5 years ago

tarcieri commented 5 years ago

Splitting this discussion off from #681, I'd like to ask the question: is JitterRng secure enough to be considered a CryptoRng?

https://github.com/rust-random/rand/issues/681#issuecomment-455088763

JitterRng uses the high-resolution part of nano-second timers. As far as I can tell, JavaScript cannot even access nanosecond timers in the browser.
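
For readers unfamiliar with the mechanism the quoted comment refers to: a jitter-based collector times a small, fixed piece of work over and over and keeps the low-order bits of each timing delta, on the premise that those bits vary unpredictably. The sketch below shows only that raw measurement, using std::time::Instant and std::hint::black_box (Rust 1.66+). The names workload, timing_deltas, and low_bits are hypothetical, and this is not JitterRng's actual algorithm, which does considerably more mixing and checking. On platforms whose timer is coarser than 1 ns the low bits may simply be constant. Whether an attacker can predict or influence these bits is exactly what this thread disputes.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical fixed workload whose execution time is measured.
fn workload() -> u64 {
    let mut acc: u64 = 0;
    for i in 0..64u64 {
        // black_box discourages the optimizer from folding the loop away.
        acc = black_box(acc.wrapping_mul(6364136223846793005).wrapping_add(i));
    }
    acc
}

/// Times the workload `n` times and returns each duration in nanoseconds.
fn timing_deltas(n: usize) -> Vec<u128> {
    (0..n)
        .map(|_| {
            let start = Instant::now();
            black_box(workload());
            start.elapsed().as_nanos()
        })
        .collect()
}

/// Keeps only the least significant bit of each delta, i.e. the
/// "high-resolution part" a jitter collector hopes is unpredictable.
fn low_bits(deltas: &[u128]) -> Vec<u8> {
    deltas.iter().map(|d| (d & 1) as u8).collect()
}

fn main() {
    let deltas = timing_deltas(16);
    println!("deltas (ns): {:?}", deltas);
    println!("low bits:    {:?}", low_bits(&deltas));
}
```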

JavaScript is just one example of a way an attacker could either influence the TRNG or establish a covert channel to read its state. Co-tenant VMs are another example. Even without a high-precision timer, an attacker can still attempt to actively influence the RNG's output, and if they can establish any sort of signal that the attack is working, or starting to work, even a statistical one, they can use it as part of a bidirectional feedback loop to tune the attack.

I read the "paper" on the design JitterRng is supposed to be based on, jitterentropy-library:

http://www.chronox.de/jent/doc/CPU-Jitter-NPTRNG.pdf

Side-channel attacks aren't discussed at all and don't seem to have been considered. Quite the opposite. From section 7.1:

Direct readout of random number or the internal state of the CPU Jitter random number generator: This approach can be immediately refuted as the random number generator relies on the process separation and memory isolation offered by contemporary operating systems.

There have been a number of recent attacks that allow data to be exfiltrated across process boundaries via covert channels, namely Meltdown, Spectre, and Foreshadow. Variations of these attacks are being discovered frequently... the newest side-channel attack on Intel CPUs, one based on Integrated Performance Primitives, was announced just a few days ago.

It also nonchalantly cites HAVEGEd as related work, even though HAVEGEd is somewhat notorious for failing to live up to its claims, is a potential source of poor random numbers, and is implicated in a PolarSSL CVE:

https://lwn.net/Articles/525459/
https://tls.mbed.org/tech-updates/security-advisories/polarssl-security-advisory-2011-02

Section 7.1.1 describes the general attack I had been discussing earlier in #681, but it makes a number of assumptions I don't think hold:

Comparing the attack process readings with a fully unobserved process indicates that the attacking process can never determine the victim’s time stamps more accurate than the CPU execution time jitter our random number generator is based on. An attacking process is never be able to reduce the variations of the CPU execution time jitter.

The analysis in this section is rather hand-wavy and assumes quite a bit about how an attack would work, as if that were the only possible approach. I see real risks in how many of JitterRng's inputs an attacker can potentially influence, and in the many ways an attacker could leverage minute statistical signals to tune such attacks.

Unfortunately, most of the papers on "TRNGs" I'm reading on ePrint describe purpose-designed hardware RNGs, and although I have seen the term "TRNG" used for HAVEGEd-style entropy collectors in the past, I'm really beginning to wonder whether this is a misnomer.

Regardless, here's a paper that puts forth a much more rigorous framework for evaluating TRNGs:

https://eprint.iacr.org/2009/299.pdf

And here's a paper that demonstrates a number of different attacks against hardware TRNGs, some of which are remotely exploitable:

http://eprints.whiterose.ac.uk/117858/7/micpro_IoT.pdf

Some of these attacks leverage the statistical techniques I was alluding to / guessing about earlier, and could potentially be adapted to something like JitterRng.

Your first article on GAROs sounds like it is observational rather than a definitive proof?

I think the burden of proof for security is ultimately on the RNG implementation. If I can't conceive of a practical attack, that doesn't make something secure; it means we don't know.

For a positive result, we'd need to prove or otherwise demonstrate properties of JitterRng.

Based on reading a few papers tonight, I think the state of affairs for these sorts of (pseudo-)TRNGs is in fact significantly worse than I had realized. If anyone has done a rigorous analysis of HAVEGEd-style software-based TRNGs, as opposed to hardware-based ones, I've been unable to find it. The author of this library did not do such an analysis, and the quality of his paper is poor. Furthermore, it does not seem to acknowledge or learn from HAVEGEd's past failures.

The security of these sorts of RNGs seems to rest entirely on an assumption: that modern CPUs are too noisy to be predicted, even when an attacker is able to directly influence their behavior via code running on the same CPU. I do not think this is a sound assumption.

I would consider the library JitterRng is based on to be of questionable provenance, not learning from the mistakes of its predecessors, and fundamentally trying to do something I consider very scary: generate random numbers from potentially attacker-controlled or attacker-influenced values, while also avoiding leaking them through microarchitectural side channels.

The main way we'd avoid the latter is by using a constant-time implementation that avoids secret-dependent branching; however, branching on the very values JitterRng intends to use as "random" secrets seems to be integral to the way it functions:

https://github.com/rust-random/rand/blob/a7c2eae35e7d547bbe3b1ba766d9351f0c97eae8/src/rngs/jitter.rs#L634-L675
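
To make the constant-time point concrete, here is a minimal sketch contrasting a secret-dependent branch with a branchless, mask-based select that runs the same instructions whichever value the secret bit has. The function names are hypothetical, and plain Rust gives no constant-time guarantee (the optimizer may reintroduce a branch), which is why real code usually relies on a dedicated crate such as subtle. In a jitter collector, the values being branched on are themselves the intended secrets, which is the difficulty described above.

```rust
/// Secret-dependent branch: which arm executes depends on `secret_bit`,
/// so timing and branch-predictor state can leak it.
fn select_branching(secret_bit: u8, a: u64, b: u64) -> u64 {
    if secret_bit == 1 {
        a
    } else {
        b
    }
}

/// Branchless select: turns the bit into an all-ones or all-zeros mask
/// and combines both inputs, so the same instructions run either way.
/// Illustrative only: the compiler may still emit a branch.
fn select_masked(secret_bit: u8, a: u64, b: u64) -> u64 {
    // 0 -> 0x0000_0000_0000_0000, 1 -> 0xFFFF_FFFF_FFFF_FFFF
    let mask = u64::from(secret_bit & 1).wrapping_neg();
    (a & mask) | (b & !mask)
}

fn main() {
    for bit in [0u8, 1] {
        assert_eq!(select_branching(bit, 7, 9), select_masked(bit, 7, 9));
    }
    println!("{}", select_masked(1, 7, 9)); // prints 7
}
```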

tl;dr: JitterRng seems rather scary to me. I personally do not consider it of high enough quality to be considered a CryptoRng, and I would like to ensure it is not accidentally used in place of an OS-provided RNG in any of my cryptography projects that consume rand_os.
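
For context on what the CryptoRng designation buys: in rand, CryptoRng is a marker trait with no methods, so APIs that need cryptographic randomness can require it in their bounds and reject other generators at compile time. The self-contained sketch below mirrors that idea with simplified local stand-ins for RngCore and CryptoRng, plus hypothetical UrandomRng, Xorshift64, and generate_key items. It assumes a Unix system with /dev/urandom and is not rand's actual code. The question in this issue is, in effect, whether JitterRng deserves an implementation of that marker.

```rust
use std::fs::File;
use std::io::{self, Read};

/// Simplified stand-in for rand_core's RngCore.
trait RngCore {
    fn fill_bytes(&mut self, dest: &mut [u8]);
}

/// Stand-in marker trait: no methods, only a promise that the
/// implementor's output is suitable for cryptographic use.
trait CryptoRng {}

/// Reads from the kernel RNG (assumes a Unix-like OS with /dev/urandom).
struct UrandomRng(File);

impl UrandomRng {
    fn new() -> io::Result<Self> {
        Ok(UrandomRng(File::open("/dev/urandom")?))
    }
}

impl RngCore for UrandomRng {
    fn fill_bytes(&mut self, dest: &mut [u8]) {
        self.0.read_exact(dest).expect("reading /dev/urandom failed");
    }
}
impl CryptoRng for UrandomRng {}

/// Fast but predictable: implements RngCore, deliberately NOT CryptoRng.
struct Xorshift64(u64);

impl RngCore for Xorshift64 {
    fn fill_bytes(&mut self, dest: &mut [u8]) {
        for byte in dest.iter_mut() {
            // Marsaglia xorshift64 step; the state must be non-zero.
            self.0 ^= self.0 << 13;
            self.0 ^= self.0 >> 7;
            self.0 ^= self.0 << 17;
            *byte = self.0 as u8;
        }
    }
}

/// Key generation demands CryptoRng in its bounds, so a weak generator
/// cannot be passed in by accident.
fn generate_key<R: RngCore + CryptoRng>(rng: &mut R) -> [u8; 32] {
    let mut key = [0u8; 32];
    rng.fill_bytes(&mut key);
    key
}

fn main() -> io::Result<()> {
    let mut os = UrandomRng::new()?;
    println!("key: {:02x?}", generate_key(&mut os));

    let _weak = Xorshift64(0x9E37_79B9_7F4A_7C15);
    // generate_key(&mut _weak); // compile error: Xorshift64 does not implement CryptoRng
    Ok(())
}
```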

dhardy commented 5 years ago

Thanks for splitting this out into a new issue.

At this point I can only agree that we should be cautious about considering JitterRng secure. Since it is only used as a backup entropy source and in practice should almost never be used, I don't think there is immediate cause for alarm; however, we should migrate away from using it.

I apologise for making you go to so much effort to convince me of this generator's weaknesses, but at least now we can make a clean decision for the whole project (i.e. never use this, and possibly use RDRAND if a backup source is needed, rather than merely excluding it from rand_os).

newpavlov commented 5 years ago

Don't forget that RDRAND is not available on all x86-64 CPUs (not to mention ARM), and using it will require bumping the MSRV to 1.27. Also, there is a certain amount of controversy around RDRAND, so I am not sure that having it as an enabled-by-default fallback would be a good decision.
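
Because RDRAND may be missing, a fallback has to detect it at runtime rather than assume it. Below is a minimal sketch using the std::arch intrinsic _rdrand64_step together with is_x86_feature_detected! (these std::arch APIs were stabilized in Rust 1.27, which appears to be the reason for the MSRV note above). The name rdrand_u64 is hypothetical, and the bounded retry of 10 attempts follows the commonly cited Intel guidance that RDRAND can fail transiently.

```rust
/// Returns one 64-bit value from RDRAND, or None if the CPU lacks RDRAND
/// or the instruction keeps reporting failure.
#[cfg(target_arch = "x86_64")]
fn rdrand_u64() -> Option<u64> {
    use std::arch::x86_64::_rdrand64_step;

    // Runtime check: RDRAND is not available on every x86-64 CPU.
    if !is_x86_feature_detected!("rdrand") {
        return None;
    }
    // RDRAND can fail transiently; retry a bounded number of times.
    for _ in 0..10 {
        let mut value: u64 = 0;
        // SAFETY: the runtime check above guarantees the CPU supports RDRAND.
        if unsafe { _rdrand64_step(&mut value) } == 1 {
            return Some(value);
        }
    }
    None
}

/// Non-x86-64 targets (e.g. ARM) have no RDRAND at all.
#[cfg(not(target_arch = "x86_64"))]
fn rdrand_u64() -> Option<u64> {
    None
}

fn main() {
    match rdrand_u64() {
        Some(v) => println!("RDRAND gave {v:#018x}"),
        None => println!("RDRAND unavailable; caller must fall back or return an error"),
    }
}
```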

dhardy commented 5 years ago

I'm not forgetting that:

In theory we could combine entropy from multiple sources redundantly, but I don't see much point.
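
As a rough illustration of redundant combination, the sketch below XORs equally sized outputs from several sources. If the sources are independent and none can observe or adapt to the others' output, the result is at least as unpredictable as the best single source; a source that can see the other inputs, however, could cancel them out, which is one reason real designs usually hash the inputs together instead. xor_combine is a hypothetical helper, and the fixed input bytes merely stand in for real source output.

```rust
/// XOR-combines several equally sized entropy buffers into `out`.
/// Assumes the inputs are independent; a source that can see the
/// others' output could cancel them, so this is not robust to a
/// malicious source.
fn xor_combine(out: &mut [u8], inputs: &[&[u8]]) {
    out.fill(0);
    for input in inputs {
        assert_eq!(input.len(), out.len(), "all sources must supply the same length");
        for (o, i) in out.iter_mut().zip(input.iter()) {
            *o ^= *i;
        }
    }
}

fn main() {
    // Hypothetical outputs from two sources (fixed bytes for illustration).
    let os_bytes = [0xAAu8, 0xFF, 0x00, 0x0F];
    let hw_bytes = [0x55u8, 0x0F, 0x00, 0xF0];
    let mut seed = [0u8; 4];
    xor_combine(&mut seed, &[&os_bytes[..], &hw_bytes[..]]);
    println!("{seed:02x?}"); // prints [ff, f0, 00, ff]
}
```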

tarcieri commented 5 years ago

Yeah, the RDRAND situation in particular has a tension between the following:

Personally, the latter is outside my threat model. Also, falling back to RDRAND in the event of a kernel entropy failure is systemd's current behavior.

In earlier discussions it sounded like JitterRng was used in some particular cases that couldn't be covered by rand_os on Windows. What are the present gaps? Can RDRAND fill them most of the time? Would it make sense for JitterRng to be something users of these particular platforms pull in specifically to fill that gap?

dhardy commented 5 years ago

Well, that was a fun read. Shall we all admit that everything is insecure and move on?

More practically, I still don't see a big reason to be petrified of JitterRng, but I agree that we should aim to replace it. The "case that couldn't be covered" is real but somewhat obscure, and the current solution is miles better than the original fallback. Yes, I think we should just be able to use RDRAND where the default system generator fails (it doesn't cover all platforms, but likely covers the problematic ones, unless maybe Windows RT has issues and a non-zero number of users care).

tarcieri commented 5 years ago

If JitterRng were in its own crate, and rand_os gated the inclusion of the JitterRng crate with something like [target.'cfg(target_os=...)] in its Cargo.toml for the weird edge-case target, I guess I wouldn't be too worried.

newpavlov commented 5 years ago

I was thinking about making rand_jitter an optional dependency of rand_os, with the fallback gated behind a disabled-by-default feature and a note that library crates should not enable this feature but leave it to applications. We could do the same for rdrand as well.

stouset commented 5 years ago

@dhardy If we can't assert with extremely high confidence that our RNG is going to emit unpredictable bits, correct behavior is not to produce probably-unpredictable bits but to emit an error. This is one of those cases where falling back to just doing the best we can is not acceptable behavior from a security perspective.

dhardy commented 5 years ago

@stouset it depends on the context, because not providing a second option sometimes results in people implementing their own very insecure alternative.

Perhaps what we need here are multiple types of RNG, though every time I have brought up splitting thread_rng into secure and insecure versions, people have heavily criticised the idea (with good reason, in part simply because it's unnecessary).

So I still believe the plan outlined above is the best option: replace JitterRng with RDRAND.

But @stouset you should also note that OsRng never made use of JitterRng or other clock-based generators; this was only used in higher-level abstractions like EntropyRng and thread_rng.

dhardy commented 5 years ago

And to respond to @newpavlov's comment above, I strongly disagree:

  1. I don't believe we should make JitterRng even an optional dependency of OsRng. We did talk about putting EntropyRng in the same crate, but we've also talked about deprecating that, and with RDRAND instead I don't believe it should be necessary any more.
  2. I would like to make the new getrandom crate independent of RngCore, and rand_os only a thin wrapper around getrandom.
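
Point 2's layering can be sketched as a byte-filling function that knows nothing about RngCore, plus a thin adapter that implements the RNG trait on top of it. RngCore here is a simplified local stand-in, FillError and OsRng are hypothetical, and the adapter takes the fill function as a parameter (in practice it would be a getrandom-style OS call) so it can be exercised with a fake. This is not the actual rand_os code.

```rust
/// Error from the underlying source (stand-in for a getrandom-style error).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FillError;

/// Simplified stand-in for rand_core's RngCore.
trait RngCore {
    fn next_u32(&mut self) -> u32;
    fn next_u64(&mut self) -> u64;
    fn try_fill_bytes(&mut self, dest: &mut [u8]) -> Result<(), FillError>;
}

/// Thin adapter: all randomness comes from `fill`, which knows nothing
/// about RngCore. In practice `F` would call the OS.
struct OsRng<F: FnMut(&mut [u8]) -> Result<(), FillError>> {
    fill: F,
}

impl<F: FnMut(&mut [u8]) -> Result<(), FillError>> RngCore for OsRng<F> {
    fn next_u32(&mut self) -> u32 {
        let mut buf = [0u8; 4];
        self.try_fill_bytes(&mut buf).expect("OS RNG failed");
        u32::from_le_bytes(buf)
    }

    fn next_u64(&mut self) -> u64 {
        let mut buf = [0u8; 8];
        self.try_fill_bytes(&mut buf).expect("OS RNG failed");
        u64::from_le_bytes(buf)
    }

    fn try_fill_bytes(&mut self, dest: &mut [u8]) -> Result<(), FillError> {
        (self.fill)(dest)
    }
}

fn main() {
    // Fake fill function for demonstration ONLY: writes 0, 1, 2, ...
    let mut counter = 0u8;
    let fake = move |dest: &mut [u8]| -> Result<(), FillError> {
        for b in dest.iter_mut() {
            *b = counter;
            counter = counter.wrapping_add(1);
        }
        Ok(())
    };
    let mut rng = OsRng { fill: fake };
    println!("{}", rng.next_u32()); // prints 50462976 (bytes 00 01 02 03, little-endian)
}
```

Keeping the fill function free of trait dependencies means the same function can serve callers that never touch RngCore, while the adapter stays a few lines long.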

newpavlov commented 5 years ago

I was thinking of making rand_os a thin wrapper around getrandom with disabled-by-default fallbacks to RDRAND and JitterRng, so users would be able to enable either one of them or both (I guess we could prioritize RDRAND over JitterRng). My problem with making the fallback based exclusively on RDRAND is that it's too target-specific for my liking, and I would prefer to give users a choice here. Imagine, for example, Firefox running on older x86 CPUs without RDRAND.
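
The ordering proposed here (OS source first, then RDRAND, then a jitter source, with every fallback off unless the application opts in) can be sketched as a simple chain. Policy, fill_entropy, and the source closures are hypothetical, and the opt-in switches are modeled as runtime flags to keep the sketch self-contained, whereas the proposal uses disabled-by-default Cargo features. With all fallbacks disabled, an OS failure surfaces as an error rather than silently degrading, which matches stouset's point earlier in the thread.

```rust
#[derive(Debug, PartialEq)]
enum EntropyError {
    /// The OS source failed and no enabled fallback succeeded.
    Unavailable,
}

/// Which fallbacks the *application* has opted into; both default to off.
#[derive(Default)]
struct Policy {
    allow_rdrand: bool,
    allow_jitter: bool,
}

/// Tries the OS source first, then enabled fallbacks in priority order
/// (RDRAND before jitter). Each source fills `dest` and returns true,
/// or returns false on failure.
fn fill_entropy(
    dest: &mut [u8],
    policy: &Policy,
    os: impl Fn(&mut [u8]) -> bool,
    rdrand: impl Fn(&mut [u8]) -> bool,
    jitter: impl Fn(&mut [u8]) -> bool,
) -> Result<(), EntropyError> {
    if os(dest) {
        return Ok(());
    }
    if policy.allow_rdrand && rdrand(dest) {
        return Ok(());
    }
    if policy.allow_jitter && jitter(dest) {
        return Ok(());
    }
    // Fail closed: report the failure instead of handing back weak bytes.
    Err(EntropyError::Unavailable)
}

fn main() {
    // Fake sources for demonstration only.
    let failing_os = |_: &mut [u8]| false;
    let fake_rdrand = |d: &mut [u8]| {
        d.fill(0xAA);
        true
    };
    let fake_jitter = |d: &mut [u8]| {
        d.fill(0x55);
        true
    };
    let mut buf = [0u8; 4];

    // Default policy: the OS failure is reported, not papered over.
    let strict = fill_entropy(&mut buf, &Policy::default(), failing_os, fake_rdrand, fake_jitter);
    println!("{strict:?}"); // prints Err(Unavailable)

    // Application opted into RDRAND: it is tried before jitter.
    let opted_in = Policy { allow_rdrand: true, ..Policy::default() };
    let relaxed = fill_entropy(&mut buf, &opted_in, failing_os, fake_rdrand, fake_jitter);
    println!("{relaxed:?} {buf:02x?}"); // prints Ok(()) [aa, aa, aa, aa]
}
```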

dhardy commented 5 years ago

That's a tricky question, though in practice I think supporting no_std somehow will be far more important, and may lead to an alternative solution.

Anyway, I thought I'd get the ball rolling: https://github.com/rust-random/getrandom

dhardy commented 5 years ago

Update: JitterRng now lives in its own crate (rand_jitter) and is no longer a dependency of any other Rand crate (not even optionally) in the latest master.

This discussion has served its purpose and should be closed now, but before it gets forgotten it should probably be linked from the rand_jitter documentation somehow.

burdges commented 5 years ago

Has anyone looked into whether the ideas behind JitterRng work in WASM?

dhardy commented 5 years ago

IIRC WASM deliberately does not have high-resolution timestamps, so implementing a jitter-based RNG is impossible.

dhardy commented 5 years ago

FYI, it looks like Linux 5.4 will include a jitter-based entropy collector as a fallback. Better there (heavily peer-reviewed kernel code) than here, I guess.

https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.4-Actively-Gen-Entropy