rust-random / rand

A Rust library for random number generation.
https://crates.io/crates/rand
Other
1.68k stars 430 forks source link

Key erasure AES variant #299

Open vks opened 6 years ago

vks commented 6 years ago

I ported an implementation of a fast-key-erasure RNG suggested by djb to Rust: https://github.com/vks/aesrng

It is about two times faster than non-SIMD xoroshiro when generating 100 MiB of random data. Only fill_bytes is implemented for now.

Once stdsimd is stabilized, this might be of interest for Rand.

dhardy commented 6 years ago

Your point is that it's possible to produce very fast cryptographic generators for bulk output by using CPU-specific features? Cool. But I don't much like comparing apples to pineapples.

I suggest this would make a good external crate, but not be such a good inclusion with Rand itself since:

To be fair, we should probably consider SIMD-optimised generators at some point, though I don't know whether it would be worth making them the default generators (for many applications where randomness is consumed in small quantities the performance enhancements may not be significant).

nagisa commented 6 years ago

To be fair, we should probably consider SIMD-optimised generators at some point, though I don't know whether it would be worth making them the default generators (for many applications where randomness is consumed in small quantities the performance enhancements may not be significant).

Could always pick a SIMD specialisation on when user calls a fill_buffer with a larger buffer, and otherwise use the non-SIMD implementation with less throughput but possibly better latency characteristics.

dhardy commented 6 years ago

Yes, but SIMD probably makes more sense with buffer-backed RNGs like the current cryptographic ones (generate a block at once).

vks commented 6 years ago

@dhardy

The speed comes from using SIMD instructions to produce large blocks of data. Many users of RNGs only want small amounts of random data at a time, so this is not a good general purpose RNG.

Why? Our current StdRng is buffered as well, via BlockRng. If I use a similar buffer (128 bytes), I get slightly worse performance than StdRng on master (5.1 ns instead of 3.55 ns per u64). Initialization is a lot cheaper (31 ns instead of 7.6 us) and the key is erased every 128 bytes.

Fast key erasure is likely mainly of interest to specific users with very high security requirements? It appears a "sensible precaution" but one with high cost and little expected benefit;

It gives forward secrecy. This is a nice property to have, because it means that i.e. compromising a server does not compromise previously generated secrets (if they are not stored in some other form). In that regard it is similar to ReseedingRng, but more reliable. I'm not sure what you mean with high cost? The total cost is comparable to StdRng. Removing key erasure improves next_u64 performance by about 40 % (3.08 ns, making it faster than StdRng).

CPU-specific instructions mean your code only works on modern x86_64 CPUs; Rand is designed to be portable.

Rand can still be portable by falling back to something else on other platforms.

Low-level instructions are hard to review, so it would be better in its own crate so that individual projects can choose whether or not to use it IMO.

That is a fair point, it contains a bit of unsafe code. I hope this can be improved by using a safe abstraction like faster and maybe aesni.

pitdicker commented 6 years ago

@vks Nice project!

I tried to read up a bit on the design of this RNG, but I am not there yet :smile:. Is my understanding right?

The basic idea is that common RNGs for cryptography keep some of their generated values in memory, and this memory can be leaked (swapping to disk, inspecting process memory, etc.). This RNG by design has to overwrite a generated value when making the next one, so it is secure by default. It is very fast because it uses a hardware implementation of AES, and because it does not follow the clunky implementation requirements of NIST.

I sure think it is neat. It may be good to wait a while before really using it to give it time to prove itself. And also good to take lessons from. One thing I had planned for the BlockRng wrapper was to erase used values.

But I see no reason why this RNG has to be in Rand, at least not right now. We are moving towards no, or as few as possible, RNG implementations, not many.

vks commented 6 years ago

This is certainly not something for the near future (the algorithm, the implementation and SIMD in Rust are quite immature). I was just wondering whether it might be a better default in the long run, and the point of this issue is to discuss the pros and cons.

Is my understanding right?

Yes, I think it is.

pitdicker commented 6 years ago

Renamed the issue from "Faster RNG for filling buffers" to "Key erasure AES variant", as this is more about a new CSPRNG than about filling buffers.

pitdicker commented 6 years ago

This article offers an improvement over using AES in counter mode, in that it offers backtracking resistance at lower cost or a better API.

If at some point the internal state of the common variant is exposed, all previously generated values can be recovered. The NIST document has some requirements that the common variant should overwrite its state when the user is 'done with it', but that is not easy for an API.

This variant will, after generating a buffer with a couple of blocks of values, also use one block to generate a new internal state (or seed). This overwrites the old key, making it impossible to recover previous rounds.

dhardy commented 6 years ago

Another fast backtracking-resistant RNG: https://github.com/google/randen

vks commented 6 years ago

Note that Randen provides a Rust implementation in the official repository.

dhardy commented 6 years ago

The Randen paper has been published: https://arxiv.org/abs/1810.02227

dhardy commented 5 years ago

We shouldn't forget about Randen as a possible alternative to ChaCha; it is fast, fairly simple code and has backtracking resistance. Right now I see two issues:

  1. Limited portability: it relies on a CPU-native AES instruction for performance, thus we should keep an alternative available for some platforms (like we do with HC-128 now).
  2. Trust: while the code looks reasonably straightforward and is over a year old, the algorithm is still fairly new in academic terms, apparently with little review.

This would provide an alternative to #872, for some platforms.