Optimize CSPRNG a bit - Githubissues

tuxxy commented 5 years ago

This PR avoids using the Python SystemRandom API for the uniform_bool method.

Instead, it reads a single byte from /dev/urandom directly and then performs a bitwise right-shift to trim excess bits. I've experienced speedups of about ~2.7x

ValarDragon commented 5 years ago

If random number generation is a bottleneck, it would probably be worth saving the extra 7 bits of randomness, use them in subsequent calls, and once it runs out, then generate another byte, etc.

tuxxy commented 5 years ago

The bottleneck comes from numpy and kernel overhead mostly at this point.

I think what you're proposing may bias the RNG on first thought. Ideally, perhaps this can call out via cffi.

On Fri, Jan 4, 2019, 21:27 Dev Ojha <notifications@github.com wrote:

If random number generation is a bottleneck, it would probably be worth saving the extra 7 bits of randomness, use them in subsequent calls, and once it runs out, then generate another byte, etc.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/nucypher/nufhe/pull/9#issuecomment-451626750, or mute the thread https://github.com/notifications/unsubscribe-auth/ADbn7tNzgvJ29A0DHEGsPqPRAlrTbqOmks5vACm1gaJpZM4Zusx9 .

ValarDragon commented 5 years ago

If uniform_bool is still on the critical path after this change, then this should help reduce the critical path.

AFAICT it only weakens the RNG if you consider side-channel adversaries. If you assume that urandom(1) returns a single uniformly random byte, then it follows that each bit is uniformly distributed. What I'm proposing is then just keeping track of the number of random bits you've obtained, and using that information to use all the bits of the obtained byte.

Here is what I think it would look like in python (abstracted to pulling more than 1 byte from urandom at a time)

class SecureRNG:
    def __init__(self):
        self.rng = random.SystemRandom()
        self.remaining_rand_bits = 0
        self.urandom_read_size = 1

    def get_rand_bit(self):
        if self.remaining_rand_bits == 0:
            self.remaining_rand = int.from_bytes(urandom(self.urandom_read_size), 'big')
            self.remaining_rand_bits = 8*self.urandom_read_size
        self.remaining_rand_bits -= 1
        bit = self.remaining_rand & 1
        self.remaining_rand = self.remaining_rand >> 1

    def uniform_bool(self, shape):
        length = numpy.prod(shape)
        return numpy.array([self.get_rand_bit() for _ in range(length)], Int32).reshape(shape)

(I didn't actually compile the above, there may be minor mistakes)

tuxxy commented 5 years ago

I'll play around with this, but I think I prefer the simpler design (for now0 instead of having to worry about maintaining a pool. Also, the concerns with side channels are interesting wrt this design.

fjarri commented 5 years ago

@ValarDragon , actually the critical parts are uniform_torus32() and gauss(). They're the ones that get called with huge shapes (millions to tens of millions of elements).

But let's follow the path of incremental improvement. Merging this for now.

nucypher / nufhe

Optimize CSPRNG a bit #9