paragonie / random_compat

PHP 5.x support for random_bytes() and random_int()
https://paragonie.com/projects
MIT License

Pure PHP PRNG? #61

Closed: cweagans closed this issue 9 years ago

cweagans commented 9 years ago

Both Drupal and phpseclib try to get random bytes from a good source, and then if they can't, they fall back to a pure PHP PRNG.

The Drupal implementation looks like this:

      // If we couldn't get enough entropy, this simple hash-based PRNG will
      // generate a good set of pseudo-random bytes on any system.
      // Note that it may be important that our $random_state is passed
      // through hash() prior to being rolled into $output, that the two hash()
      // invocations are different, and that the extra input into the first one -
      // the microtime() - is prepended rather than appended. This is to avoid
      // directly leaking $random_state via the $output stream, which could
      // allow for trivial prediction of further "random" numbers.
      if (strlen($bytes) < $count) {
        // Initialize on the first call. The contents of $_SERVER includes a mix
        // of user-specific and system information that varies a little with
        // each page.
        if (!isset($random_state)) {
          $random_state = print_r($_SERVER, TRUE);
          if (function_exists('getmypid')) {
            // Further initialize with the somewhat random PHP process ID.
            $random_state .= getmypid();
          }
          $bytes = '';
        }

        do {
          $random_state = hash('sha256', microtime() . mt_rand() . $random_state);
          $bytes .= hash('sha256', mt_rand() . $random_state, TRUE);
        } while (strlen($bytes) < $count);
      }

The phpseclib implementation is here: https://github.com/phpseclib/phpseclib/blob/master/phpseclib/Crypt/Random.php#L116-L241 (too long to paste directly on this issue, and has a lot of dependencies on other parts of phpseclib).

My question is:

  1. Should random_compat provide a pure PHP PRNG for people who don't have any reasonable randomness sources? And if so:
  2. Is there an existing implementation that would make sense to pull into random_compat?

narfbg commented 9 years ago

No, and no.

And that piece of code from Drupal ... there's nothing random in it.

cweagans commented 9 years ago

Let me rephrase my question: given that at least two major possible consumers depend on a pure PHP implementation (which I know is not ideal), could we provide a "least-horrible" version of a PRNG written in PHP for those frameworks that have to work on the lowest common denominator of hosts (e.g. GoDaddy and its ilk)?

I'm not saying that it's a good thing to do this, but I think it's reasonable to centralize the awfulness and try to make it less bad. Then these other PHP projects can literally just include the polyfill and call random_bytes() and not have to think up some weird contrived thing to get "random" data.

Maybe we could add it as a possible choice and emit a warning or something when it's used?

cweagans commented 9 years ago

Or even better - if there's some other project that provides this, this lib can just check to see if it's available and use it. That way, this library isn't tainted by a bad decision, but frameworks that need it can opt-in. Would that be amicable?

paragonie-scott commented 9 years ago

No, I will not create a userspace PRNG for when there is no secure way to obtain random data from the operating system.

Et cetera. cc @tqbf and @ircmaxell

oittaa commented 9 years ago

There's no point in giving a false sense of security. It's better to just throw an exception when no reliable random sources are available.

scottchiefbaker commented 9 years ago

I respectfully disagree. While a userspace PRNG is not optimal, I don't think we should completely nix the idea. Is a userspace PRNG any worse than a user using mt_rand()? I don't think we should put it in the default "pool" of entropy sources, but we could put it behind a flag like $use_local_prng = true (with a warning in the code not to use it for sensitive data) or something similar.

There are cases where you may not have a good entropy source, but you still want access to random_int() and random_bytes()

AshleyPinner commented 9 years ago

If you do not have a good entropy source, you do not have random numbers. If you do not have a good entropy source, random_int/random_bytes will not work for you either, even on PHP7. There is no useful information that you can access (outside of what this lib tries to use) that can give you anything approaching a CSPRNG.

Please remember, this is a CSPRNG source, not a PRNG source. The CS here is Cryptographically Secure, and userland PRNGs are not.
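
To make the PRNG/CSPRNG distinction concrete, here is a minimal sketch (not from the thread) showing that mt_rand() output is fully determined by its seed, so anyone who learns or guesses the seed can replay the whole stream. The helper name mt_sequence and the seed value 1234 are illustrative assumptions.

```php
<?php
// Hypothetical helper: returns the first $n outputs of mt_rand() for a given seed.
function mt_sequence($seed, $n)
{
    mt_srand($seed);
    $out = array();
    for ($i = 0; $i < $n; ++$i) {
        $out[] = mt_rand();
    }
    return $out;
}

// Two runs with the same seed produce identical "random" numbers.
$first  = mt_sequence(1234, 5);
$second = mt_sequence(1234, 5);
var_dump($first === $second); // bool(true)
```

A CSPRNG must not have this property from an attacker's point of view: its internal state has to stay unguessable, which is why the seed material has to come from the OS.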

paragonie-scott commented 9 years ago

> There are cases where you may not have a good entropy source, but you still want access to random_int() and random_bytes()

These are CSPRNG functions, not mere PRNG functions. If you use them, they come with the implied guarantee that they're cryptographically secure.

Whether or not providing our own userland PRNG is any worse than the developer saying, "Y'know what fuck it I'll use mt_rand() #yolocrypto 4 life" doesn't change the fact that falling back to an insecure PRNG is a betrayal of the purpose of this library.

If you want to use these functions and you do not have a good entropy source, your setup is broken. Please fix it; don't break this library by making it fail open for everyone else.

GrahamCampbell commented 9 years ago

> If you want to use these functions and you do not have a good entropy source, your setup is broken.

Exactly.

paragonie-scott commented 9 years ago

> That way, this library isn't tainted by a bad decision, but frameworks that need it can opt-in. Would that be amicable?

These sorts of decisions are actually built into the library through our use of Exceptions.

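// Opt-in, fail-open wrapper: falls back to mt_rand() when no secure source exists.
// The fallback output is NOT cryptographically secure.
function yolo_random_bytes($numBytes)
{
    try {
        return random_bytes($numBytes);
    } catch (Exception $e) {
        // random_bytes() could not find a secure source; build bytes from mt_rand().
        $insecure = '';
        for ($i = 0; $i < $numBytes; ++$i) {
            $insecure .= chr(mt_rand(0, 255));
        }
        return $insecure;
    }
}

// Same idea for integers: insecure mt_rand() fallback when random_int() fails.
function yolo_random_int($min, $max)
{
    try {
        return random_int($min, $max);
    } catch (Exception $e) {
        return mt_rand($min, $max);
    }
}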

But if you actually deploy this in production...

wat

cweagans commented 9 years ago

Okay, opened #62. It's not exactly a pure PHP PRNG, but I think it would be useful for people doing interesting/questionable things with this library, and it's explicitly opt-in, so doesn't affect current users of this library in any way.

cweagans commented 9 years ago

Closing since this is definitely not going to happen.

katlogic commented 9 years ago

Just for posterity, it's possible to construct simple PHP csprng, provided there's a fixed pool containing a large amount of entropy:

hash("sha256", fread(fopen("entropy.txt","r"), 4096) . apc_inc("nonce"));

This works under 2 assumptions:

This can be robustly implemented with a database. On install, the package asks the user to type in random gibberish to generate entropy.bin. The nonce is naturally kept in the database too.

This csprng is still inherently weak because the entropy pool never changes and an attacker can, in theory, reconstruct an image of it (or, more likely, just get hold of it via other means). I'd trust it to generate secure session ids, but would not trust it to generate cryptographic keys. It is still significantly stronger than mt_rand/pid #yoloprng etc., which are all trivial to guess or invert.

Userspace prngs with a fixed entropy pool are only as strong as the initial entropy seed.

AshleyPinner commented 9 years ago

> it's possible to construct simple PHP csprng

Followed by

> but would not trust it to generate cryptographic keys

If you can't, then it's not a CSPRNG.

katlogic commented 9 years ago

@AshleyPinner

"Why not trust user CSPRNGS"

The caveat "secure as long as the state does not leak via a side channel" does not mean they're cryptographically insecure.

The insecurity simply stems from an inherent weakness of the implementation, namely a larger attack surface: it's harder for an adversary to leak entropy pool bits from the kernel than from a PHP-accessible pool file.

An implementation detail that makes secrets easier to leak through unrelated vulnerabilities can't be lumped together with cryptographic strength. By what you're implying, you'd be insisting Heartbleed is a bug in RSA :)

ircmaxell commented 9 years ago

The only source of CS quality entropy on a server is the OS itself. Period, end of story. Therefore, the only way to get a CSPRNG is to interface with the OS. If you can't do that, you can't have CS. Period.

Now, once you get CS entropy from the OS, you could use it to seed a userland CSPRNG. However, there's no real point to doing that since the CS entropy source itself is a CSPRNG. Hence any case where you can do a userland CSPRNG, you can use the kernel one better. Therefore, there's no reason to do a userland one.

An "entropy.txt" file is NOT CS. An entropy device (like /dev/urandom or a hardware entropy device) requires the kernel to give you access to it. Hence there's no benefit to not using the kernel's CSPRNG.

As for a "fallback": it's not CS, and it's not useful in a security context. That's the entire point of the random_* suite of functions: giving an implementation of a CS API in core that can't be disabled (though it may fail in some situations).

And under no circumstances should you use a non-CS random number generator for anything security related (including session generation, etc).

katlogic commented 9 years ago

@ircmaxell Insisting that user supplied seed entropy is not CS strikes me as overzealous. By your logic, all bitcoin brainwallets are inherently insecure, while in reality only users supplying low entropy will get a low-entropy result. There are shades of grey, not black and white.

Regarding the rest of your argument: yes, relying on the user is a last-resort scenario, but it's still a notch better than getpid() / mt_rand().

ircmaxell commented 9 years ago

@katlogic considering those very same bitcoin wallets have been attacked time and again for weak randomness (some epically weak), yes, I do consider them inherently insecure.

Rely on the OS for entropy. If it's compromised, you literally cannot get good entropy. If it's not compromised, you cannot do better than it.

So why bother building something that by definition cannot be better than what exists? It is at best as secure as the OS. At worst, a lot less secure. And given the more implementations that exist, the higher chances for critical bugs, why bother in the first place?

In other words, what's wrong with the OS's entropy sources that you're trying to replace with a userland one?

katlogic commented 9 years ago

> In other words, what's wrong with the OS's entropy sources that you're trying to replace with a userland one?

The whole ticket is about what to do when all other venues to access OS entropy fails.

> I do consider them inherently insecure.

My suggestion is just to avoid resorting to mt_rand() and prompt the user with 'please provide secret csprng seed'. Like all userspace csprngs do. Remember typing randomly back in the day with PGP, when OSes lacked kernel csprngs? Granted, timed mouse movements and keypresses are a bit better than a passphrase, but it's inherently the same thing: user-supplied input, only as good as the user makes it.

If there is urandom, use urandom. It's what this package does already.

ircmaxell commented 9 years ago

> The whole ticket is about what to do when all other venues to access OS entropy fails.

Yes, and the point I am making is if all avenues fail, it's impossible to create a CSPRNG. Therefore, it's better to fail than lie (since you can't be CS without the OS providing CS).

> My suggestion is just to avoid resorting to mt_rand() and prompt the user with 'please provide secret csprng seed'. If there is urandom, use urandom. It's what this package does already.

The better suggestion is to tell the developers to fix their system. If this package doesn't work, their system is broken. Not "just another way of doing it". Broken. It's impossible to secure a broken system. And we shouldn't pretend it is.

katlogic commented 9 years ago

> Yes, and the point I am making is if all avenues fail, it's impossible to create a CSPRNG.

I suppose ssh-keygen and gpg are now broken too, since they wait for prng pool to gather entropy from user. Kernel is not some magic quantum noise blackbox generating true randomness, the entropy has to come from somewhere. It will not give you -EIO when accessing /dev/urandom, or panic with "your PC is broken, thermal diode rng plox".

Without hardware RNG present, it will rely on sorts of dodgy things, yep, you guessed it, kernel events triggered by the user.

paragonie-scott commented 9 years ago

@katlogic:

> I suppose ssh-keygen and gpg are now broken too, since they wait for prng pool to gather entropy from user.

Yes, and most people agree that this is pointless.

> Kernel is not some magic quantum noise blackbox generating true randomness, the entropy has to come from somewhere

We aren't interested in true randomness, we're interested in cryptographically secure pseudorandomness, such as that provided by a stream cipher that rekeys periodically. Unless you're implementing a one-time pad, this is good enough for any practical cryptosystem you're likely to run across.

On Linux, /dev/urandom uses the SHA1 hash function to mix multiple noise inputs (e.g. CPU clock skew, noise from various device drivers, keyboard and mouse input timings from end users). On Mac and FreeBSD, they use the RC4 stream cipher (see also: Fortuna). On OpenBSD, they use ChaCha20. But the principle is the same.

The OS's urandom device can take 256 bits of entropy and stretch it into an infinite stream of unguessable bytes.
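
As a rough illustration of that stretching principle (and not how any particular kernel implements it), the sketch below expands a 32-byte seed into an arbitrarily long stream by running HMAC-SHA256 in counter mode. The function name stretch_seed is hypothetical, and the hard-coded demo seed exists only to show the mechanics. The output is only as unpredictable as the seed, which is why the seed has to come from the OS in the first place.

```php
<?php
// Hypothetical sketch: expand a 32-byte (256-bit) seed into $length bytes of output.
function stretch_seed($seed, $length)
{
    if (strlen($seed) !== 32) {
        throw new InvalidArgumentException('Seed must be exactly 32 bytes (256 bits).');
    }
    $output = '';
    for ($counter = 0; strlen($output) < $length; ++$counter) {
        // Each 32-byte block is HMAC-SHA256(key = seed, message = big-endian block counter).
        $output .= hash_hmac('sha256', pack('N', $counter), $seed, true);
    }
    return substr($output, 0, $length);
}

// Demo only: a fixed, public seed is NOT secret, so this output is predictable.
$demoSeed = str_repeat("\x01", 32);
$stream = stretch_seed($demoSeed, 100);
var_dump(strlen($stream)); // int(100)
```

Because the construction is deterministic, it adds no entropy of its own. It only spreads the seed's unpredictability over more bytes.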

> Without hardware RNG present, it will rely on sorts of dodgy things, yep, you guessed it, kernel events triggered by the user.

The kernel has access to raw device entropy; we do not. The closest we can get from C-land (which is shitty and slow) is to use timing information from various memory management functions, which is affected by every process running on the system. This is what LibreSSL's fallback code does, and I hate it.

In PHP land, we don't even have that. So don't even try.

If you cannot generate a CSPRNG, don't silently become a PRNG. If you do so, you are betraying your users.

I will never implement an insecure fallback. I will never intentionally publish code designed to fail open without the informed consent of the user. In this case, informed consent means you write your own try/catch blocks. The freedom to hurt yourself is still there, but you have to go slightly out of your way to access it.

To do otherwise is unethical for a security engineer.

You're welcome to your opinions, but this decision is final. The code is MIT licensed if you want to fork it and do things your way.
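
For contrast with the fail-open wrappers earlier in the thread, a fail-closed caller might look like the sketch below. The function name generate_token and the injectable $source parameter are assumptions made purely so the failure path can be exercised. The sketch also assumes random_bytes() is available (PHP 7+, or random_compat loaded on PHP 5.x).

```php
<?php
// Hypothetical fail-closed wrapper: either return secure bytes (hex-encoded)
// or refuse to continue. $source is an injectable byte source used for illustration.
function generate_token($numBytes, $source = 'random_bytes')
{
    try {
        $bytes = call_user_func($source, $numBytes);
    } catch (Exception $e) {
        // No insecure fallback: surface the failure so the operator fixes the host.
        throw new RuntimeException('No secure random source available; refusing to continue.', 0, $e);
    }
    return bin2hex($bytes);
}

// Normal use: 16 secure bytes, hex-encoded to 32 characters.
$token = generate_token(16);
var_dump(strlen($token)); // int(32)
```

The point of this shape is that a missing entropy source stops the request instead of silently downgrading to predictable output.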

ircmaxell commented 9 years ago

> I suppose ssh-keygen and gpg are now broken too, since they wait for prng pool to gather entropy from user.

So neither ssh-keygen nor gpg actually do what you're claiming. Instead, both pull from /dev/random which is the OS source of entropy. It waits for the kernel to gather enough entropy to produce the key. It does not use a user-land CSPRNG for that.

There's consensus in the security community that there's no reason for this anyway (blocking and entropy counting): http://www.2uo.de/myths-about-urandom/

> Without hardware RNG present, it will rely on sorts of dodgy things, yep, you guessed it, kernel events triggered by the user.

That's overstating it a bit. It has access to a LOT of raw information, only part of which is user information, and the vast majority of which isn't accessible from user space: things like packet timing on network devices, the precise timing of operations from the HDD, and the precise timing and order of interrupts, as well as raw noise from microphone inputs and other potential sources (depending on the hardware and device drivers installed).

If the OS is not compromised, the kernel has better access to entropy than userland, so why not use it? If it is compromised, it can control those "user events" that the child program sees, and hence the entropy that userland sees. Either way, we're relying on the kernel for security.

Adding a user-land CSPRNG provides absolutely no security benefit in any context (compromised kernel, uncompromised kernel, etc). And it introduces yet another point of attack and vector for potential bugs. This has been seen time and time again, with projects you have cited here for doing it (namely bitcoin).

So again, I ask: with an uncompromised kernel, why introduce another point of failure? Why not simply rely on the kernel's entropy source, since you by definition can do no better than it? With a compromised kernel, you can still do no better than it. So why not just trust it (since you really don't have a choice)?