tc39 / proposal-seeded-random

Proposal for an options argument to be added to JS's Math.random() function, and some options to start it with.
MIT License
156 stars 6 forks source link

Default pseudo-random algorithm specification will make securing the standard in future needlessly complicated. #14

Closed OJezu closed 4 years ago

OJezu commented 4 years ago

This is similar to #8, but my argument is that hard specifying the default algorithm won't do.

A lot of languages and libraries had to change their default pseudo-random generators in history, due to weakness or misuse of the algorithm by the user, change in computational power, or new mathematical weakness of the algorithm being discovered. (Chrome 47, PHP's misused mt_rand, 3DES falling out of use, etc.)

If the algorithm needs to be changed, what happens?

The current API is:

const seed = 3;
const prng = Math.seededPRNG({seed})
console.log(prng.random()); // let's say 0.7

If the PRNG algorithm is changed, the newer version of runtime will generate a different number, unacceptable BC break. Also, what if new PRNG would rather use a pair of numbers, or otherwise a non-scalar seed?

If API is altered, so that the new calls use an additional parameter, while the old ones remain the same:

const seed = 3;
const prng = Math.seededPRNG({seed})
console.log(prng.random()); // still 0.7

const newPrng = Math.seededPRNG({seed, algo: 'newSecurePRNG'});
console.log(newPrng.random()); // this time it's 0.8

const newSeed = {a: 10, b: 20};
const newPrng2 = Math.seededPRNG({newSeed, algo: 'ellipticCurves'});
console.log(newPrng.random()); // or 0.552323 maybe

That's a bit better, but unless developers remember the new algo parameter they would be using the old, now insecure PRNG. How to prevent that?

The horrifying alternative:

const seed = 3;
// obsolete, the IDE will show a warning probably
const prng = Math.seededPRNG({seed})
console.log(prng.random()); // still 0.7

const prng = Math.newBetterSeededPRNG({seed})
console.log(prng.random()); // now 0.3

That's terrifying, two APIs, one only for BC, and a new one to use in new applications. And hopefully people use IDEs which will warn them.

What if algorithm parameter was required from begining?

// predictable PRNG, possibily built in
const seed = 3;
const prng = Math.seededPRNG({seed, algorithm: 'MT19937'})
console.log(prng.random()); // let's say 0.7

// something more secure, maybe added in a later version
const seed = {iv: new Uint8Array([]), key: new new Uint8Array([])};
const prng = Math.seededPRNG({seed, algorithm: 'aes256'})
console.log(prng.random()); // let's say 0.7

// maybe even custom?
const seed = 10;
const prng = Math.seededPRNG({seed, algorithm: (seed) => [value, nextSeed]})
console.log(prng.random());

What have the other languages done? PHP, the greatest example of design by lack of design, started by providing most of the libc functions, then recommended not to use rand, as it was using the libc algorithm of unknown quality, and recommended mt_rand, which was even worse. They they said, use openssl_random_pseudo_bytes which has an optional(!) flag to make it secure, then they said, hey, just use the new random_bytes function. The last two are not seedable, so they left people who needed unpredictable but seedable PRNGs in dark. Don't be like PHP.

Python has a warning in random module documentation not to use it for security, secrets module is not seedable.

Should the seedable PRNG be even cryptographically secure?

That's the question to be asked, too. Usecases for seedable CSPRNG are limited, and are depending on the user to know how to properly seed the PRNG for the purpose in the first place. On the other hand some developers might just assume that the seededPRNG is secure, anyway.

peteroupc commented 4 years ago

Here are my thoughts:

The names seededPRNG and newBetterSeededPRNG are too generic; rather it ought to be the name of a specific PRNG algorithm, which better tells users what kind of PRNG they're getting and better ensures the PRNG will remain consistent across time and across versions of ECMAScript.

PHP's rand and mt_rand (as well as Python's random.* functions and Numpy's random.random.* functions) are problematic not only because of the quality of the PRNGs they implement, but also because these two functions act on global state. The problems are explained in further detail in NumPy's new RNG policy. As the policy shows, NumPy is moving away from PRNGs that use program-wide global state.

I see no need for a cryptographic RNG that users can seed for reproducible "randomness". At most, the cryptographic RNG should let users add extra data to increase the RNG's randomness, but this is not without risk (see, e.g., D. J. Bernstein's article "Entropy attacks"). For information security purposes, users can make use of libraries that implement hash functions, pseudorandom functions, KDFs, etc., that take in secrets and output pseudorandom numbers. In any case, cryptographic RNGs are not exactly trivial to implement.

The use cases for a seedable RNG, rather, include simulations, machine learning, Monte Carlo sampling, games, and the like. For these use cases, noncryptographic RNGs are enough, especially since these cases don't involve information security.

See also my guidelines for new RNG APIs.

OJezu commented 4 years ago

@peteroupc I disagree with names being too generic. Names of specific algorithms will be hardly discoverable, and will lead to API bloat.

The proposal does not use global state, so that part not problematic.

My biggest worry is, that this proposal if accepted will become a relict in 5-20 years, frozen by BC.

I see no need for a cryptographic RNG that users can seed for reproducible "randomness"

Those happen, but usually in kind of programs not usually written in ECMAScript, or otherwise the answer might be "use a cipher". Someone will at some point need that and sure, they can use a library, but they might use a library if they want any seedable PRNG for non-cryptographic use too.

https://www.npmjs.com/package/mersenne-twister https://github.com/dworthen/prng https://www.npmjs.com/package/random-js and others

ljharb commented 4 years ago

I don’t see why changing the algorithm would break back compact - the api contract for Math.random isn’t “uses a specific algorithm”, it’s “provides a number within a range”, and the algorithm can safely change at any time, breaking nobody who’s relying on the contract. The same seems like it would apply here.

tabatkins commented 4 years ago

A lot of languages and libraries had to change their default pseudo-random generators in history, due to weakness or misuse of the algorithm by the user, change in computational power, or new mathematical weakness of the algorithm being discovered.

This is very explicitly not a secure random generator. Weaknesses in the algo don't matter because if you're using this for secure purposes you've already lost (since the seed will be present in your script, and thus reproducible...).

The only reason we might want to change the algorithm in the future is not for security, but just to provide a randomness with better qualities (less bias, etc). If that day comes, and it's deemed worthwhile, I'm fine with adding an algorithm parameter. Assuming we make a reasonable choice of algorithm now, tho (which isn't hard), I don't anticipate any problem with people continuing to get the default "old" random algo when they don't specify one in the future.

I don’t see why changing the algorithm would break back compact

It absolutely would. This isn't Math.random(); some of the explicit use-cases provided for it depend on the algo being the same across time and UAs, so you can faithfully reproduce random sequences for replays.

ljharb commented 4 years ago

Thanks for clarifying.

OJezu commented 4 years ago

@tabatkins

This is very explicitly not a secure random generator.

This is the proposal intention, not how the users will use this API. React has named dangerouslySetInnerHTML prop specifically, so that even most inexperienced developer will know it is... well, dangerous. Is this enough case for concern to change the API? Maybe not, but if the point of the proposal is to provide PRNG with standard-level guarantees, but don't bother how robust it is or if it will be misused, I argue the developers can use any already available prng library for the same effect.

since the seed will be present in your script, and thus reproducible...

ECMAscript is used server-side too, mostly prominently in node.js. The seed might also be obtained from a secure API, does not have to be hardcoded.

I agree with the rest of the response, but I thought this is important point to be brought up.

tabatkins commented 4 years ago

Right, I can't predict how people will use it. The name has already been bikeshedded into seededPRNG specifically to ensure people don't mistake it for an RNG. We could go further, but I don't think making the name explicitly dangerous-looking and annoying to type is useful - we do want people to use this API, unlike innerHTML in React.

I argue the developers can use any already available prng library for the same effect.

For all the use-cases listed so far for this proposal you can write an LCG in one or two lines that does just fine. This proposal is absolutely syntax sugar. Its value is that knowing you can write an LCG is a relatively rare piece of knowledge, and tracking down useful constants requires some advanced knowledge as well; the fact that approximately no one writes one for the Houdini Paint API demos that involve randomness shows that this is something too esoteric for most authors to know about, but the fact that all the Math.random()-using Paint API demos jiggle randomly in terrible ways shows that it's important to let authors know about it.

ECMAscript is used server-side too, mostly prominently in node.js. The seed might also be obtained from a secure API, does not have to be hardcoded.

Sure. This still isn't a Crypto API ^_^

Ultimately, let's look at the ergonomics. Let's assume we require people to specify their algorithm explicitly (even tho there's only one right now), and later discover this algorithm isn't good and we want to provide another one that's more cryptographically secure.

There's several communities we need to think about here. All the legacy code, explicitly specifying the old algorithm? Still vulnerable; if they were using this for secure purposes, they're just as broken as if they'd gotten the algorithm as the "default". New code? We still have to support the old value, so they can still write the old algorithm into their code and be just as broken; at best we can log a warning in the console, but we can do that with an unspecified algo, too. The only group that's not vulnerable are those that learn how to use the API from sources that teach them to use the new value, but that's just as true whether the old algo was defaulted or explicitly required.

The only benefits of explicit passing are that it might have people think about the algorithm and go look up what the options are. As an experienced standards author, tho, I know that it's just as likely (if not more) that instead it'll just be a copy/paste from elsewhere in the codebase to make the function work. And this doesn't help if people learn how to invoke it before we add the new value, as they then have no reason to go check if there are more values to use.

The downside of explicit passing is that it's a good chunk of additional characters to write whenever you invoke this, and until we introduce a second parameter, just a copy/pasted bit of nonsense that doesn't control anything.

So in my opinion, the downsides outweigh the upsides by a reasonable amount. We need to make it clear in both the spec and in teaching material that this is not a crypto API and is not useful for security purposes, but with luck front-end articles will already be encouraged to say that because of the usually-visible seed.