python / cpython

The Python programming language
https://www.python.org

Add random.cryptorandom() and random.pseudorandom, deprecate os.urandom() #71466

Closed malemburg closed 8 years ago

malemburg commented 8 years ago
BPO 27279
Nosy @malemburg, @ncoghlan, @larryhastings

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:
```python
assignee = None
closed_at =
created_at =
labels = ['type-feature', 'library']
title = 'Add random.cryptorandom() and random.pseudorandom, deprecate os.urandom()'
updated_at =
user = 'https://github.com/malemburg'
```
bugs.python.org fields:
```python
activity =
actor = 'ncoghlan'
assignee = 'none'
closed = True
closed_date =
closer = 'ncoghlan'
components = ['Library (Lib)']
creation =
creator = 'lemburg'
dependencies = []
files = []
hgrepos = []
issue_num = 27279
keywords = []
message_count = 12.0
messages = ['267970', '267972', '267977', '267978', '267983', '267991', '267995', '268005', '268010', '268014', '268038', '274711']
nosy_count = 3.0
nosy_names = ['lemburg', 'ncoghlan', 'larry']
pr_nums = []
priority = 'normal'
resolution = 'rejected'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue27279'
versions = ['Python 3.6']
```

malemburg commented 8 years ago

I propose to deprecate os.urandom() altogether due to all the issues we've discussed on all those recent tickets, see e.g. bpo-26839, bpo-27250, bpo-25420.

Contrary to what we've told people for many years, it's clear that in the age of VMs/containers getting booted/started every few seconds, it's no longer the generic correct answer for people asking for random data, since it doesn't distinguish between crypto random and pseudo random data.

By far most use cases only need pseudo random data and only very few applications require crypto random data.

Instead, let's define something everybody can start to use correctly and get sane behavior on most if not all platforms. As Larry suggested in bpo-27266, getrandom() is a good starting point for this, since its adoption is spreading fast and it provides the features we need for the two new APIs.

I propose these new APIs:

Crypto applications will then clearly know that random.cryptorandom() is the right choice for them and everyone else can use random.pseudorandom().

random.cryptorandom() will guarantee that the returned data is safe for crypto applications on all platforms, blocking or raising an exception if necessary to make sure only safe data is returned. The API should get a parameter to determine whether to raise or block.

random.pseudorandom() will guarantee not to block and will always return random data that can be used as seeds for simulations, games, tests, Monte Carlo methods, etc.

The APIs should use the getrandom() C API, where available, with appropriate default settings, i.e. blocking or raising for random.cryptorandom() and non-blocking, non-raising for random.pseudorandom().

The existing os.urandom() would then be deprecated to guide new developments to these new APIs, getting rid of the ambiguities and problems this interface has on several platforms (see all the other tickets and https://en.wikipedia.org/wiki//dev/random for details).
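
For readers following the thread, the proposed split could be sketched roughly as below. This is only an illustration, not code from the proposal: the function names come from the issue title, but the `nbytes` and `block` parameters and both fallbacks are assumptions, and `os.getrandom()` is only available on Linux with Python 3.6 or later.

```python
import os
import random
import time


def cryptorandom(nbytes, block=True):
    """Hypothetical API: bytes suitable for crypto use, blocking or raising."""
    if hasattr(os, "getrandom"):  # Linux with Python 3.6+
        # flags=0 blocks until the kernel pool is initialized;
        # GRND_NONBLOCK raises BlockingIOError instead.
        flags = 0 if block else os.GRND_NONBLOCK
        chunks = bytearray()
        while len(chunks) < nbytes:
            # getrandom() may return fewer bytes than requested.
            chunks += os.getrandom(nbytes - len(chunks), flags)
        return bytes(chunks)
    # Assumption: elsewhere, os.urandom() is the best available source.
    return os.urandom(nbytes)


def pseudorandom(nbytes):
    """Hypothetical API: never blocks; fine for seeds, not for secrets."""
    if nbytes == 0:
        return b""
    if hasattr(os, "getrandom"):
        try:
            return cryptorandom(nbytes, block=False)
        except BlockingIOError:
            pass  # pool not ready yet: fall through to a weak seed
    # Weak fallback (assumption): seed a Mersenne Twister from clock and PID.
    rng = random.Random(time.time_ns() ^ os.getpid())
    return rng.getrandbits(8 * nbytes).to_bytes(nbytes, "little")
```

In this sketch the only difference between the two functions is what happens when the kernel pool is not yet initialized: `cryptorandom()` waits or raises, while `pseudorandom()` silently degrades to a weak, non-secret seed.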

malemburg commented 8 years ago

Fleshing out the API signatures and implementation details will have to be done in a PEP.

The topic is (as all the related tickets show) too complex for discussions on a bug tracker.

I just opened this ticket for reference to the idea.

tiran commented 8 years ago

-1

os.urandom() is just fine. Let's not confuse users and make it even harder to write secure software.

larryhastings commented 8 years ago

I +1 on new functions that are designated the best-practice places to get your pseudo-random numbers.

(IDK if the best place for both is in random; maybe the crypto one should be in secrets?)

To be precise: on most OSes what you're calling "crypto random data" is actually "crypto-quality pseudo-random data". Very few platforms provide genuine random data--rather, they seed a CPRNG and give you data from that. (And then the cryptographers have endless arguments about whether the CPRNG is really crypto-safe.)

I'm -1 on actually deprecating os.urandom(). It should be left alone, as a thin wrapper around /dev/urandom. I imagine your cryptorandom() and pseudorandom() functions would usually be written in Python and just import and use the appropriate function on a platform-by-platform basis.

malemburg commented 8 years ago

Some resources:

malemburg commented 8 years ago

On 09.06.2016 10:07, Larry Hastings wrote:

> I +1 on new functions that are designated the best-practice places to get your pseudo-random numbers.
>
> (IDK if the best place for both is in random; maybe the crypto one should be in secrets?)

All up for discussion. As long as we get the separation clear, I'm fine with any location in the stdlib.

> To be precise: on most OSes what you're calling "crypto random data" is actually "crypto-quality pseudo-random data". Very few platforms provide genuine random data--rather, they seed a CPRNG and give you data from that. (And then the cryptographers have endless arguments about whether the CPRNG is really crypto-safe.)

Yes, I know, this should be documented in the docs for random.cryptorandom().

We might even expose the available entropy as an additional API, on platforms where this is possible, or even add APIs to access the entropy daemon where available:

http://egd.sourceforge.net/

(the necessary API is available via OpenSSL: http://linux.die.net/man/3/rand_egd)

Some crypto applications do need to know a bit more about where the random data is coming from, e.g. for generation of root certificates and secure one-time pads.

> I'm -1 on actually deprecating os.urandom(). It should be left alone, as a thin wrapper around /dev/urandom. I imagine your cryptorandom() and pseudorandom() functions would usually be written in Python and just import and use the appropriate function on a platform-by-platform basis.

Fair enough. I don't feel strongly about this part. The main idea here was to move people away from thinking that we can fix a broken system, which is not under our control (because it's a shim on an OS device).

How we implement the functions is up for debate as well. I could imagine that we expose the getrandom() function as e.g. random._getrandom() and then use this from Python where available, with fallbacks to other solutions where necessary. This would also make it possible to have similar functionality on non-CPython platforms and open the door for future changes without breaking applications again.

tiran commented 8 years ago

On 2016-06-09 10:30, Marc-Andre Lemburg wrote:

Marc-Andre Lemburg added the comment:

> On 09.06.2016 10:07, Larry Hastings wrote:
> > I +1 on new functions that are designated the best-practice places to get your pseudo-random numbers.
> >
> > (IDK if the best place for both is in random; maybe the crypto one should be in secrets?)
>
> All up for discussion. As long as we get the separation clear, I'm fine with any location in the stdlib.
>
> > To be precise: on most OSes what you're calling "crypto random data" is actually "crypto-quality pseudo-random data". Very few platforms provide genuine random data--rather, they seed a CPRNG and give you data from that. (And then the cryptographers have endless arguments about whether the CPRNG is really crypto-safe.)
>
> Yes, I know, this should be documented in the docs for random.cryptorandom().
>
> We might even make the available entropy available as additional API, on platforms where this is possible, or even add APIs to access the entropy daemon where available:

EGD died about 15 years ago. Please don't reanimate it.

> Some crypto applications do need to know a bit more about where the random data is coming from, e.g. for generation of root certificates and secure one time pads.

No, that is not how applications deal with certificates or OTPs. When an application is really, REALLY concerned with the RNG source at that level, it will never ever use Python or even a kernel CSPRNG to generate private keys. Instead it will use a certified, industrial-grade HSM (hardware security module) to offload all cryptographic operations onto a secure device.

malemburg commented 8 years ago

Some more resources for FreeBSD:

larryhastings commented 8 years ago

> • FreeBSD will likely switch to the new Fortuna successor of Yarrow in an upcoming release:

A little more information about that.

FreeBSD did a major refactoring of their /dev/urandom (etc) support, which landed in October 2014:

https://svnweb.freebsd.org/base?view=revision&revision=273872

This kept Yarrow but also added Fortuna. You can switch between them with a kernel option.

FreeBSD 10 shipped in January 2014, so clearly this rework didn't make it in.

I see several references to "let's make Fortuna the default in FreeBSD 11". FreeBSD 11 hasn't shipped yet. However, the "what's new in FreeBSD 11" wiki page doesn't mention changing this default. So I don't know whether or not it's happening for 11.

malemburg commented 8 years ago

Resources for entropy gathering sources:

ncoghlan commented 8 years ago

As with other proposals to add new APIs, I think this is an overreaction to a Linux-specific problem. Linux system boot could deadlock with 3.5.0 and 3.5.1 due to:

As long as we switch the internal hash algorithm to seeding from a non-blocking random source, and also ensure that importing the random module doesn't implicitly call os.urandom, then any other software that only needs pseudorandom data can just use the random module APIs.
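
To illustrate the last point, here is a minimal sketch of a simulation that only needs pseudorandom data and gets it from the random module. The `estimate_pi` helper and its parameters are made up for this example; the relevant detail is that an explicitly seeded `random.Random` instance does not need OS entropy for its own seeding.

```python
import random


def estimate_pi(samples, seed):
    """Monte Carlo estimate of pi using a reproducible, explicitly seeded PRNG."""
    rng = random.Random(seed)  # seeded from the given integer, not from the OS
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point falls inside the quarter circle
            inside += 1
    return 4.0 * inside / samples


print(estimate_pi(20000, seed=1))
```

Running it twice with the same seed gives the same estimate, which is usually what tests and simulations want.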

ncoghlan commented 8 years ago

PEP-524 has been implemented for 3.6b1 in bpo-27776, so os.urandom() itself will now do the right thing for cryptographic use cases on Linux.
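
As a small usage sketch for the cryptographic case on Python 3.6 or later (the variable names are arbitrary): `os.urandom()` returns bytes from the OS CSPRNG, and the `secrets` module mentioned earlier in the thread offers a higher-level interface on top of it.

```python
import os
import secrets

# 32 bytes for use as key material; on Linux with Python 3.6+ this call
# waits until the kernel's entropy pool has been initialized.
key = os.urandom(32)

# 16 random bytes rendered as a 32-character hex string, e.g. for a token.
token = secrets.token_hex(16)

print(len(key), len(token))
```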