Python 3.5 running on Linux kernel 3.17+ can block at startup or on importing the random module on getrandom()

doko42 commented 8 years ago

BPO	26839
Nosy	@malemburg, @rhettinger, @doko42, @ncoghlan, @vstinner, @larryhastings, @matejcik, @ned-deily, @alex, @skrah, @vadmium, @ztane, @dstufft, @Lukasa, @tpetazzoni
Files	nonblocking-getrandom.diff: Patch to py_getrandom to use nonblocking system call, and associated plumbing. getrandom-nonblocking-v2.patch: Patch random.c to use nonblocking getrandom() getrandom-nonblocking-v3.patch: Patch random.c to use nonblocking getrandom() (cleaned-up version). getrandom_nonblocking_v4.patch nonblocking_urandom_noraise.patch no-urandom-by-default.diff

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields: ```python assignee = None closed_at = created_at = labels = ['interpreter-core', 'type-bug', 'release-blocker'] title = 'Python 3.5 running on Linux kernel 3.17+ can block at startup or on importing the random module on getrandom()' updated_at = user = 'https://github.com/doko42' ``` bugs.python.org fields: ```python activity = actor = 'vstinner' assignee = 'none' closed = True closed_date = closer = 'larry' components = ['Interpreter Core'] creation = creator = 'doko' dependencies = [] files = ['42837', '42842', '43265', '43267', '43278', '43282'] hgrepos = [] issue_num = 26839 keywords = ['patch'] message_count = 172.0 messages = ['264121', '264122', '264126', '264258', '264265', '264267', '264270', '264271', '264284', '264289', '264292', '264303', '265427', '265430', '265452', '265477', '265481', '265485', '265496', '265500', '265549', '265555', '266216', '267455', '267504', '267511', '267537', '267539', '267546', '267550', '267554', '267571', '267608', '267609', '267610', '267611', '267612', '267614', '267616', '267617', '267621', '267623', '267624', '267625', '267626', '267627', '267628', '267629', '267630', '267631', '267632', '267633', '267634', '267635', '267636', '267637', '267638', '267640', '267642', '267643', '267644', '267645', '267648', '267650', '267654', '267656', '267660', '267661', '267663', '267664', '267665', '267666', '267667', '267668', '267669', '267670', '267671', '267672', '267673', '267674', '267675', '267676', '267677', '267678', '267679', '267680', '267681', '267682', '267684', '267685', '267686', '267687', '267688', '267689', '267690', '267693', '267694', '267695', '267696', '267699', '267705', '267707', '267709', '267710', '267711', '267712', '267715', '267716', '267718', '267720', '267721', '267723', '267725', '267726', '267728', '267729', '267730', '267731', '267735', '267737', '267739', '267740', '267741', '267745', '267746', '267749', '267750', '267751', '267752', '267803', '267804', '267805', '267806', '267807', '267808', '267809', '267810', '267811', '267812', '267813', '267815', '267816', '267817', '267818', '267819', '267823', '267825', '267831', '267836', '267837', '267846', '267850', '267853', '267855', '267856', '267857', '267863', '267873', '267887', '267890', '267893', '267897', '267898', '267913', '267914', '267939', '268018', '268201', '268591', '268593', '268627', '268629'] nosy_count = 19.0 nosy_names = ['lemburg', 'rhettinger', 'doko', 'ncoghlan', 'vstinner', 'larry', 'matejcik', 'ned.deily', 'alex', 'skrah', 'python-dev', 'martin.panter', 'ztane', 'dstufft', 'pitti', 'Lukasa', 'thomas-petazzoni', 'Colm Buckley', 'Theodore Tso'] pr_nums = [] priority = 'release blocker' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue26839' versions = ['Python 3.5', 'Python 3.6'] ```

doko42 commented 8 years ago

[forwarded from https://bugs.debian.org/822431]

This regression / change of behaviour was seen between 20160330 and 20160417 on the 3.5 branch. The only check-in which could affect this is the fix for issue bpo-26735.

3.5.1-11 = 20160330 3.5.1-12 = 20160417

Martin writes: """ I just debugged the adt-virt-qemu failure with python 3.5.1-11 and tracked it down to python3.5 hanging for a long time when it gets called before the kernel initializes its RNG (which can take a minute in VMs which have low entropy sources).

With 3.5.1-10:

  $ strace -e getrandom python3 -c 'True'
  +++ exited with 0 +++

With -11: $ strace -e getrandom python3 -c 'True' getrandom("\300\0209\26&v\232\264\325\217\322\303:]\30\212Q\314\244\257t%\206\"", 24, 0) = 24 +++ exited with 0 +++

When you do this with -11 right after booting a VM, the getrandom() can block for a long time, until the kernel initializes its random pool:

11:21:36.118034 getrandom("/V#\200^O*HD+D_\32\345\223M\205a\336/\36x\335\246", 24, 0) = 24 11:21:57.939999 ioctl(0, TCGETS, 0x7ffde1d152a0) = -1 ENOTTY (Inappropriate ioctl for device)

[ 1.549882] [TTM] Initializing DMA pool allocator [ 39.586483] random: nonblocking pool is initialized

(Note the time stamps in the strace in the first paragraph)

This is really unfriendly -- it essentially means that you stop being able to use python3 early in the boot process or even early after booting. It would be better to initialize that random stuff lazily, until/if things actually need it.

In the diff between -10 and -11 I do seem some getrandom() fixes to supply the correct buffer size (but that should be irrelevant as in -10 getrandom() wasn't called in the first place), and a new call which should apply to Solaris only (#ifdef sun), so it's not entirely clear where that comes from or how to work around it.

It's very likely that this is the same cause as for bpo-821877, but the description of that is both completely different and also very vague, so I file this separately for now. """

doko42 commented 8 years ago

other issues fixed between these dates:

- Issue bpo-26659: Make the builtin slice type support cycle collection.
- Issue bpo-26718: super.__init__ no longer leaks memory if called multiple
  times.  NOTE: A direct call of super.__init__ is not endorsed!
- Issue bpo-25339: PYTHONIOENCODING now has priority over locale in setting
  the error handler for stdin and stdout.
- Issue bpo-26717: Stop encoding Latin-1-ized WSGI paths with UTF-8.
- Issue bpo-26735: Fix :func:`os.urandom` on Solaris 11.3 and newer when
  reading more than 1,024 bytes: call ``getrandom()`` multiple times with
  a limit of 1024 bytes per call.
- Issue bpo-16329: Add .webm to mimetypes.types_map.
- Issue bpo-13952: Add .csv to mimetypes.types_map.
- Issue bpo-26709: Fixed Y2038 problem in loading binary PLists.
- Issue bpo-23735: Handle terminal resizing with Readline 6.3+ by installing
  our own SIGWINCH handler.
- Issue bpo-26586: In http.server, respond with "413 Request header fields too
  large" if there are too many header fields to parse, rather than killing
  the connection and raising an unhandled exception.
- Issue bpo-22854: Change BufferedReader.writable() and
  BufferedWriter.readable() to always return False.
- Issue bpo-6953: Rework the Readline module documentation to group related
  functions together, and add more details such as what underlying Readline
  functions and variables are accessed.

vstinner commented 8 years ago

Python 3 uses os.urandom() at startup to randomize the hash function. os.urandom() now uses the new Linux getrandom() function which blocks until the Linux kernel is feeded with enough entropy. It's a deliberate choice.

The workaround is simple: set the PYTHONHASHSEED environment variable to use a fixed seed. For example, PYTHONHASHSEED=0 disables hash randomization.

If you use virtualization and Linux is not feeded with enough entropy, you have security issues.

I just debugged the adt-virt-qemu failure (...)

If you use qemu, you can use virt-rng to provide good entropy to the VM from the host kernel.

vstinner commented 8 years ago

See also the issue bpo-25420 which is similar but specific to "import random".

vstinner commented 8 years ago

The issue bpo-25420 has been closed as a duplicate of this issue.

Copy of the latest message:

msg264262 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2016-04-26 12:05

I still believe the underlying system API use should be fixed rather than all the different instances where it gets used.

getrandom() should not block. If it does on a platform, that's a bug on that platform and Python should revert to the alternative of using /dev/urandom directly (or whatever other source of randomness is available).

Disabling hash randomization is not a good workaround for the issue, since it will definitely pop up in other areas as well.

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 8 years ago

Hmm. Why does os.urandom(), which should explicitly not block, use a blocking getrandom() function?

This is quite unexpected on Linux.

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 8 years ago

Wow, it's by design:

" os.urandom(n)

Return a string of n random bytes suitable for cryptographic use."

``man urandom'':

"A read from the /dev/urandom device will not block waiting for more entropy. As a result, if there is not sufficient entropy in the entropy pool, the returned values are theoretically vulnerable to a crypto- graphic attack on the algorithms used by the driver."

vstinner commented 8 years ago

"Hmm. Why does os.urandom(), which should explicitly not block, use a blocking getrandom() function? This is quite unexpected on Linux."

I modified os.getrandom() in the issue bpo-22181 to use the new getrandom() syscall of Linux 3.17. The syscall blocks until the Linux kernel entropy pool is *initialized* with enough entropy. In a healthy system, it must never occur.

To be clear: you get read 10 MB (or 1 GB or more) of random data using os.urandom() even if the entropy pool is empty. You can test:

In a terminal 1, run "dd if=/dev/random of=random" to ensure that the entropy pool is empty
In a terminal 2, run "while true; do cat /proc/sys/kernel/random/entropy_avail; sleep 1; done" to see that entropy pool is empty (or very low, like less 100 bytes)
In a terminal 3, get a lot of random data using os.urandom(): ./python -c 'import os; x=os.urandom(1024*1024*10)'

=> it works, you *can* get 10 MB of random data even if the kernel entropy pool is empty.

Reminder: getrandom() is used to avoid a file descriptor which caused various issues (see issue bpo-22181 for more information).

Ok, now this issue. The lack of entropy is a known problem in virtual machine. It's common that SSH, HTTPS, or other operations block because because of the lack of entropy. On bare metal, the Linux entropy pool is feeded by physical events like interruptions, keyboard strokes, mouse moves, etc. On a virtual machine, there is *no* source of entropy.

The problem is not only known but also solved, at least for qemu: you must attach a virtio-rng device to your virtual machine. See for example https://fedoraproject.org/wiki/Features/Virtio_RNG The VM can now reads fresh and good quality entropy from the host.

To come back to Python: getrandom() syscall only blocks until the entropy pool is *initialized* with enough entropy.

The getrandom() syscall has a GRND_NONBLOCK to fail with EAGAIN if reading from /dev/random (not /dev/urandom) would block because the entropy pool has not enough entropy: http://man7.org/linux/man-pages/man2/getrandom.2.html

IMHO it's a deliberate choice to block in getrandom() when reading /dev/urandom while the entropy pool is not initialized with enough entropy yet.

Ok, now the question is: should python do nothing to support VM badly configured (with no real source of entropy)?

It looks like the obvious change is to not use getrandom() but revert code to use a file descriptor and read from /dev/urandom. We will get bad entropy, but Python will be able to start.

I am not excited by this idea. The os.urandom() private file descriptor caused other kinds of issues and bad quality entropy is also an issue.

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 8 years ago

It is clear how /dev/urandom works. I just think that securing enough entropy on startup should be done by the init scripts (if systemd still allows that :) and not by an application.

[Unless the application is gpg or similar.]

vstinner commented 8 years ago

Since many years, Linux systems store entropy on disk to quickly feed the entropy pool at startup.

It doesn't create magically entropy on VM where you start with zero entropy at the first boot.

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 8 years ago

I did not claim that it magically creates entropy. -- Many VMs are throwaway test beds. It would be annoying to setup some entropy gathering mechanism just so that Python starts.

malemburg commented 8 years ago

As mentioned on the other issue bpo-25420, this is a regression and a change in documented behavior of os.urandom(), which is expected to be non-blocking, regardless of whether entropy is available or not.

The fix should be easy (from reading the man page http://man7.org/linux/man-pages/man2/getrandom.2.html): set the GRND_NONBLOCK flag on getrandom(); then, if the function returns -1 and sets EAGAIN, fallback to reading from /dev/urandom directly.

10da46c7-ab01-4256-82c4-e1aba4ee3583 commented 8 years ago

It's worth noting that this issue now affects every installation of Debian testing track with systemd and systemd-cron installed; the python program /lib/systemd/system-generators/systemd-crontab-generator is called very early in the boot process; it imports hashlib (although only .md5() is used) and blocks on getrandom(), delaying boot time until a 90s timeout has occurred.

Suggestions: modify hashlib to avoid calling getrandom() until entropy is actually required, rather than on import; change the logic to use /dev/urandom (or an in-process PRNG) when getrandom() blocks; or both.

10da46c7-ab01-4256-82c4-e1aba4ee3583 commented 8 years ago

Oh; it's not actually hashlib which is calling getrandom(), it's the main runtime - the initialization of the per-process secret hash seed in _PyRandom_Init

Don't know enough about the internal logic here to comment on what the Right Thing is; but I second the suggestion of msg264303. This might just require setting "flags" to GRND_NONBLOCK in py_getrandom() assuming that's portable to other OS.

10da46c7-ab01-4256-82c4-e1aba4ee3583 commented 8 years ago

The attached patch (against 20160330) addresses the issue for me on Linux; it has not been tested on other platforms. It adds the GRND_NONBLOCK flag to the getrandom() call and sends the appropriate failure return if it returns due to lack of entropy. The enclosing functions fall back to reading from /dev/urandom in this case.

Affected files:

Python/random.c - changes to py_getrandom() configure.ac and pyconfig.h.in - look for linux/random.h for inclusion

Can this, or something similar, be considered for integration with mainline?

10da46c7-ab01-4256-82c4-e1aba4ee3583 commented 8 years ago

A couple of things to note:

Despite the earlier title; this does not just apply to VMs; any system with a potentially-blocking getrandom() (including all Linux 3.17+ and Solaris 11+) is affected.
It's true that getrandom() only blocks on Linux when called before the RNG entropy pool is initialized. However, Python should not be limited to only being called after this initialization.
In particular, systemd-cron relies on a Python script being called very early in the boot process (before the urandom pool is initialized), this is now prevalent on the Debian testing track; causing a 90-second boot delay.
The patch I supplied causes getrandom() to be only called in nonblocking mode; this seems consistent with the desired semantics of os.urandom and _PyRandomInit.

Hope this helps.

Colm

10da46c7-ab01-4256-82c4-e1aba4ee3583 commented 8 years ago

@haypo - yes, it's strange that Linux's getrandom() might block even when reading the urandom pool. However, I think we need to just cope with this and add the GRND_NONBLOCK flag rather than attempting to force a change in the Linux kernel

10da46c7-ab01-4256-82c4-e1aba4ee3583 commented 8 years ago

See https://lwn.net/Articles/606141/ for an explanation of the blocking behavior of getrandom(). This makes sense to me - before the pool has initialized, /dev/urandom will be readable but will return highly predictable data - ie: it should not be considered safe. In other words, I think that getrandom() offers a sensible API.

The only circumstances where we hit the EAGAIN in getrandom() should be when it's called extremely early in the boot process (as is the case for the systemd-cron generator script I mentioned earlier). I think this is safe enough; a more thorough approach would be to flag that the per-process hash seed (_Py_HashSecret) is predictable and shouldn't be used.

vstinner commented 8 years ago

Please elaborate the comment in the patch:

explain that the RNG is not initialized yet with enough entropy
add a referénce to this issue
explain that it's a deliberate choice to use weak (non initialized) RNG for practical reasons

10da46c7-ab01-4256-82c4-e1aba4ee3583 commented 8 years ago

@haypo - new version of patch attached with comments and references per your request.

vstinner commented 8 years ago

getrandom-nonblocking-v2.patch:

+ / Alternative might be to return all-zeroes as a strong + signal that these are not random data. */

I don't understand why you propose that in a comment of your change. I don't recall that this idea was proposed or discussed here.

IMHO it's a very bad idea to fill the buffer with zeros, the caller simply has no idea how to check the quality of the entropy. A buffer filled with zeros is "possible" even with high quality RNG, but it's really very very rare :-)

If you consider that a strong signal is required, you must raise an exception. But it looks like users don't care of the quality of the RNG, they request that Python "just works".

10da46c7-ab01-4256-82c4-e1aba4ee3583 commented 8 years ago

@haypo - yes, I think you're right. Can you delete those two lines (or I can upload another version if you prefer).

I think the pragmatic thing here is to proceed by reading /dev/urandom (as we've discussed). It's not safe to raise an exception in py_getrandom from what I can see; a thorough effort to signal the lack of randomness to outer functions needs more code examination than I have time to carry out at the moment.

From looking at when PyRandom_Init is called and how the hash secret is used; I think it is safe to proceed with /dev/urandom. The general understanding is that urandom has a lower entropy quotient than random, so it's hopefully not going to be used in strong crypto contexts.

10da46c7-ab01-4256-82c4-e1aba4ee3583 commented 8 years ago

@haypo - just wondering where things stand with this? Is this patch going to get pushed to the mainline?

ned-deily commented 8 years ago

Since 3.5.2 is almost upon us, I'm setting this to "release blocker" status so we can make a decision about whether this should be changed for 3.5.2 or not. @haypo, do you have an opinion about the patch?

vadmium commented 8 years ago

Minor thing: the patch has tabbed intentation in places rather than spaces.

As I understand it, if there is no entropy initialized, this patch will fall back to reading /dev/urandom, which will return predictable data (opposite of “random” data!). But since we take this non-strict fallback in other cases (e.g. no OS support), there is a decent argument for also taking the predictable fallback path when entropy is uninitialized.

vadmium commented 8 years ago

Maybe an alternative would be to add a special PYTHONHASHSEED=best-effort (or whatever) value that says if there is no entropy available, use a predictable hash seed. That would force whoever starts the Python process to be aware of the problem.

ff59cd45-ebe3-4b3e-9696-65dc59a38b8c commented 8 years ago

I don't think setting environment variables is a solution, as it is not always clear which script occurs early in the boot process, or even that which program has components written in Python. However I'd want to be notified of failure as well, perhaps a warning should be emitted.

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 8 years ago

I think such warnings should be emitted at application level, similar to the case when a program refuses to run under UID 0.

If admins wish, they can also integrate such checks into the system startup sequence (e.g. runlevel 3 is only reached if randomness is actually available).

larryhastings commented 8 years ago

Speaking as the 3.5 RM, I suppose I have to have an opinion. I don't think "Python now uses a better source of randomness to seed the random module at startup" is a major feature. It's a nice-to-have, not a must-have. And people who care about good randomness (e.g. people doing crypto with the random module) shouldn't be relying on the freebie initialization they get just by importing.

So I think changing the default is fine, especially if the new default is "seed from the entropy pool, but if it's empty failover to the not-as-good source of random bits". If you think that's a bad move, please add your comments here--I'm willing to have my mind changed about this.

I'll remind you: the schedule says I tag 3.5.2 RC 1 this coming Saturday (almost exactly six days from now). Naturally I'd prefer to make the release on time.

10da46c7-ab01-4256-82c4-e1aba4ee3583 commented 8 years ago

@larry -

Thank you for joining in. I'm uploading a third version of the patch (against clean 3.5.1 source, with correct whitespace and a less confusing comment) which implements the following:

configure.ac / pyconfig.h.in : looks for linux/random.h and sets HAVE_LINUX_RANDOM_H if present.
random.c : calls getrandom() with the GRND_NONBLOCK flag; if that fails, fall back to reading /dev/urandom which will have insufficient entropy but will at least return some data.

I feel that there is no consistent way to signal to higher-level applications that the random data has sub-standard entropy; but that this at least preserves the expected semantics, and doesn't block on startup in the event of an uninitialized entropy pool.

larryhastings commented 8 years ago

I'm uploading a third version of the patch (against clean 3.5.1 source

Not against the 3.5 branch from hg.python.org/cpython ? If not, why not?

10da46c7-ab01-4256-82c4-e1aba4ee3583 commented 8 years ago

@larry

Short version; I'm not set up on HG and don't have enough time to get there from here. The patch I submitted applies cleanly to the HG tip as of 15 minutes ago (rev 101768) with only offset changes; the attached v4 version includes the necessary offset changes.

1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 8 years ago

New changeset 9de508dc4837 by Victor Stinner in branch '3.5': os.urandom() doesn't block on Linux anymore https://hg.python.org/cpython/rev/9de508dc4837

vstinner commented 8 years ago

Sorry for the delay.

getrandom_nonblocking_v4.patch LGTM, but I made a major change: if getrandom() fails with EAGAIN, it now *always* fall back on reading /dev/urandom.

I also documented the change in os.urandom() documentation and in Misc/NEWS.

I pushed the fix to Python 3.5 and 3.6. Python 2.7 doesn't use getrandom() and so doesn't need the fix. I now consider that the bug is fixed.

If you consider that it's important enough to retry calling getrandom() each time os.urandom() is called, please open a new issue with a patch and elaborate your rationale :-)

vstinner commented 8 years ago

Manual check to ensure that getrandom() syscall is used on Linux:

$ strace -o trace ./python -c 'import os; os.urandom(16); os.urandom(16)' && grep getrandom trace 
getrandom("...", 24, GRND_NONBLOCK) = 24
getrandom("...", 16, GRND_NONBLOCK) = 16
getrandom("...", 16, GRND_NONBLOCK) = 16

The first read of 24 bytes is to initialize the randomized hash function.

vstinner commented 8 years ago

I'm the author of the os.urandom() change which introduced the usage of the new getrandom() syscall: see the issue bpo-22181. My motivation was to avoid the internal file descriptor to read /dev/urandom. In some corner cases (issue bpo-18756), creating a file descriptor fails with EMFILE. Python introduced a workaround keeping the file descriptor open (issue bpo-18756), but this change introduced new issues (issue bpo-21207)...

When I modified os.urandom(), I was aware that getrandom() can block at startup, but I saw this feature as a good thing for Python. It doesn't seem like a good idea to generate low quality random numbers. I expected that such system *quickly* gets enough good entropy. With this issue, we now have more information: "quickly" means in fact longer than 1 minute! ("causing a 90-second boot delay", msg265477).

Blocking Python startup longer than 1 minute just to get good quality random numbers doesn't seem worth it to me. It is clearly seen as a regression compared to Python 2 (which doesn't use getrandom() but reads /dev/urandom). I understand that the behaviour is also seen as a bug when compared to other programming languages or applications.

For all these reasons, I pushed Colm's change.

vstinner commented 8 years ago

Colm Buckley: "I feel that there is no consistent way to signal to higher-level applications that the random data has sub-standard entropy; but that this at least preserves the expected semantics, and doesn't block on startup in the event of an uninitialized entropy pool."

I chose to document the behaviour of os.urandom().

Stefan Krah (msg267539): "If admins wish, they can also integrate such checks into the system startup sequence (e.g. runlevel 3 is only reached if randomness is actually available)."

Maybe need something like time.get_clock_info(), sys.float_info and sys.thread_info for os.urandom(): a string describing the implementation of os.urandom(). It would allow the developer to decide what to do when getrandom() is not used.

Reminder: getrandom() feature is specific to Linux. I understand that all other operating systems don't warn if the urandom entropy pool is not initialized yet!

10da46c7-ab01-4256-82c4-e1aba4ee3583 commented 8 years ago

@haypo - I concur with all of your comments. I didn't have a strong opinion on whether to modify getrandom_works; your proposal looks fine to me (and will give consistent behavior over the lifetime of the process).

Thanks all for your help with this issue; much appreciated.

vstinner commented 8 years ago

Martin Panter (msg267504): "As I understand it, if there is no entropy initialized, this patch will fall back to reading /dev/urandom, which will return predictable data (opposite of “random” data!)."

No, I don't think so.

Linux uses a lot of random sources, but some of them are seen as untrusted as so are added with a very low estimation of their entropy. Linux even adds some random values with a estimation of 0 bit of entropy. For example, drivers can add serial numbers as random numbers.

So even if getrandom() blocks, if the urandom entropy pool is not considered as fully initialized yet, I expect that /dev/urandom still generates *random* numbers, even if these numbers are not suitable to generate cryptographic keys.

Please double check, I'm not sure of what I wrote :-)

See also http://www.2uo.de/myths-about-urandom/ (but this article doesn't describe how urandom is initialized).

vstinner commented 8 years ago

Martin Panter (msg267511): "Maybe an alternative would be to add a special PYTHONHASHSEED=best-effort (or whatever) value that says if there is no entropy available, use a predictable hash seed. That would force whoever starts the Python process to be aware of the problem."

In my experience, it's better if users don't touch security :-) It's better if Python simply makes the best choices regarding to security.

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 8 years ago

On Tue, Jun 07, 2016 at 10:01:16AM +0000, STINNER Victor wrote:

Maybe need something like time.get_clock_info(), sys.float_info and sys.thread_info for os.urandom(): a string describing the implementation of os.urandom(). It would allow the developer to decide what to do when getrandom() is not used.

I think this is a very good idea. Just a flag for "cryptographically secure" or not.

alex commented 8 years ago

This doesn't look correct to me. Despite what the Linux maintainers insist, it's a _bug_ that /dev/urandom will return immediately if the system's entropy pool has never been seeded; one of the whole points of the getrandom syscall is that it has the correct behavior (which is the same behavior as BSDs).

IMO the patch landed this morning should be reverted and it should be left as is.

malemburg commented 8 years ago

On 07.06.2016 13:27, Alex Gaynor wrote:

This doesn't look correct to me. Despite what the Linux maintainers insist, it's a _bug_ that /dev/urandom will return immediately if the system's entropy pool has never been seeded; one of the whole points of the getrandom syscall is that it has the correct behavior (which is the same behavior as BSDs).

IMO the patch landed this morning should be reverted and it should be left as is.

I'm with Victor and the others on this one. Practicality beats purity.

vstinner commented 8 years ago

Alex: "IMO the patch landed this morning should be reverted and it should be left as is."

Sorry, but you should elaborate a little bit more, see my rationale: https://bugs.python.org/issue26839#msg267611

There are multiple issues.

vstinner commented 8 years ago

Stefan Krah: "I think this is a very good idea. Just a flag for "cryptographically secure" or not."

If you consider it is worth it, please open a new issue.

I dislike the idea of a boolean. The quality of each system RNG has been discussed long enough to be able to say that "cryptographically secure" term depends a lot of your use case, and experts disagree between themself :-) You might even have to consider the version of the Linux kernel to decide if /dev/urandom is good enough or not for you use case. The implementation changed last years.

6d5d82f3-c90b-41da-bb7f-abdec4dbac80 commented 8 years ago

This patch explicitly violates several of the documented constraints of the Python standard library.

For example, random.SystemRandom uses os.urandom to generate its random numbers. SystemRandom is then used by the secrets module to generate *its random numbers. This means that os.urandom *is explicitly used by the Python standard library to generate cryptographically secure random numbers. It was done so in part expressly because the call to random() could block.

If Python needs a non-blocking RNG for internal purposes, that's totally fine, a new function should be written that does exactly that. But any code that is calling secrets or random.SystemRandom is expecting the documented guarantees of that module: that is, that the security profile of the random numbers generated by those objects are cryptographically secure. This patch ensures that that guarantee is *violated* on Linux systems run on cloud servers, which is more than a little alarming to me.

dstufft commented 8 years ago

I agree with Alex here.

The documentation of os.urandom states: Return a string of n random bytes suitable for cryptographic use. However the old behavior prior to using the getrandom() call and the behavior with this patch makes that documentation a lie. It's now a string of n random bytes that may or may not be suitable for cryptographic use, but we have no idea which one it is.

No where in the documentation of os.urandom does it ever promise it will not block. In fact, on systems like FreeBSD where their /dev/urandom is better than Linuxes it always blocked on start up because that's just the way their /dev/urandom works.

dstufft commented 8 years ago

I also agree with Cory :) If CPython needs a non blocking RNG for start up, then it should add a new function that does that, breaking the promise of os.urandom cryptographically suitable random is not the way to do that.

95226e59-5ffc-4f16-ace8-c7dbbbb29bfd commented 8 years ago

The original problem is that Python wants to generate random numbers at *startup*. Are those random numbers really used for crypto-related activities? I doubt it.

So isn't the proper solution to have two functions, one delivering random numbers that are usable for crypto-related activities, and which would potentially block, and a second one that delivers random numbers that are not appropriate for crypto stuff. This second function can be used at Python startup to replace what is done currently.

It is most likely perfectly fine if Python blocks when explicitly asked to generate cryptographically secure random numbers. But not when simply starting the interpreter.

malemburg commented 8 years ago

On 07.06.2016 13:36, Donald Stufft wrote:

No where in the documentation of os.urandom does it ever promise it will not block. In fact, on systems like FreeBSD where their /dev/urandom is better than Linuxes it always blocked on start up because that's just the way their /dev/urandom works.

The whole purpose of urandom is to have a non-blocking source of random numbers:

http://man7.org/linux/man-pages/man4/random.4.html

and os.urandom() has always stated: "This function returns random bytes from an OS-specific randomness source. The returned data should be unpredictable enough for cryptographic applications, though its exact quality depends on the OS implementation."

That's pretty much in line with what the implementation now does. There's no promise on the quality of the data it returns.

python / cpython

Python 3.5 running on Linux kernel 3.17+ can block at startup or on importing the random module on getrandom() #71026