Open ewenmcneill opened 5 years ago
FTR, this is what it looks like from inside the VM when it's waiting on randomness before it can fully start `ssh`:
ewen@debian-unstable:~$ ps ax | grep ssh
267 ? Ss 0:00 /usr/sbin/sshd -t
290 ttyS0 S+ 0:00 grep ssh
ewen@debian-unstable:~$ cat /proc/sys/kernel/random/entropy_avail
52
ewen@debian-unstable:~$ ps ax | grep ssh
267 ? Ss 0:00 /usr/sbin/sshd -t
293 ttyS0 S+ 0:00 grep ssh
ewen@debian-unstable:~$
and there's nothing listening on TCP/22. Ie, it's stuck in the `sshd -t` phase, where `-t` is the config test stage:
-t Test mode. Only check the validity of the configuration file and
sanity of the keys. This is useful for updating sshd reliably as
configuration options may change.
due to the minimal randomness.
Eventually, after the randomness comes online, the `sshd -t` succeeds, and `ssh` can start properly:
ewen@debian-unstable:~$ ps ax | grep ssh
305 ? Ss 0:00 /usr/sbin/sshd -D
308 ttyS0 S+ 0:00 grep ssh
ewen@debian-unstable:~$
and then, eg, ssh to the VM will work. But as noted above, this can be 30-60 seconds for an otherwise idle (eg, test) VM.
Ewen
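The two manual checks above (reading `entropy_avail` and seeing whether anything is listening on TCP/22) could be scripted roughly as follows. This is only a sketch: the helper names `read_entropy` and `port_is_listening` are made up for illustration, and it assumes it is run inside the guest, on Linux, with `sshd` bound to the default port.

```python
import socket
from pathlib import Path

# Same file that was read with `cat` in the transcript above.
ENTROPY_PATH = "/proc/sys/kernel/random/entropy_avail"


def read_entropy(path=ENTROPY_PATH):
    """Return the kernel's available-entropy estimate as an integer."""
    return int(Path(path).read_text().strip())


def port_is_listening(host="127.0.0.1", port=22, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    if Path(ENTROPY_PATH).exists():
        print("entropy_avail:", read_entropy())
    print("something listening on TCP/22:", port_is_listening())
```

Running it in a loop during boot would show roughly when the entropy estimate rises and when `sshd` moves from `-t` to actually listening.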
Also FTR, this is what it looks like inside the VM when there is a `rng` `virtio` device provided:
ewen@debian-unstable:~$ sudo dmesg | grep rng
[ 0.095228] random: get_random_bytes called from start_kernel+0x93/0x52c with crng_init=0
[ 1.958468] random: crng init done
ewen@debian-unstable:~$ cat /proc/sys/kernel/random/entropy_avail
771
ewen@debian-unstable:~$ uptime
16:39:51 up 0 min, 1 user, load average: 0.60, 0.17, 0.06
ewen@debian-unstable:~$ ps ax | grep ssh
272 ? Ss 0:00 /usr/sbin/sshd -D
309 ttyS0 S+ 0:00 grep ssh
ewen@debian-unstable:~$
Note how in under a minute (well under 30 seconds), there is plenty of randomness, and `sshd` is running in daemon mode and it's possible to `ssh` into the VM. (There's actually enough randomness for the `crng` to be read in about 2 seconds after initial boot, instead of 30-90 seconds.)
Ewen
In case it helps anyone else, for now I've hacked my minion template for `libvirt_domain` to just write out the values that I want for the `rng` `virtio` device. This works (only when the VM is first defined), but it'd probably be helpful to others if it was configurable. (Obvious things to configure are the rate in bytes, over what time period, and whether it's from `/dev/random` or `/dev/urandom`.)
When the VM is deployed I get something like:
root@naosr620:~# grep -A 4 '<rng' /etc/libvirt/qemu/debian_unstable.xml
<rng model='virtio'>
<rate bytes='192' period='300000'/>
<backend model='random'>/dev/random</backend>
<address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</rng>
root@naosr620:~#
(and I can `ssh` into the VM almost immediately -- see https://github.com/saltstack/salt/issues/53087#issuecomment-493317323).
Ewen
PS: Patch against 2019.2, change on the minion, and then `sudo service salt-minion restart` before deploying a new VM with `virt.running`.
ewen@naosr620:~$ diff -u /usr/lib/python2.7/dist-packages/salt/templates/virt/libvirt_domain.jinja.old /usr/lib/python2.7/dist-packages/salt/templates/virt/libvirt_domain.jinja
--- /usr/lib/python2.7/dist-packages/salt/templates/virt/libvirt_domain.jinja.old 2019-02-16 19:13:46.000000000 +1300
+++ /usr/lib/python2.7/dist-packages/salt/templates/virt/libvirt_domain.jinja 2019-05-17 16:35:21.636845068 +1200
@@ -81,6 +81,13 @@
{% endif %}
{% endif %}
+ {# 2019-05-17: inject rng virtio module #}
+ {# See: https://github.com/saltstack/salt/issues/53087 #}
+ <rng model='virtio'>
+ <rate bytes='192' period='300000'/>
+ <backend model='random'>/dev/random</backend>
+ </rng>
+
</devices>
<features>
<acpi />
ewen@naosr620:~$
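As an alternative to the `grep` shown earlier, a deployed domain definition could be checked for the injected device with a few lines of Python. This is a rough sketch only: `find_rng_devices` is a hypothetical helper, and the sample domain XML below is trimmed down for illustration rather than copied from a real libvirt definition.

```python
import xml.etree.ElementTree as ET


def find_rng_devices(domain_xml):
    """Return a list of dicts describing each <rng> element in the XML."""
    root = ET.fromstring(domain_xml)
    devices = []
    for rng in root.iter("rng"):
        backend = rng.find("backend")
        rate = rng.find("rate")
        devices.append({
            "model": rng.get("model"),
            "backend": backend.text if backend is not None else None,
            "rate_bytes": rate.get("bytes") if rate is not None else None,
            "period_ms": rate.get("period") if rate is not None else None,
        })
    return devices


# Illustrative, heavily trimmed domain definition (not a complete one).
SAMPLE_DOMAIN = """
<domain type='kvm'>
  <devices>
    <rng model='virtio'>
      <rate bytes='192' period='300000'/>
      <backend model='random'>/dev/random</backend>
    </rng>
  </devices>
</domain>
"""

if __name__ == "__main__":
    for device in find_rng_devices(SAMPLE_DOMAIN):
        print(device)
```

An empty list would mean the template change did not take effect for that domain, for example because the VM was defined before the template was patched.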
@ewenmcneill Good find! This looks like a great start to fixing this issue, it should definitely be configurable, eg. a true/false value passed along to the template defaulting to false that includes the addition that you made to the template above. Additionally the random device could be configurable with an option as well, taking a default perhaps. Would you be able to submit a PR with the changes?
Definitely a feature request :-) It is one that's likely to become more urgent in the next 6 months or so, as I've been watching others running into these "limited randomness" issues particularly with `ssh` startup, for about 6 months now (and that's when I first hit it in my test Debian Unstable VM).
I'll put creating a PR for this on my (long!) todo list, but it might be some weeks before I get to look at it (among other things I have a bunch of VMs to get off old servers onto newer servers soon, which is how I found the issue). Happy if someone else wants to do it first :-)
For future reference, I suspect config like:
- rng:
    source: /dev/urandom
    bits: 192
    interval: 300000 # ms
    model: virtio
is probably a reasonable `virt.running` state config snippet. And if passed through to the `libvirt_domain.jinja` template could configure something suitable. It'd probably also be useful to have defaults, at least for `libvirt`, that are something like those.
Ewen
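To make the mapping concrete, here is a rough sketch of how settings like those in the snippet above might be turned into the `<rng>` element that the template patch hard-codes. None of this is existing Salt code: `build_rng_xml` and `DEFAULTS` are hypothetical names, and treating the snippet's `bits` key as the value for libvirt's `bytes` attribute is an assumption made only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical defaults, mirroring the values in the config snippet above.
DEFAULTS = {
    "source": "/dev/urandom",
    "bits": 192,         # assumption: written out as the rate's bytes attribute
    "interval": 300000,  # ms, written out as the rate's period attribute
    "model": "virtio",
}


def build_rng_xml(config=None):
    """Return an <rng> element as a string, filling gaps from DEFAULTS."""
    settings = dict(DEFAULTS)
    settings.update(config or {})

    rng = ET.Element("rng", {"model": settings["model"]})
    ET.SubElement(rng, "rate", {
        "bytes": str(settings["bits"]),
        "period": str(settings["interval"]),
    })
    backend = ET.SubElement(rng, "backend", {"model": "random"})
    backend.text = settings["source"]
    return ET.tostring(rng, encoding="unicode")


if __name__ == "__main__":
    print(build_rng_xml())
    print(build_rng_xml({"source": "/dev/random"}))
```

In a real change the same values would more likely be passed into the Jinja template as context and rendered there; the function form is just easier to show and test in isolation.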
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.
Commenting to keep issue open; yes I'd still like the ability to configure virtio RNG for VMs, as it's needed to ensure quick startup of, eg, ssh on boot (and thus quick boot for most VMs). (And it's still on my "to do when I have time" list if no one else does it first.)
Ewen
Thank you for updating this issue. It is no longer marked as stale.
@ewenmcneill Feel free to submit a pull request exposing parameters for rng to `init()`, `update()` in the `virt` module and to the `virt.defined` and `virt.running` state if it becomes urgent to you.
Thanks. For now I'm still on 2019.2 (so basically frozen in time), with custom patches (as earlier in this issue) for this feature. I should probably get my environment updated to 3001 (which means getting rid of the last non-recent-Python-3 minions first) before trying to make a useful pull request. But I'll try to come back to this after that.
Ewen
Description of Issue/Question
Modern Linux (Debian Buster / Unstable, etc) is very slow to start services that depend on randomness (eg, `ssh`) if the random number generator takes a while to initialise. In particular (a) those services are typically trying to start before the random number generator has been re-seeded, and (b) at least by default the re-seeding of the random number generator doesn't contribute to counted entropy (which leads to the random number generator still waiting for "real" randomness before it will return results).
For virtual machines, the best solution to this is for the hypervisor to provide a virtual random number device. For instance, with libvirt it is possible to do this with something like:
(For more details see https://libvirt.org/formatdomain.html#elementsRng)
As far as I can tell neither `salt.states.virt.running` nor `salt.modules.virt.init` currently provide any way to pass this type of `<rng model='virtio'>` configuration through to `libvirt`, resulting in virtual machines where the randomness is slow to fill, and `ssh` is not answering for many seconds after the virtual machine starts:
(Note that the VM is up, as it's replying that it is refusing the connection not just hanging; it's just that 30+ seconds after the VM booted it still hasn't got enough randomness to allow `ssh` to start. That VM is running Debian Unstable, but from other reading I've done I'd expect Debian Buster, recent Ubuntu, etc, to all be the same.)
Arguably the `virtio` `rng` model should probably be configured automatically / by default for `libvirt` these days, maybe pointed at `/dev/urandom` instead of `/dev/random`. But at minimum there should be some way to ensure that this can be part of the `libvirt` VM definition -- either by explicit parameters, or maybe by providing an XML fragment to include in the `libvirt` definition that is generated.
Versions Report