saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Get access to the Salt software package repository here:
https://repo.saltproject.io/
Apache License 2.0

virt: libvirt: configure virtio rng #53087

Open ewenmcneill opened 5 years ago

ewenmcneill commented 5 years ago

Description of Issue/Question

Modern Linux (Debian Buster / Unstable, etc) is very slow to start services that depend on randomness (eg, ssh) if the random number generator takes a while to initialise. In particular (a) those services are typically trying to start before the random number generator has been re-seeded, and (b) at least by default the re-seeding of the random number generator doesn't contribute to counted entropy (which leads to the random number generator still waiting for "real" randomness before it will return results).

For virtual machines, the best solution to this is for the hypervisor to provide a virtual random number device. For instance, with libvirt it is possible to do this with something like:

    <rng model='virtio'>
      <rate bytes='192' period='300000'/>
      <backend model='random'>/dev/random</backend>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </rng>

(For more details see https://libvirt.org/formatdomain.html#elementsRng)
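As a rough illustration of the fragment above, here is a minimal Python 3 sketch that adds such an `<rng>` device to an existing libvirt domain definition using only the standard library. The helper names `build_rng_element` and `add_rng_to_domain` are hypothetical and are not part of Salt or libvirt; the sketch leaves out the `<address>` element on the assumption that libvirt assigns a PCI slot itself when none is given.

```python
import xml.etree.ElementTree as ET


def build_rng_element(source="/dev/urandom", rate_bytes=192, period_ms=300000):
    """Build a virtio <rng> device element (hypothetical helper, not a Salt API)."""
    rng = ET.Element("rng", {"model": "virtio"})
    # <rate> limits the guest to rate_bytes per period_ms milliseconds.
    ET.SubElement(rng, "rate", {"bytes": str(rate_bytes), "period": str(period_ms)})
    backend = ET.SubElement(rng, "backend", {"model": "random"})
    backend.text = source
    return rng


def add_rng_to_domain(domain_xml, **kwargs):
    """Return domain_xml with an <rng> device added, unless one already exists."""
    root = ET.fromstring(domain_xml)
    devices = root.find("devices")
    if devices is None:
        devices = ET.SubElement(root, "devices")
    if devices.find("rng") is None:
        devices.append(build_rng_element(**kwargs))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = "<domain type='kvm'><name>demo</name><devices/></domain>"
    print(add_rng_to_domain(sample, source="/dev/random"))
```

The function is idempotent: a domain that already has an `<rng>` device is returned unchanged, so it can be applied repeatedly to the same definition.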

As far as I can tell neither salt.states.virt.running nor salt.modules.virt.init currently provides any way to pass this type of <rng model='virtio'> configuration through to libvirt, resulting in virtual machines where the randomness is slow to fill, and ssh is not answering for many seconds after the virtual machine starts:

ewen@ashram:~$ ssh 172.20.2.64
ssh: connect to host 172.20.2.64 port 22: Connection refused
ewen@ashram:~$ 

(Note that the VM is up, as it's replying that it is refusing the connection, not just hanging; it's just that 30+ seconds after the VM booted it still hasn't got enough randomness to allow ssh to start. That VM is running Debian Unstable, but from other reading I've done I'd expect Debian Buster, recent Ubuntu, etc, to all be the same.)

Arguably the virtio rng model should probably be configured automatically / by default for libvirt these days, maybe pointed at /dev/urandom instead of /dev/random. But at minimum there should be some way to ensure that this can be part of the libvirt VM definition -- either by explicit parameters, or maybe by providing an XML fragment to include in the libvirt definition that is generated.

Versions Report

ewen@noc:~$ salt --versions-report
Salt Version:
           Salt: 2019.2.0

Dependency Versions:
           cffi: 0.8.6
       cherrypy: Not Installed
       dateutil: 2.2
      docker-py: Not Installed
          gitdb: 0.5.4
      gitpython: 0.3.2 RC1
          ioflo: Not Installed
         Jinja2: 2.9.4
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.4.2
   mysql-python: Not Installed
      pycparser: 2.10
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 2.7.9 (default, Sep 25 2018, 23:32:58)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 14.4.0
           RAET: Not Installed
          smmap: 0.8.2
        timelib: Not Installed
        Tornado: 4.4.3
            ZMQ: 4.0.5

System Versions:
           dist: debian 8.11 
         locale: ANSI_X3.4-1968
        machine: i686
        release: 4.9.0-0.bpo.9-686-pae
         system: Linux
        version: debian 8.11 

ewen@noc:~$

ewenmcneill commented 5 years ago

FTR, this is what it looks like from inside the VM when it's waiting on randomness before it can fully start ssh:

ewen@debian-unstable:~$ ps ax | grep ssh
  267 ?        Ss     0:00 /usr/sbin/sshd -t
  290 ttyS0    S+     0:00 grep ssh
ewen@debian-unstable:~$ cat /proc/sys/kernel/random/entropy_avail
52
ewen@debian-unstable:~$ ps ax | grep ssh
  267 ?        Ss     0:00 /usr/sbin/sshd -t
  293 ttyS0    S+     0:00 grep ssh
ewen@debian-unstable:~$ 

and there's nothing listening on TCP/22. Ie, it's stuck in the "sshd -t" phase, where -t is the config test stage:

     -t      Test mode.  Only check the validity of the configuration file and
             sanity of the keys.  This is useful for updating sshd reliably as
             configuration options may change.

due to the minimal randomness.

Eventually after the randomness comes online, the sshd -t succeeds, and ssh can start properly:

ewen@debian-unstable:~$ ps ax | grep ssh
  305 ?        Ss     0:00 /usr/sbin/sshd -D
  308 ttyS0    S+     0:00 grep ssh
ewen@debian-unstable:~$ 

and then, eg, ssh to the VM will work. But as noted above, this can be 30-60 seconds for an otherwise idle (eg, test) VM.
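To put a number on that delay from outside the VM, something like the following Python 3 sketch could be used. It simply retries a TCP connect until sshd accepts it and reports the elapsed time. The function name `wait_for_port` and its default timeout and interval are assumptions made for illustration; the address in the usage example is the one from the transcript at the top of this issue.

```python
import socket
import time


def wait_for_port(host, port=22, timeout=120.0, interval=1.0):
    """Return seconds until a TCP connect to host:port succeeds, or None on timeout."""
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        try:
            # "Connection refused" raises OSError, which is the state seen
            # while sshd is still stuck in its "sshd -t" phase.
            with socket.create_connection((host, port), timeout=interval):
                return time.monotonic() - start
        except OSError:
            time.sleep(interval)
    return None


if __name__ == "__main__":
    elapsed = wait_for_port("172.20.2.64", 22)
    if elapsed is None:
        print("ssh did not come up within the timeout")
    else:
        print("ssh answered after %.1f seconds" % elapsed)
```

Started right after the VM is booted, this gives a rough before/after comparison for a VM with and without the virtio rng device.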

Ewen

ewenmcneill commented 5 years ago

Also FTR, this is what it looks like inside the VM when there is a rng virtio device provided:

ewen@debian-unstable:~$ sudo dmesg | grep rng
[    0.095228] random: get_random_bytes called from start_kernel+0x93/0x52c with crng_init=0
[    1.958468] random: crng init done
ewen@debian-unstable:~$ cat /proc/sys/kernel/random/entropy_avail
771
ewen@debian-unstable:~$ uptime
 16:39:51 up 0 min,  1 user,  load average: 0.60, 0.17, 0.06
ewen@debian-unstable:~$ ps ax | grep ssh
  272 ?        Ss     0:00 /usr/sbin/sshd -D
  309 ttyS0    S+     0:00 grep ssh
ewen@debian-unstable:~$ 

Note how in under a minute (well under 30 seconds), there is plenty of randomness, and sshd is running in daemon mode and it's possible to ssh into the VM. (There's actually enough randomness for the crng to be ready about 2 seconds after initial boot, instead of 30-90 seconds.)
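For anyone who wants to compare several VMs, the "crng init done" timestamp can be pulled out of the dmesg output with a small Python 3 sketch like this one. The function name `crng_init_seconds` is hypothetical, and the regular expression assumes the message format shown in the dmesg lines above; other kernel versions may word it differently.

```python
import re

# Matches lines such as "[    1.958468] random: crng init done".
CRNG_DONE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+random: crng init done", re.MULTILINE)


def crng_init_seconds(dmesg_text):
    """Return the boot-relative time (seconds) of 'crng init done', or None."""
    match = CRNG_DONE.search(dmesg_text)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    sample = (
        "[    0.095228] random: get_random_bytes called from "
        "start_kernel+0x93/0x52c with crng_init=0\n"
        "[    1.958468] random: crng init done\n"
    )
    print(crng_init_seconds(sample))
```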

Ewen

ewenmcneill commented 5 years ago

In case it helps anyone else, for now I've hacked my minion template for libvirt_domain to just write out the values that I want for the rng virtio device. This works (only when the VM is first defined), but it'd probably be helpful to others if it was configurable. (Obvious things to configure are the rate in bytes, over what time period, and whether it's from /dev/random or /dev/urandom.)

When the VM is deployed I get something like:

root@naosr620:~# grep -A 4 '<rng' /etc/libvirt/qemu/debian_unstable.xml
    <rng model='virtio'>
      <rate bytes='192' period='300000'/>
      <backend model='random'>/dev/random</backend>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </rng>
root@naosr620:~# 

(and I can ssh into the VM almost immediately -- see https://github.com/saltstack/salt/issues/53087#issuecomment-493317323).

Ewen

PS: The patch below is against 2019.2. Apply the change on the minion, then run sudo service salt-minion restart before deploying a new VM with virt.running.

ewen@naosr620:~$ diff -u /usr/lib/python2.7/dist-packages/salt/templates/virt/libvirt_domain.jinja.old /usr/lib/python2.7/dist-packages/salt/templates/virt/libvirt_domain.jinja
--- /usr/lib/python2.7/dist-packages/salt/templates/virt/libvirt_domain.jinja.old   2019-02-16 19:13:46.000000000 +1300
+++ /usr/lib/python2.7/dist-packages/salt/templates/virt/libvirt_domain.jinja   2019-05-17 16:35:21.636845068 +1200
@@ -81,6 +81,13 @@
                 {% endif %}
                 {% endif %}

+                {# 2019-05-17: inject rng virtio module #}
+                {# See: https://github.com/saltstack/salt/issues/53087 #}
+                <rng model='virtio'>
+                       <rate bytes='192' period='300000'/>
+                       <backend model='random'>/dev/random</backend>
+                </rng>
+
         </devices>
         <features>
                 <acpi />
ewen@naosr620:~$

garethgreenaway commented 5 years ago

@ewenmcneill Good find! This looks like a great start to fixing this issue. It should definitely be configurable, e.g. via a true/false value passed along to the template, defaulting to false, that controls whether the addition you made to the template above is included. The random device could also be configurable through an option, perhaps with a default. Would you be able to submit a PR with the changes?

ewenmcneill commented 5 years ago

Definitely a feature request :-) It is one that's likely to become more urgent in the next 6 months or so, as I've been watching others running into these "limited randomness" issues particularly with ssh startup, for about 6 months now (and that's when I first hit it in my test Debian Unstable VM).

I'll put creating a PR for this on my (long!) todo list, but it might be some weeks before I get to look at it (among other things I have a bunch of VMs to get off old servers onto newer servers soon, which is how I found the issue). Happy if someone else wants to do it first :-)

For future reference, I suspect config like:

      - rng:
            source: /dev/urandom
            bits: 192
            interval: 300000     # ms
            model: virtio

is probably a reasonable virt.running state config snippet. And if passed through to the libvirt_domain.jinja template could configure something suitable. It'd probably also be useful to have defaults, at least for libvirt that are something like those.
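A minimal Python 3 sketch of how such a setting might be turned into the libvirt fragment is below. It is not Salt's implementation: `render_rng` and `RNG_DEFAULTS` are hypothetical names, the key names are taken from the snippet above, and the default values are only the ones used elsewhere in this issue. Note that the proposed `bits` key is passed straight through as the `bytes` attribute of libvirt's `<rate>` element, so the key name may deserve another look. Leaving the setting unset renders nothing, which matches the "default to false" suggestion.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical defaults, taken from the values discussed in this issue.
RNG_DEFAULTS = {
    "source": "/dev/urandom",
    "bits": 192,          # passed through as <rate bytes=...>
    "interval": 300000,   # milliseconds, passed through as <rate period=...>
    "model": "virtio",
}


def render_rng(rng=None):
    """Render an <rng> fragment from a state-style setting.

    rng may be None/False (render nothing), True (use the defaults),
    or a dict overriding some of the default keys.
    """
    if not rng:
        return ""
    if rng is True:
        cfg = dict(RNG_DEFAULTS)
    else:
        unknown = set(rng) - set(RNG_DEFAULTS)
        if unknown:
            raise ValueError("unknown rng option(s): %s" % ", ".join(sorted(unknown)))
        cfg = {**RNG_DEFAULTS, **rng}
    return (
        "<rng model={model}>\n"
        "  <rate bytes={rate} period={period}/>\n"
        "  <backend model='random'>{source}</backend>\n"
        "</rng>"
    ).format(
        model=quoteattr(str(cfg["model"])),
        rate=quoteattr(str(cfg["bits"])),
        period=quoteattr(str(cfg["interval"])),
        source=escape(str(cfg["source"])),
    )


if __name__ == "__main__":
    print(render_rng({"source": "/dev/random"}))
```

Rejecting unknown keys is a design choice in this sketch, so that a typo such as `intervall` fails loudly instead of silently falling back to the default.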

Ewen

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

ewenmcneill commented 4 years ago

Commenting to keep issue open; yes I'd still like the ability to configure virtio RNG for VMs, as it's needed to ensure quick startup of, eg, ssh on boot (and thus quick boot for most VMs). (And it's still on my "to do when I have time" list if no one else does it first.)

Ewen

stale[bot] commented 4 years ago

Thank you for updating this issue. It is no longer marked as stale.

cbosdo commented 4 years ago

@ewenmcneill Feel free to submit a pull request exposing parameters for rng to init(), update() in the virt module and to the virt.defined and virt.running state if it becomes urgent to you.

ewenmcneill commented 4 years ago

> @ewenmcneill Feel free to submit a pull request exposing parameters for rng to init(), update() in the virt module and to the virt.defined and virt.running state if it becomes urgent to you.

Thanks. For now I'm still on 2019.2 (so basically frozen in time), with custom patches (as earlier in this issue) for this feature. I should probably get my environment updated to 3001 (which means getting rid of the last non-recent-Python-3 minions first) before trying to make a useful pull request. But I'll try to come back to this after that.

Ewen