latchset / clevis

Automated Encryption Framework
GNU General Public License v3.0

Initramfs-tools unlocker + tang pin doesn't bring up network during boot on Ubuntu 18.04 #145

Closed bviktor closed 4 years ago

bviktor commented 4 years ago

After performing the bind to tang server and updating the initramfs, the boot splash is stuck on the encryption passphrase input.

If you press Esc, it says

No Network Connection [...] Please unlock disk sdb3_crypt

And won't proceed until a passphrase is manually entered. The comp is connected to the network via cable, not WiFi.

tang is installed from Ubuntu package, tpm2-tss (2.3.1), tpm2-tools (3.2.1) and clevis (9a41f41) are built from sources.

bviktor commented 4 years ago

I suspected that DKMS might have interfered with this, but nope. I removed the DKMS version of my Eth driver, rebuilt initramfs, and it produced the same result. Stuck on passphrase screen.

I even checked now that my eth driver is indeed included in the initramfs:

# lsinitramfs initrd.img-5.0.0-36-generic |grep -i igb
lib/modules/5.0.0-36-generic/kernel/drivers/net/ethernet/intel/igbvf
lib/modules/5.0.0-36-generic/kernel/drivers/net/ethernet/intel/igbvf/igbvf.ko
lib/modules/5.0.0-36-generic/kernel/drivers/net/ethernet/intel/igb
lib/modules/5.0.0-36-generic/kernel/drivers/net/ethernet/intel/igb/igb.ko

yuergen commented 4 years ago

My understanding is that you need to pass an "ip=" parameter to enable your network interfaces during the initramfs phase. Configuring the ip parameter inside the initramfs-tools configuration does not work for me. However, setting it as a kernel boot parameter in /etc/default/grub does. Using GRUB kernel parameters also has the advantage that you can remove the parameter by editing the boot command line in the GRUB menu while booting the computer.
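A minimal sketch of the GRUB approach described above, assuming Debian/Ubuntu conventions (a GRUB_CMDLINE_LINUX line in /etc/default/grub, GNU sed, and update-grub to regenerate the boot configuration). The helper name add_kernel_param is hypothetical, and it takes the file as an argument so it can be tried on a copy first:

```shell
#!/bin/sh
# Hypothetical helper: append a kernel parameter to the GRUB_CMDLINE_LINUX
# line of a grub defaults file, unless that line already contains it.
# Does nothing if the file has no GRUB_CMDLINE_LINUX line.
add_kernel_param() {
    file=$1
    param=$2
    # Already present? Then leave the file alone.
    if grep -q "^GRUB_CMDLINE_LINUX=\".*${param}.*\"" "$file"; then
        return 0
    fi
    # First expression handles an empty value, second appends to a
    # non-empty one; "t" skips the second if the first one matched.
    sed -i \
        -e "s|^GRUB_CMDLINE_LINUX=\"\"\$|GRUB_CMDLINE_LINUX=\"${param}\"|" \
        -e t \
        -e "s|^GRUB_CMDLINE_LINUX=\"\(.*\)\"\$|GRUB_CMDLINE_LINUX=\"\1 ${param}\"|" \
        "$file"
}

# Usage (as root), then regenerate the GRUB configuration:
#   add_kernel_param /etc/default/grub ip=dhcp && update-grub
```

Keeping the edit idempotent means the helper can be run from a provisioning script on many machines without stacking duplicate parameters.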

Let me know if your clevis instance is able to unlock the volumes. On my system it does not find the correct passphrase slot to use.

bviktor commented 4 years ago

Hardcoding network configuration into GRUB config files isn't really feasible in a corporate environment with several hundred or thousand workstations. And it's also redundant. You store your network configuration once in netplan/networkmanager/ifcfg, then you need to do that again in GRUB?

In any case, at least an ip=dhcp option might help. I'll need time to actually test whether this makes tang unlock work, probably only next year. But I'm wondering if initramfs-tools hooks can do such a thing, without having to actually edit the /etc/default/grub file and run update-grub.

smokingroosters commented 4 years ago

I might be hitting a similar problem on machines with 2 network cards. Neither network card is initialised in time, and I get:

enp4s0: link down enp4s0: link is not ready enp5s0: link down enp5s0: link is not ready [...] Please unlock disk sda3_crypt

The only way I could make it work is by adding a sleep after clevis is trying to bring up the interfaces:

# Check if network is up before trying to configure it.
eth_check() {
    for device in $(clevis_all_netbootable_devices); do
        ip link set dev "$device" up
        sleep 10
        ETH_HAS_CARRIER=$(cat /sys/class/net/"$device"/carrier)

and by also adding an always-true fallback (|| true) to the configure_networking step:

if eth_check; then
    # Make sure networking is set up: if booting via nfs, it already is
    # Doesn't seem to work when added to clevisloop for some reason
    [ "$boot" = nfs ] || configure_networking || true

Same configuration as OP

yuergen commented 4 years ago

@smokingroosters did you set the ip parameter for the Linux kernel in /etc/default/grub?

I did not need the sleep when I poked around. I needed the || true, though. On my system the deconfiguration of the network devices failed. So I ended up with network manager being unable to manage the network interfaces.

smokingroosters commented 4 years ago

@yuergen yes I set up the ip for dhcp as: ip=:::::<<ethernet_interface>>:dhcp
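For readers unfamiliar with the syntax: the string above follows the kernel's ip= format, whose first seven fields are client-ip:server-ip:gw-ip:netmask:hostname:device:autoconf. The sketch below only illustrates how the colons line up; build_ip_param is a hypothetical helper, not part of clevis or initramfs-tools:

```shell
#!/bin/sh
# Hypothetical helper: build an ip= kernel parameter that leaves the five
# address/hostname fields empty and sets only <device> and <autoconf>.
# Field order: client-ip:server-ip:gw-ip:netmask:hostname:device:autoconf
build_ip_param() {
    device=$1            # may be empty, so that no specific device is named
    autoconf=${2:-dhcp}  # defaults to dhcp
    printf 'ip=:::::%s:%s\n' "$device" "$autoconf"
}

build_ip_param enp4s0 dhcp   # prints ip=:::::enp4s0:dhcp
build_ip_param ""            # prints ip=::::::dhcp
```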

bviktor commented 4 years ago

Haha that's quite hilarious, @smokingroosters needed both the sleep increase and the || true, @yuergen only needed the || true, I only needed the increased sleep lol.

I don't even need the ip=dhcp in GRUB for it to work. And it works with an FQDN too. 3 seconds seems to be the threshold, that works, 2 seconds won't. I settled with 5 seconds for now.

It'd be great if Clevis was aware whether the TPM or Tang unlocker is in use, and only apply the sleep if it's Tang, since TPM doesn't need any networking.

bviktor commented 4 years ago

Now we found a machine where even 5 secs wasn't enough so I bumped it to 10 seconds.

tavril commented 4 years ago

Had the same issue. Excuse the naive question, but why do we need eth_check() at all? There is already a helper function, configure_networking, and it does not feel right that clevis tries to bring the interfaces up by itself... I tried not calling that function and just having [ "$boot" = nfs ] || configure_networking, and it worked.

yuergen commented 4 years ago

Whether a fixed sleep is long enough depends on the network hardware: the card and the switch both need time to bring up the link. I suggest changing the code of the local-top/clevis script to something like this:

# Check if network is up before trying to configure it.
eth_check() {
    for device in $(clevis_all_netbootable_devices); do
        ip link set dev "$device" up
    done
    counter=20
    while [ $counter -gt 0 ]; do
        for device in $(clevis_all_netbootable_devices); do
            ETH_HAS_CARRIER=$(cat /sys/class/net/"$device"/carrier)
            if [ "$ETH_HAS_CARRIER" = '1' ]; then
                return 0
            fi
        done
        counter=$(( $counter - 1 ))
        sleep .5
    done
    return 1
}

This code polls at 0.5-second intervals, checking whether at least ONE network interface has a carrier. It gives up after 20 attempts, i.e. about 10 seconds.
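To try the polling logic outside an initramfs, a parameterised variant can be pointed at a fake directory tree. This is only a sketch: wait_for_carrier and the SYSFS_NET override are hypothetical names, and it assumes a sleep that accepts fractional seconds (as the snippet above already does):

```shell
#!/bin/sh
# Sketch: poll until any of the listed interfaces reports carrier, or give
# up after a number of attempts. SYSFS_NET is a hypothetical override so
# the loop can run against a fake tree instead of the real /sys/class/net.
wait_for_carrier() {
    tries=$1
    shift
    while [ "$tries" -gt 0 ]; do
        for device in "$@"; do
            # Reading carrier can fail (e.g. interface down); treat as "no".
            carrier=$(cat "${SYSFS_NET:-/sys/class/net}/$device/carrier" 2>/dev/null)
            if [ "$carrier" = 1 ]; then
                return 0
            fi
        done
        tries=$((tries - 1))
        sleep 0.5
    done
    return 1
}

# Usage: 20 attempts, roughly 10 seconds in total.
#   wait_for_carrier 20 enp4s0 enp5s0 && echo "link is up"
```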

sergio-correia commented 4 years ago

> Had the same issue. Excuse the naive question, but why do we need eth_check() at all? There is already a helper function, configure_networking, and it does not feel right that clevis tries to bring the interfaces up by itself... I tried not calling that function and just having [ "$boot" = nfs ] || configure_networking, and it worked.

Yeah, it looks strange, and it seems like something that configure_networking should take care of. Unfortunately, that does not seem to be the case.

In my test VMs, I didn't need eth_check either, but on the laptop I am using to test it (with a USB-to-Ethernet device), it did not work, displaying messages like "ipconfig: no devices to configure".

In fact, even @yuergen's solution above (also from #182) is not enough. What worked for me was to make it even uglier: looping a maximum of 20 times, like @yuergen, but with the code we currently have in eth_check, i.e., with the sleep 1 after ip link set....

dannf commented 4 years ago

Interesting. IMO, this seems like something we should try to get fixed in configure_networking() in Ubuntu, but it isn't clear to me what the problem is exactly. @sergio-correia: would you be willing to file a bug against initramfs-tools in Ubuntu for this, ideally with a console log (which I appreciate can be difficult to capture from a laptop)? It sounds like something similar to LP: #682445, but I've verified that was fixed years ago, and the fix is still intact in 18.04.

dannf commented 4 years ago

I tried to reproduce @sergio-correia's case using a USB NIC. However, I had the opposite result. That is, it failed when calling eth_check, but if I comment out eth_check so that it always calls configure_networking, it works fine. FYI, I am passing the ip=::::::dhcp param because I have multiple NICs. I recommend that anyone with multiple NICs do this. See my explanation in PR #193.

It seems like configure_networking provides plenty of time for an interface to become ready (~4 minutes, if you sum all of the backoff sleeps). I wonder if there's a scenario where the driver would not have registered an interface before configure_networking? If anyone is able to reproduce a scenario where calling eth_check seems to be required, I'd like to work with you to try and root cause this.
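The "~4 minutes" estimate can be checked with a little arithmetic. The timeout list below is an assumption: it is the sequence that configure_networking in initramfs-tools' scripts/functions is believed to step through, and it should be verified against the installed copy before being relied on:

```shell
#!/bin/sh
# Assumed per-attempt timeouts in seconds; check them against your own
# /usr/share/initramfs-tools/scripts/functions, as they may differ.
timeouts="2 3 4 6 9 16 25 36 64 100"

total=0
for t in $timeouts; do
    total=$((total + t))
done

# prints: total: 265s (4 min 25 s)
echo "total: ${total}s ($((total / 60)) min $((total % 60)) s)"
```

Under that assumption the ten attempts add up to 265 seconds, which matches the rough figure quoted above.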

sergio-correia commented 4 years ago

@dannf: in the next few days I will reinstall Debian on the test laptop and post the updates here. We can then file a bug, if required.

sergio-correia commented 4 years ago

@dannf: so I reinstalled Debian 10.3 on my laptop.

This is the USB NIC I am using:

Bus 001 Device 002: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter

I have added usbnet and r8152 to /etc/initramfs-tools/modules (please let me know if I am doing it wrong). I am not passing any ip arguments.
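For reference, a hedged sketch of how module names might be added to that file without creating duplicates. The helper name ensure_module is hypothetical; the file path and update-initramfs -u are the ones mentioned in this thread:

```shell
#!/bin/sh
# Hypothetical helper: add a module name to an initramfs-tools modules file
# only if it is not already listed on a line of its own.
ensure_module() {
    file=$1
    module=$2
    # -x: match the whole line, -F: treat the name as a fixed string.
    grep -qxF "$module" "$file" 2>/dev/null || echo "$module" >> "$file"
}

# Usage (as root), then rebuild the initramfs:
#   ensure_module /etc/initramfs-tools/modules usbnet
#   ensure_module /etc/initramfs-tools/modules r8152
#   update-initramfs -u
```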

Using clevis-13 (0bea5c4f6):

Jun 19 13:31:08.344910 debian kernel: sd 0:0:0:0: [sda] Attached SCSI disk
Begin: Loading essential drivers ... Jun 19 13:31:08.344950 debian kernel: usbcore: registered new interface driver r8152
done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ...   Volume group "debian-vg" not found
  Cannot process volume group debian-vg
Jun 19 13:31:08.345440 debian kernel: usb 1-3: New USB device found, idVendor=0bda, idProduct=8153, bcdDevice=30.00
Jun 19 13:31:08.345886 debian kernel: usb 1-3: New USB device strings: Mfr=1, Product=2, SerialNumber=6
Jun 19 13:31:08.346440 debian kernel: usb 1-3: Product: USB 10/100/1000 LAN
Jun 19 13:31:08.346983 debian kernel: usb 1-3: Manufacturer: Realtek
Jun 19 13:31:08.347492 debian kernel: usb 1-3: SerialNumber: 000001
Jun 19 13:31:08.347534 debian kernel: device-mapper: uevent: version 1.0.3
Jun 19 13:31:08.347568 debian kernel: device-mapper: ioctl: 4.39.0-ioctl (2018-04-03) initialised: dm-devel@redhat.com
Jun 19 13:31:08.348070 debian kernel: psmouse serio4: elantech: assuming hardware version 4 (with firmware version 0x381f00)
Jun 19 13:31:08.348554 debian kernel: psmouse serio4: elantech: Synaptics capabilities query result 0x10, 0x14, 0x0e.
Jun 19 13:31:08.349026 debian kernel: psmouse serio4: elantech: Elan sample query result 00, 23, 64
Jun 19 13:31:08.349671 debian kernel: usb 1-4: new full-speed USB device number 3 using xhci_hcd
Jun 19 13:31:08.349722 debian kernel: input: ETPS/2 Elantech Touchpad as /devices/platform/i8042/serio4/input/input8
Jun 19 13:31:08.350258 debian kernel: usb 1-4: New USB device found, idVendor=8087, idProduct=0a2a, bcdDevice= 0.01
Jun 19 13:31:08.350784 debian kernel: usb 1-4: New USB device strings: Mfr=0, Product=0, SerialNumber=0
Jun 19 13:31:08.351316 debian kernel: usb 1-3: reset high-speed USB device number 2 using xhci_hcd
Jun 19 13:31:08.351929 debian kernel: r8152 1-3:1.0 eth0: v1.09.9
Jun 19 13:31:08.352473 debian kernel: r8152 1-3:1.0 enxa0cec80e9fe1: renamed from eth0

Please unlock disk sda3_crypt:

Not calling eth_check, just having [ "$boot" = nfs ] || configure_networking:

Jun 19 13:55:40.759846 debian kernel: sd 0:0:0:0: [sda] Attached SCSI disk
Begin: Loading essential drivers ... Jun 19 13:31:08.344950 debian kernel: usbcore: registered new interface driver r8152
done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... ipconfig: no devices to configure
ipconfig: no devices to configure
ipconfig: no devices to configure
ipconfig: no devices to configure
ipconfig: no devices to configure
ipconfig: no devices to configure
ipconfig: no devices to configure
ipconfig: no devices to configure
ipconfig: no devices to configure
ipconfig: no devices to configure
/script/functions: line 279: /run/net-*.conf: No such file or directory
Volume group "debian-vg" not found
  Cannot process volume group debian-vg
Jun 19 13:55:40.760333 debian kernel: usb 1-3: New USB device found, idVendor=0bda, idProduct=8153, bcdDevice=30.00
Jun 19 13:55:40.760776 debian kernel: usb 1-3: New USB device strings: Mfr=1, Product=2, SerialNumber=6
Jun 19 13:55:40.761219 debian kernel: usb 1-3: Product: USB 10/100/1000 LAN
Jun 19 13:55:40.761669 debian kernel: usb 1-3: Manufacturer: Realtek
Jun 19 13:55:40.762119 debian kernel: usb 1-3: SerialNumber: 000001
Jun 19 13:55:40.762546 debian kernel: psmouse serio4: elantech: assuming hardware version 4 (with firmware version 0x381f00)
Jun 19 13:55:40.762579 debian kernel: device-mapper: uevent: version 1.0.3
Jun 19 13:55:40.763000 debian kernel: psmouse serio4: elantech: Synaptics capabilities query result 0x10, 0x14, 0x0e.
Jun 19 13:55:40.763028 debian kernel: device-mapper: ioctl: 4.39.0-ioctl (2018-04-03) initialised: dm-devel@redhat.com
Jun 19 13:55:40.763511 debian kernel: psmouse serio4: elantech: Elan sample query result 00, 23, 64
Jun 19 13:55:40.764013 debian kernel: usb 1-4: new full-speed USB device number 3 using xhci_hcd
Jun 19 13:55:40.764046 debian kernel: input: ETPS/2 Elantech Touchpad as /devices/platform/i8042/serio4/input/input8
Jun 19 13:55:40.764494 debian kernel: usb 1-4: New USB device found, idVendor=8087, idProduct=0a2a, bcdDevice= 0.01
Jun 19 13:55:40.764937 debian kernel: usb 1-4: New USB device strings: Mfr=0, Product=0, SerialNumber=0
Jun 19 13:55:40.765384 debian kernel: usb 1-3: reset high-speed USB device number 2 using xhci_hcd
Jun 19 13:55:40.765854 debian kernel: r8152 1-3:1.0 eth0: v1.09.9
Jun 19 13:55:40.766315 debian kernel: r8152 1-3:1.0 enxa0cec80e9fe1: renamed from eth0

Please unlock disk sda3_crypt:

In either case it won't proceed to unlock, and I need to type the passphrase manually. Any ideas?

sergio-correia commented 4 years ago

@dannf: while checking configure_networking, I noticed that the call wait_for_udev 10 returns immediately; apparently the udev event queue is empty. After adding a sleep before the wait_for_udev call, it managed to configure my NIC and the unlocking worked as expected. I couldn't find a definitive value for the sleep, but sleep .5 seems to be working here.

This is without eth_check, having just [ "$boot" = nfs ] || configure_networking

dannf commented 4 years ago

@sergio-correia thanks for testing! Interesting, I expected events to remain in the queue until all interfaces were enumerated, but clearly that's not happening.

sergio-correia commented 4 years ago

> @sergio-correia thanks for testing! Interesting, I expected events to remain in the queue until all interfaces were enumerated, but clearly that's not happening.

Yeah, this is interesting. What do you suggest here? Do we file a bug with initramfs-tools?

One workaround that worked for me was calling udevadm trigger before calling configure_networking (or within it, before calling udevadm settle)
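A minimal sketch of that workaround, for illustration only. The wrapper name is hypothetical; configure_networking is the initramfs-tools helper discussed in this thread and is only defined inside the initramfs, so the function below is meant to be sourced there rather than run on a booted system:

```shell
#!/bin/sh
# Sketch of the workaround: ask udev to replay device events, then let
# initramfs-tools configure the network. configure_networking performs its
# own wait on the udev queue (the wait_for_udev call mentioned above), so
# no extra settle is added here.
configure_networking_after_trigger() {
    udevadm trigger
    configure_networking
}
```

In the clevis local-top script, this would stand in for the plain configure_networking call.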

dannf commented 4 years ago

On Fri, Jul 3, 2020 at 7:59 AM Sergio Correia wrote:

> @sergio-correia thanks for testing! Interesting, I expected events to remain in the queue until all interfaces were enumerated, but clearly that's not happening.

> Yeah, this is interesting. What do you suggest here? Do we file a bug with initramfs-tools?

Sorry for the delay in responding.

Yes, I think a bug with initramfs-tools will ultimately be needed. I plan to try to instrument an initramfs w/ my USB NIC (similar model to yours) and see if I can reproduce at least the condition of the interface not yet being enumerated after settle, even if I can't reproduce the exact symptom. I've got a few things on my TODO list ahead of that, but I hope to get back to it before the end of the week.

I was going to do some more debugging before filing a bug myself, but if you'd like to get one logged sooner, please just add a link here so I can update that one.

-dann

jpds commented 4 years ago

@cbiedl This also requires fixing in the Debian packaging.

pszypowicz commented 4 years ago

I just wanted to confirm the problem on Ubuntu 20.04. It worked for me in a VM (on top of Hyper-V) with 1 interface.

But on a standard installation on the ASRock Rack X470D4U with 2 Intel I210 interfaces (only one is up), it did not work for me by default.

$ lspci -nk 
[...]
23:00.0 0200: 8086:1533 (rev 03)
        Subsystem: 1849:1533
        Kernel driver in use: igb
        Kernel modules: igb
24:00.0 0200: 8086:1533 (rev 03)
        Subsystem: 1849:1533
        Kernel driver in use: igb
        Kernel modules: igb
[...]

I had to modify the script /usr/share/initramfs-tools/scripts/local-top/clevis to extend the sleep to 10 in the eth_check function, and then update the initramfs with: update-initramfs -u.

Installed packages

$ dpkg -l | grep clevis
ii  clevis                               12-1ubuntu2.1                         amd64        automated encryption framework
ii  clevis-initramfs                     12-1ubuntu2.1                         all          Clevis initramfs integration
ii  clevis-luks                          12-1ubuntu2.1                         all          LUKSv1 integration for clevis

dannf commented 4 years ago

@sergio-correia I have a new setup now (different NIC - same USB IDs as yours actually, passed through into a VM), and I am able to reproduce, albeit only intermittently. I am passing an ip=

[    2.634882] hid-generic 0003:0627:0001.0001: input,hidraw0: USB HID v0.01 Mouse [QEMU QEMU USB Tablet] on usb-0000:02:00.0-1/input0
[    2.747616] usb 1-4: new high-speed USB device number 3 using xhci_hcd
Begin: Loading essential drivers ... [    2.855366] usbcore: registered new interface driver r8152
done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... [    2.866037] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input2
ipconfig: enx00249b194b8e: SIOCGIFINDEX: No such device
ipconfig: no devices to configure
ipconfig: enx00249b194b8e: SIOCGIFINDEX: No such device
ipconfig: no devices to configure
[    2.922523] usb 1-4: New USB device found, idVendor=0bda, idProduct=8153, bcdDevice=30.00
[    2.925599] usb 1-4: New USB device strings: Mfr=1, Product=2, SerialNumber=6
[    2.929000] usb 1-4: Product: USB 10/100/1000 LAN
[    2.930444] usb 1-4: Manufacturer: Realtek
[    2.931638] usb 1-4: SerialNumber: 00002A
ipconfig: enx00249b194b8e: SIOCGIFINDEX: No such device
ipconfig: no devices to configure
ipconfig: enx00249b194b8e: SIOCGIFINDEX: No such device
ipconfig: no devices to configure
ipconfig: enx00249b194b8e: SIOCGIFINDEX: No such device
ipconfig: no devices to configure
ipconfig: enx00249b194b8e: SIOCGIFINDEX: No such device
ipconfig: no devices to configure
ipconfig: enx00249b194b8e: SIOCGIFINDEX: No such device
ipconfig: no devices to configure
ipconfig: enx00249b194b8e: SIOCGIFINDEX: No such device
ipconfig: no devices to configure
ipconfig: enx00249b194b8e: SIOCGIFINDEX: No such device
ipconfig: no devices to configure
ipconfig: enx00249b194b8e: SIOCGIFINDEX: No such device
ipconfig: no devices to configure
/scripts/functions: line 275: /run/net-enx00249b194b8e.conf: No such file or directory
  Volume group "debian-vg" not found
  Cannot process volume group debian-vg
  Volume group "debian-vg" not found
  Cannot process volume group debian-vg
[    3.283564] usb 1-4: reset high-speed USB device number 3 using xhci_hcd
[    3.307687] device-mapper: uevent: version 1.0.3
[    3.311047] device-mapper: ioctl: 4.39.0-ioctl (2018-04-03) initialised: dm-devel@redhat.com
[    3.543395] r8152 1-4:1.0 eth0: v1.09.9
[    3.566459] r8152 1-4:1.0 enx00249b194b8e: renamed from eth0
Please unlock disk vda5_crypt:

dannf commented 4 years ago

It sounds like there is just no guarantee that all devices that are plugged in at boot time will be available when udevadm settle exits. I've reported a Debian bug where I propose an alternate approach:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=965935

It would be good to have some feedback on how that works for others here. The idea is that we disable the call to eth_check in clevis, and instead patch initramfs-tools' configure_networking with the patch at: https://salsa.debian.org/kernel-team/initramfs-tools/-/merge_requests/32

Update: Note that this method requires specifying an ip= kernel command line parameter. See https://github.com/latchset/clevis/pull/193 for details.
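To illustrate the underlying idea (waiting for a specific interface to be enumerated instead of trusting udevadm settle), here is a hedged sketch. It is NOT the patch from the merge request above; wait_for_interface and the SYSFS_NET override are hypothetical names:

```shell
#!/bin/sh
# Sketch only: wait until a named interface exists under /sys/class/net,
# for at most a given number of seconds. SYSFS_NET is a hypothetical
# override so the loop can be tried against a fake directory tree.
wait_for_interface() {
    device=$1
    remaining=${2:-10}   # seconds to wait; defaults to 10
    while [ ! -e "${SYSFS_NET:-/sys/class/net}/$device" ]; do
        [ "$remaining" -gt 0 ] || return 1
        remaining=$((remaining - 1))
        sleep 1
    done
    return 0
}

# Usage: wait up to 10 seconds for the USB NIC from the log above.
#   wait_for_interface enx00249b194b8e 10 && configure_networking
```

Waiting on a named device only makes sense when the interface is known in advance, which fits the requirement of an ip= parameter noted in the update above.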