osresearch / safeboot

Scripts to slightly improve the security of the Linux boot process with UEFI Secure Boot and TPM support
https://safeboot.dev/
GNU General Public License v2.0

sip-init: unable to move everything from old mount #91

Open sidhussmann opened 3 years ago

sidhussmann commented 3 years ago

IMHO sip-init does not handle non-empty /home dirs correctly and fails fatally:

$ sudo safeboot sip-init          
sip-init: Checking configuration before writing...
     _                             _
  __| | __ _ _ __   __ _  ___ _ __| |
 / _` |/ _` | '_ \ / _` |/ _ \ '__| |
| (_| | (_| | | | | (_| |  __/ |  |_|
 \__,_|\__,_|_| |_|\__, |\___|_|  (_)
===================|___/=============

This will make irrevocable changes to the disk!
You must have the security key or password for
the UEFI Platform Key to procede.

Are you really really sure? y
WARNING: Do you have the security key? y
/dev/mapper/vgubuntu-root: remounting read-write
/dev/mapper/vgubuntu-root: Remounted read/write
  Logical volume "hashes" created.
  Logical volume "var" created.
  Logical volume "home" created.
mke2fs 1.45.5 (07-Jan-2020)
Discarding device blocks: done                            
Creating filesystem with 4194304 4k blocks and 1048576 inodes
Filesystem UUID: 075f6291-7fa7-44bc-abf0-e2951bc6d5fd
Superblock backups stored on blocks: 
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
        4096000

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done   

Adding /var to fstab
Moving contents of /var to new filesystem
mke2fs 1.45.5 (07-Jan-2020)
Discarding device blocks: done                            
Creating filesystem with 20971520 4k blocks and 5242880 inodes
Filesystem UUID: 2affd08e-18c8-4d37-b98e-2514669cc05d
Superblock backups stored on blocks: 
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
        4096000, 7962624, 11239424, 20480000

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (131072 blocks): done
Writing superblocks and filesystem accounting information: done   

Adding /home to fstab
Moving contents of /home to new filesystem
mv: cannot remove '/home/sid': Directory not empty

********************************************************
***** THIS IS BAD: Your system might be broken now *****
********************************************************

Please contact the developers with a report of what happened.
You can try to look back through the log messages to see if
you can unwind the changes that were made.
vgubuntu-home: unable to move everything from old mount

osresearch commented 3 years ago

Was this on a running system or in recovery mode boot? The mv as root should have relocated all of the files and the /home/sid directory should have been empty afterwards. I'm wondering if a running process recreated some file in the home directory after it was moved away.

sidhussmann commented 3 years ago

Just to make sure I didn't confuse things: when I boot into the recovery shell and then exit with Ctrl+D, I'm still in recovery mode, right?

That's what I did before starting safeboot sip-init. I may have had a vim session open for a file in /home/sid, which would explain why the directory wasn't empty. However, in this scenario we either lose the temp file (commit 9a6ed5d) or end up with a broken system. Of these two evils, I believe the former is the lesser one.

osresearch commented 3 years ago

There is the single-user recovery shell, which is the safe time to run things that manipulate the filesystems since there are literally no other processes (the shell is pid 1). When you exit that, it resumes booting, still in recovery mode, although the other processes make things riskier.
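One way to see whether anything else is still touching /home before running sip-init is to scan /proc for processes whose working directory or open files sit under that path. The helper below is only a rough sketch and not part of safeboot; `pids_using` is a made-up name, and it assumes a Linux /proc and root privileges so that every process is visible.

```shell
# Hypothetical helper: list PIDs whose cwd or open files live under a directory.
pids_using() {
    # Resolve symlinks so the comparison matches what /proc reports.
    dir=$(cd "$1" && pwd -P) || return 1
    for p in /proc/[0-9]*; do
        for link in "$p/cwd" "$p"/fd/*; do
            # Unreadable or vanished entries are skipped silently.
            target=$(readlink "$link" 2>/dev/null) || continue
            case "$target" in
                "$dir"|"$dir"/*)
                    echo "${p#/proc/}"
                    break   # one hit per process is enough
                    ;;
            esac
        done
    done
}
```

Running `pids_using /home` from the single-user recovery shell should print nothing; any PID it prints is a process that could recreate files while /home is being moved.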

We need to verify that safeboot sip-init works in the recovery shell; there might be some issues with the absolute paths for /home and /var.

And yes, leaving a stray file or two behind is definitely much safer. They are still recoverable if necessary.
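That trade-off could look roughly like this in shell: move everything, then treat leftovers as a warning instead of a fatal error. This is not the code from the commit mentioned earlier in the thread; `move_contents` is a hypothetical helper, and it relies on GNU `find` and GNU `mv -t`.

```shell
# Hypothetical helper: move the contents of one directory into another and
# tolerate stragglers that appear (or fail to move) along the way.
move_contents() {
    old="$1"
    new="$2"
    # Move each top-level entry, dotfiles included. mv reports failures on
    # individual entries but still processes the remaining ones.
    find "$old" -mindepth 1 -maxdepth 1 -exec mv -t "$new" -- {} + || true
    # Count whatever is still in the old location.
    leftovers=$(find "$old" -mindepth 1 | wc -l)
    if [ "$leftovers" -gt 0 ]; then
        echo "warning: $leftovers entries left behind in $old" >&2
    fi
    return 0
}
```

Anything left behind stays on the old filesystem underneath the new mount, so it can still be recovered later.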

sidhussmann commented 3 years ago

Thank you for explaining. I will verify if safeboot sip-init works from the recovery shell. I will keep you posted.

For what it's worth, I'm testing with two devices: a Thinkpad X260 and a T490. On neither have I yet reached the stage where the TPM releases the sealed key and unlocks LUKS automatically. I will debug further.

[1] https://safeboot.dev/install

sidhussmann commented 3 years ago

Should this whole block be run from within the recovery shell?

# Should reboot into the recovery image. Login as usual.
sudo safeboot luks-seal
sudo update-initramfs -u
sudo safeboot sip-init # if you want to enable SIP mode
sudo safeboot recovery-sign
sudo safeboot recovery-reboot

I ask because from within the recovery shell there is neither an /etc/initramfs-tools directory nor an update-initramfs binary...

hpc4fun commented 3 years ago

Yes, it should be done from within the recovery boot; maybe what's confusing are the steps prior to this. I have tried to add a few more details than you find in the TL;DR section. The lines below sufficed for me (I'm using the deb package, as I ran into some issues with the package built from the master branch on GitHub):

# During install, prior to first reboot:
e2fsck -f /dev/vgkubuntu/root
resize2fs -p /dev/vgkubuntu/root 12G
lvreduce -L 12G /dev/vgkubuntu/root

Then the first block (verify that safeboot recovery-sign does the right thing, and watch out for a full disk on /boot/efi/EFI):

safeboot yubikey-init /CN=test/
update-initramfs -u
safeboot uefi-sign-keys
# if old leftovers - in general I had issues with space on /boot/efi
rm -fr /boot/efi/EFI/linux
rm -fr /boot/efi/EFI/recovery
safeboot recovery-sign # verify with efibootmgr -v and ensure proper device, verify linux efi image in /boot/efi/EFI/recovery
safeboot recovery-reboot # fails to unmount / as it's mounted rw at this point
reboot

I had problems with junk from previous attempts in the /boot/efi/EFI dirs and with the device being wrongly detected. As a result, it didn't boot the recovery image at first, which confused me, so I simply hardcoded the device in the safeboot script. Finally (note the cwd before the sip-init):

# Should reboot into the recovery image again, with `/` mounted read-only.
ctrl-D
sudo su
safeboot luks-seal
update-initramfs -u
cd / # :) sip-init will mv /home so better safe than sorry
safeboot sip-init # if you want to enable SIP mode
safeboot recovery-sign
safeboot recovery-reboot # fails to unmount / as it's mounted rw at this point
reboot

Hope this helps!

sidhussmann commented 3 years ago

Thank you @hpc4fun for trying to help me out, and my apologies for taking so long to get back to you.

I tried the following permutations:

Here are my findings:

Thinkpad X260

The TPM 2.0 of the X260 only supports SHA1 for its PCRs. In its current state, safeboot only supports SHA2. I will try to patch this once I find some time.

Thinkpad T490

With the T490 I'm a bit lost... @osresearch mentioned that it should work out of the box, as he tested that model as well. During boot I noticed a firmware error message regarding the TPM [2].

verify with efibootmgr -v and ensure proper device, verify linux efi image in /boot/efi/EFI/recovery

I did that and it seemed right. However, I wasn't quite sure what to look for. How do I verify the Linux EFI image? Did you mean the signature?

The first luks-seal reports some YubiKey- and TPM-related errors. However, they don't seem fatal: sealing the key did not appear to fail, and the command finished with /dev/nvme0n1p3: sealed with PCR 0,2,4,5,7,14:

[sudo] password for sid: 
Starting PCSCD for yubikey support
00000000 [140415219189696] utils.c:81:GetDaemonPid() Can't open /run/pcscd/pcscd.pid: No such file or directory
New unsealing PIN: 03125757 [140415219189696] ccid_usb.c:1264:ControlUSB() control failed (1/2): -7 LIBUSB_ERROR_TIMEOUT
00011281 [140415219189696] ifdhandler.c:150:CreateChannelByNameOrChannel() failed
00000037 [140415219189696] readerfactory.c:1105:RFInitializeReader() Open Port 0x200001 Failed (usb:1050/0407:libudev:0:/dev/bus/usb/001/003)
00000012 [140415219189696] readerfactory.c:376:RFAddReader() Yubico YubiKey OTP+FIDO+CCID init failed.
00002299 [140415219189696] ifdhandler.c:150:CreateChannelByNameOrChannel() failed
00000009 [140415219189696] readerfactory.c:1105:RFInitializeReader() Open Port 0x200001 Failed (usb:1050/0407:libudev:1:/dev/bus/usb/001/003)
00000004 [140415219189696] readerfactory.c:376:RFAddReader() Yubico YubiKey OTP+FIDO+CCID init failed.

Unsealing PIN again: 
WARNING:esys:src/tss2-esys/api/Esys_NV_ReadPublic.c:309:Esys_NV_ReadPublic_Finish() Received TPM Error 
ERROR:esys:src/tss2-esys/esys_tr.c:210:Esys_TR_FromTPMPublic_Finish() Error NV_ReadPublic ErrorCode (0x0000018b) 
ERROR:esys:src/tss2-esys/esys_tr.c:321:Esys_TR_FromTPMPublic() Error TR FromTPMPublic ErrorCode (0x0000018b) 
ERROR: Esys_TR_FromTPMPublic(0x18B) - tpm:handle(1):the handle is not correct for the use
ERROR: Failed to read the public part of NV index 0x1500010
ERROR: Unable to run nvundefine
Unable to remove old TPM counter 0x1500010
Using placeholder TPM counter version
Sealing secret with TPM, storing sealed secret in 0x1500010
WARNING:esys:src/tss2-esys/api/Esys_ReadPublic.c:320:Esys_ReadPublic_Finish() Received TPM Error 
ERROR:esys:src/tss2-esys/esys_tr.c:231:Esys_TR_FromTPMPublic_Finish() Error ReadPublic ErrorCode (0x0000018b) 
ERROR:esys:src/tss2-esys/esys_tr.c:321:Esys_TR_FromTPMPublic() Error TR FromTPMPublic ErrorCode (0x0000018b) 
ERROR: Esys_TR_FromTPMPublic(0x18B) - tpm:handle(1):the handle is not correct for the use
ERROR:esys:src/tss2-esys/esys_tr.c:357:Esys_TR_Close() Error: Esys handle does not exist (70018). 
ERROR: Esys_TR_Close(0x70018) - esapi:The ESYS_TR resource object is bad
ERROR: Unable to run evictcontrol
Unable to evict existing sealed handle 0x81110000, ignoring
persistent-handle: 0x81110000
action: persisted
adding crypttab unseal script
/dev/nvme0n1p3: Current recovery password: 
Removing old LUKS TPM key (if it exists)
Keyslot 1 is not active.
/dev/nvme0n1p3: Unable to remove old key slot (ignored)
Adding new LUKS TPM key
/dev/nvme0n1p3: sealed with PCR 0,2,4,5,7,14
-------- Need to update initramfs --------
update-initramfs: Generating /boot/initrd.img-5.4.0-54-generic
I: The initramfs will attempt to resume from /dev/dm-2
I: (/dev/mapper/vgubuntu-swap_1)
I: Set the RESUME variable to override this.
-------- Need to sign new kernel --------
SIP mode is not enabled
/boot/efi/EFI/linux: Creating directory on EFI System Partition
/boot/efi/EFI/linux: Creating boot menu item on /dev/nvme0n1p1
Kernel commandline: 'ro quiet splash vt.handoff=7 intel_iommu=on efi=disable_early_pci_dma lockdown=confidentiality                     root=/dev/mapper/vgubuntu-root
                 safeboot.mode=linux'
/tmp/tmp.SzD9EirX8w/linux.efi: Creating merged Linux/initrd image
/boot/efi/EFI/linux/linux.efi: Signing (ignore warnings about gaps)
warning: data remaining[108928512 vs 108938124]: gaps between PE/COFF sections?
warning: data remaining[108928512 vs 108938128]: gaps between PE/COFF sections?
Enter engine key pass phrase:
Enter PKCS#11 key PIN for SIGN key:
Signing Unsigned original image
ff4ae8e22ffe94cf2e3c58474e131e5085c88973ae5eded12762fa6f66e70e66  /boot/efi/EFI/linux/linux.efi
-------- Need to sign PCR and counter values --------
/boot/efi/EFI/linux/linux.efi: TPM version 000000000000000d
sha256:
  0 : 0x384BA8DCE6EBE9A704E8D7876CF40CA3DE8BD6271D3EA1491446D8A7681E98D0
  2 : 0x3D458CFE55CC03EA1F443F1562BEEC8DF51C75E14A9FCF9A7234A13F198E7969
  4 : 0x8C7A2D920894BCC8CDBB37ADD40E5EA81159602885D1F39975BEFF19D6703D2D
  5 : 0xE76FC73922B96E40AEC13B468868B4E54232316D20EA3FCB648E1D3047040E86
  7 : 0xC11E126AFA559B04BD0A926B3E7AE7354BDF8A0EDADA2A9AD3DA7B765B248AED
warning: data remaining[108930056 vs 108939672]: gaps between PE/COFF sections?
/boot/efi/EFI/linux/linux.efi: PE hash a6a5c13c88fe515c5544e5bab8923c9798d0428ecd21b14f1ede0f3d4df8bdd0
PCR4 670411bc6aef2d25ac96320bc0e1d49e4915a74edb4778de060577ad0e9ffecf
mode=linux PCR14=4cc49932dc91c7021e5cfee8231a5f2da3ac1de6df7a7aeff333cc8dfd230f28
final PCRs:
384ba8dce6ebe9a704e8d7876cf40ca3de8bd6271d3ea1491446d8a7681e98d0
3d458cfe55cc03ea1f443f1562beec8df51c75e14a9fcf9a7234a13f198e7969
670411bc6aef2d25ac96320bc0e1d49e4915a74edb4778de060577ad0e9ffecf
e76fc73922b96e40aec13b468868b4e54232316d20ea3fcb648e1d3047040e86
c11e126afa559b04bd0a926b3e7ae7354bdf8a0edada2a9ad3da7b765b248aed
4cc49932dc91c7021e5cfee8231a5f2da3ac1de6df7a7aeff333cc8dfd230f28
Using TPM counter 000000000000000d
engine "pkcs11" set.
Enter PKCS#11 token PIN for test.gapfruit.com:
Enter PKCS#11 key PIN for SIGN key:
/sys/firmware/efi/efivars/SafebootPCR-8620893e-c793-457e-8a02-41fc83eef3ce: writing new value
/tmp/tmp.SzD9EirX8w: Unmounting
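For readers unfamiliar with the PCR values in the log above: a TPM never writes a PCR directly. It extends it, replacing the old value with the hash of the old value concatenated with the new measurement. The sketch below illustrates only that generic operation for a SHA-256 bank. It is not how safeboot computes its predicted values, and `hex_to_bin` and `pcr_extend` are hypothetical names.

```shell
# Turn an even-length hex string into raw bytes using POSIX octal escapes.
hex_to_bin() {
    [ $(( ${#1} % 2 )) -eq 0 ] || return 1
    esc=""
    rest="$1"
    while [ -n "$rest" ]; do
        byte="${rest%"${rest#??}"}"   # first two hex digits
        rest="${rest#??}"             # remainder
        esc="$esc$(printf '\\%03o' "0x$byte")"
    done
    printf "$esc"
}

# new_pcr = SHA256(old_pcr || measurement), all values given as hex strings.
pcr_extend() {
    hex_to_bin "$1$2" | sha256sum | cut -d' ' -f1
}
```

A chain of measurements is replayed by feeding each result back in as the next `old_pcr`, starting from 64 hex zeros. For a SHA1-only TPM such as the X260's, the same idea would apply with `sha1sum` and 40-digit values.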

Misc

I figure that the dm-verity root hash of the merkle-tree should be part of the kernel params. However, looking at the safeboot configs, it ain't:

Kernel commandline: 'ro lockdown=none root=/dev/mapper/vgubuntu-root  safeboot.mode=linux'
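To check quickly which parameters actually ended up on a command line like the one above, a small parser is enough. This is an illustrative sketch; `cmdline_get` is a hypothetical helper, and the name of whatever parameter would carry the dm-verity root hash is not stated in this thread, so it is left for the caller to supply.

```shell
# Hypothetical helper: print the value of key=value from a kernel command line.
# $1 = command line string, $2 = key. Exits non-zero if the key is absent.
cmdline_get() (
    set -f   # no globbing while the command line is split into words
    for word in $1; do
        case "$word" in
            "$2="*)
                echo "${word#"$2="}"
                exit 0
                ;;
        esac
    done
    exit 1
)
```

On a running system, `cmdline_get "$(cat /proc/cmdline)" safeboot.mode` would show the mode that was booted; a non-zero exit status for some other key means the parameter is simply not there.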

Like you, I noticed some bumps in the TL;DR as well. Let me summarize:

  1. safeboot yubikey-init needs to be called before update-initramfs -u, because of the initramfs hooks that expect certificates in /etc/safeboot.
  2. The issue regarding disk full on /boot/efi/EFI. Addressed in https://github.com/osresearch/safeboot/pull/92/commits/9b80dfc27f237103fde872a2c2fe29d6f2fa3000
  3. A typo on the master branch in the sip-init function. Addressed in https://github.com/osresearch/safeboot/pull/92/commits/265aa680d2f9d74b0ce40a6ab62f4586947824ab
  4. Failed sip-init when having temp files open in /home during mv. Addressed in https://github.com/osresearch/safeboot/pull/92/commits/2c26efb1b950b49dc63b4cd376d1f621ed0a020a
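The ordering problem in item 1 could be caught with a guard that refuses to rebuild the initramfs until the expected files exist. This is a sketch only: `require_files` is a hypothetical helper, and the file name in the usage line is an assumption, since the thread does not say which files the hooks look for in /etc/safeboot.

```shell
# Hypothetical helper: succeed only if every named file exists in the
# directory and is non-empty. $1 = directory, remaining args = file names.
require_files() {
    dir="$1"
    shift
    rc=0
    for f in "$@"; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

Usage might look like `require_files /etc/safeboot cert.pem && update-initramfs -u`, where `cert.pem` stands in for whatever the hooks really need.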

@osresearch this discussion is starting to become unrelated to the issue description. Should I split my findings into multiple issues?

[1] https://github.com/osresearch/safeboot/pull/92

[2] TPM error during boot:

tpm-error-t490

hpc4fun commented 3 years ago

No worries! A few quick comments:

1) Admittedly, it was sloppy to just write "verify" without explaining what kind of issues I have had here. What I meant was simply:

a) Check that 'efibootmgr -v' shows the same disk UUID for ALL targets. I have consistently seen that the first time I do it, I get the wrong UUID for the new recovery target. If I boot with the wrong UUID, I'm not booting into recovery but into the original Ubuntu, and I won't be able to continue under the proper assumptions, so things will start to fail. (If your Secure Boot settings are correct, you will not boot into Ubuntu but will see something like 'operation system loader failed signature verification' when rebooting with the wrong UUID.) As a workaround, I hardcoded my device in /usr/sbin/safeboot where the efibootmgr command is and ran it twice; then I got a consistent UUID for all 3 targets:

b) Check that the Linux image in /boot/efi/EFI/recovery is present (and complete, i.e. not truncated by the disk-full situation that I have also seen).

Once you boot, you should land in a recovery image again and should type 'ctrl-D' to proceed.

2) Your sealing looks wrong. It should not produce all these complaints, so I guess this is the first issue that you should try to resolve; maybe it also explains why you are having trouble with sip-init.
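The UUID comparison from a) can be done mechanically by pulling the partition UUID out of each `HD(...)` device path that `efibootmgr -v` prints. The filter below is a sketch; `boot_entry_uuids` is a hypothetical name, and it assumes GPT-style entries of the form `HD(1,GPT,<uuid>,...)`.

```shell
# Hypothetical helper: read `efibootmgr -v` output on stdin and print each
# distinct GPT partition UUID referenced by a boot entry.
boot_entry_uuids() {
    sed -n 's/.*HD([0-9]*,GPT,\([0-9a-fA-F-]*\),.*/\1/p' | sort -u
}
```

With `efibootmgr -v | boot_entry_uuids`, a single line of output means all disk-backed entries point at the same partition; more than one line means at least one entry, possibly the new recovery target, references a different one.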

@sidhussmann, @osresearch I can try to write a detailed description of the process from A to Z, based on Kubuntu 20.04, with some logging so one can see what to expect. I strongly support the ideas in this project and could try to help. There are also a few places in the script where we could check assumptions before execution to catch issues at an early stage (and avoid warnings about issues that are predictable :) under valid assumptions, e.g. the umount of / when it is mounted rw).
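Two such assumption checks could be sketched as follows: free space on the EFI System Partition, and whether a mount point is currently read-write. Both helper names are hypothetical and neither exists in safeboot.

```shell
# Free space, in KiB, on the filesystem containing a path.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeeds if the mount point is mounted read-write.
# $1 = mount point, $2 = mounts table (defaults to /proc/mounts).
is_mounted_rw() {
    awk -v mp="$1" '
        $2 == mp { opts = $4 }   # the last matching entry wins
        END { exit ((("," opts ",") ~ /,rw,/) ? 0 : 1) }
    ' "${2:-/proc/mounts}"
}
```

A script could then bail out early with something like `[ "$(free_kb /boot/efi)" -ge 200000 ] || echo "ESP nearly full" >&2`, where the threshold is an arbitrary placeholder, and could skip the unmount attempt with a clear message when `is_mounted_rw /` succeeds.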

osresearch commented 3 years ago

That TPM error during boot looks like it is going to cause all sorts of issues... Have you used fwupd to make sure that you have the latest Lenovo firmware? I know that some (early) X1 devices shipped with broken ACPI tables for the TPM, leading to errors talking to it from Linux.

Yes, @sidhussmann, let's split this into the separate issues.