siderolabs / talos

Talos Linux is a modern Linux distribution built for Kubernetes.
https://www.talos.dev
Mozilla Public License 2.0

Security Guarantees of LUKS Encryption using TPM for User Data on EPHEMERAL #8972

Open stereobutter opened 3 months ago

stereobutter commented 3 months ago

Feature Request

Seal the LUKS encryption keys for the EPHEMERAL partition using a TPM register that depends on confidential information from the STATE partition.

Description

Consider a scenario where an attacker has physical access to a cluster (as opposed to access to a lone disk removed from the cluster). I'm currently unsure whether using TPM-based LUKS encryption for both the STATE and EPHEMERAL partitions is sufficient to guarantee that user data on EPHEMERAL will not be readable by the attacker.

The boot process, as I understand it using SecureBoot and LUKS with TPM-based encryption, looks roughly as follows:

  1. At system startup, UEFI is loaded. The UEFI firmware itself and its configuration are measured into PCR[0] and PCR[1].
  2. UEFI measures the bootloader and its configuration into PCR[4] and PCR[5]. If PCR measurements up until this point are as expected, the bootloader is loaded.
  3. The bootloader (systemd-boot) runs. The UKI contents (the kernel, its config, the initrd, etc.) are measured into PCR[11].
  4. If the PCR measurements up to this point are as expected, the kernel and initrd are loaded.
  5. The kernel initializes hardware, etc., and may measure data into PCR[10].
  6. If the PCR measurements up to this point are as expected, the TPM releases the encryption key for the STATE partition. The partition is decrypted and mounted.
  7. The kernel executes init and talos takes over.
  8. Talos runs through its start-up phases and measures progress into PCR[11] and can read the machine config from the now decrypted STATE partition.
  9. If the PCR measurements up to this point are as expected, the TPM releases the encryption key for EPHEMERAL. The partition is decrypted and mounted as /var.
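As a rough illustration of the "measure, then release the key if the PCRs are as expected" pattern in the steps above, here is a minimal Python model of a single SHA-256 PCR. The extend rule (new value = hash of old value concatenated with the digest of the measured data) is how TPM PCRs work in general. The measurement labels and the `tpm_unseal` helper are hypothetical simplifications, not what Talos or a real TPM policy actually does.

```python
import hashlib

PCR_SIZE = 32  # size of one register in the SHA-256 PCR bank


def extend(pcr: bytes, data: bytes) -> bytes:
    """TPM-style extend: new = SHA256(old || SHA256(data))."""
    return hashlib.sha256(pcr + hashlib.sha256(data).digest()).digest()


def replay(measurements: list[bytes]) -> bytes:
    """Start from an all-zero PCR and extend it with each measurement in order."""
    pcr = bytes(PCR_SIZE)
    for measurement in measurements:
        pcr = extend(pcr, measurement)
    return pcr


def tpm_unseal(current_pcr: bytes, policy_pcr: bytes, secret: bytes):
    """Hypothetical, simplified unseal: release the secret only on a PCR match."""
    return secret if current_pcr == policy_pcr else None


# Hypothetical measurement labels standing in for the UKI and the Talos phases.
boot_chain = [b"uki:kernel", b"uki:initrd", b"talos:phase-1"]
policy = replay(boot_chain)  # value the key was sealed against

luks_key = b"example-volume-key"
same_boot = tpm_unseal(replay(boot_chain), policy, luks_key)
tampered_boot = tpm_unseal(replay([b"uki:evil-kernel"] + boot_chain[1:]), policy, luks_key)
```

In this model `same_boot` is the key and `tampered_boot` is `None`, because any change to a measured component changes the final PCR value.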

If I'm not mistaken, all PCR measurements up to step 9, where the TPM releases the encryption key for EPHEMERAL, depend either on the physical device (CPU, TPM chip, etc.) or on the talos installer image currently used on the machine (kernel, bootloader, SecureBoot signature, etc.), and not on the identity of the machine or cluster. I believe this would enable an attacker to overwrite the STATE partition with their own talos STATE partition (created using the same installer image). As far as I can tell, the PCR values would be unchanged, so the TPM would still release the key for EPHEMERAL, now on a machine whose config the attacker controls.

The issue, I think, is that during step 8 no machine-specific or cluster-specific information is measured into any of the PCRs, including PCR[11], which talos currently uses for LUKS. Compare that to systemd-pcrmachine.service, which measures the machine-id, a confidential identifier generated on first boot, into PCR[15]. The LUKS encryption key for EPHEMERAL could then be bound to that register.
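To make the concern concrete, the sketch below compares the PCR values after booting with two different STATE partitions, once without and once with a machine-specific measurement. It is a toy model under stated assumptions: the event labels, the `boot` function, and the choice to store a `machine_id` in STATE are hypothetical, and only the extend rule itself reflects real TPM behaviour.

```python
import hashlib


def extend(pcr: bytes, data: bytes) -> bytes:
    """TPM-style extend: new = SHA256(old || SHA256(data))."""
    return hashlib.sha256(pcr + hashlib.sha256(data).digest()).digest()


def boot(state_partition: dict, measure_machine_id: bool) -> dict:
    """Hypothetical boot: returns the final values of PCR 11 and PCR 15."""
    pcrs = {11: bytes(32), 15: bytes(32)}
    # These events depend only on the installer image, not on the machine.
    for event in (b"uki", b"talos:phases"):
        pcrs[11] = extend(pcrs[11], event)
    if measure_machine_id:
        # Assumed extra step: measure a secret identifier read from STATE.
        pcrs[15] = extend(pcrs[15], state_partition["machine_id"])
    return pcrs


victim_state = {"machine_id": b"victim-secret-id"}
attacker_state = {"machine_id": b"attacker-chosen-id"}

# Without a machine-specific measurement, the swap is invisible to the TPM.
indistinguishable = boot(victim_state, False) == boot(attacker_state, False)

# With it, PCR 15 differs, so a key bound to PCR 15 would not be released.
still_indistinguishable = boot(victim_state, True) == boot(attacker_state, True)
```

Here `indistinguishable` is `True` and `still_indistinguishable` is `False`. The protection in the second case relies on the measured identifier being secret: an attacker who knows it could reproduce the same PCR value.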

Workaround

If the above is correct, I think a workaround for the time being is to use TPM-based LUKS encryption only for the STATE partition and passphrase-based encryption for EPHEMERAL. The relevant part of the boot process would look like this:

  1. If the PCR measurements up until this point are as expected, the TPM releases the encryption key for the STATE partition. The partition is decrypted and mounted.
  2. The kernel executes init and talos takes over.
  3. Talos runs through its start-up phases and measures progress into PCR[11] and is able to read the machine config from the now decrypted STATE partition.
  4. The static passphrase is read from the machine config and used as the key for EPHEMERAL. The partition is decrypted and mounted as /var.

Overwriting the STATE partition to try and boot talos with a machine config under the control of the attacker would destroy the static passphrase and render EPHEMERAL inaccessible to the attacker.
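The following sketch models why the workaround should hold: the EPHEMERAL key slot only opens with the passphrase stored in the original machine config. It is a simplified stand-in, not how LUKS2 is implemented. PBKDF2 is used here as an illustrative key derivation function, and the `make_keyslot` and `try_unlock` helpers and the dictionary layout are hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor


def make_keyslot(passphrase: str) -> dict:
    """Hypothetical key slot: a random salt plus a passphrase-derived verifier."""
    salt = os.urandom(16)
    verifier = hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, ITERATIONS)
    return {"salt": salt, "verifier": verifier}


def try_unlock(slot: dict, passphrase: str) -> bool:
    """Re-derive from the candidate passphrase and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode(), slot["salt"], ITERATIONS
    )
    return hmac.compare_digest(candidate, slot["verifier"])


# The passphrase lives only in the machine config on the victim's STATE partition.
victim_state = {"ephemeral_passphrase": "victim-passphrase"}
ephemeral_slot = make_keyslot(victim_state["ephemeral_passphrase"])

# The attacker replaces STATE with a config of their own choosing.
attacker_state = {"ephemeral_passphrase": "attacker-passphrase"}

unlocked_by_victim = try_unlock(ephemeral_slot, victim_state["ephemeral_passphrase"])
unlocked_by_attacker = try_unlock(ephemeral_slot, attacker_state["ephemeral_passphrase"])
```

In this model `unlocked_by_victim` is `True` and `unlocked_by_attacker` is `False`. Once STATE is overwritten, the original passphrase is gone and nothing the TPM releases helps recover it.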

stereobutter commented 3 months ago

The machine config I'm currently using is:

machine:
  systemDiskEncryption:
    ephemeral:
      cipher: aes-xts-plain64
      keySize: 256
      keys:
        - slot: 1
          static:
            passphrase: <your passphrase>
      provider: luks2
    state:
      cipher: aes-xts-plain64
      keySize: 256
      keys:
        - slot: 0
          tpm: {}
      provider: luks2