openSUSE / sdbootutil

MIT License

Discussion: setup pcr-oracle/measured boot with installation #83

Open TobiPeterG opened 5 months ago

TobiPeterG commented 5 months ago

While sdbootutil is able to update the predictions for measured boot, it doesn't set up pcr-oracle or enroll the keys in LUKS when the bootloader is installed. Should sdbootutil also handle this? If not, what other tool should?

aplanas commented 5 months ago

As of today, this is done by the jeos enroll module from https://github.com/openSUSE/disk-encryption-tool

But yes, I am adding a "re-enroll" feature to sdbootutil that will be used to redo the enrollment on devices where the current policy is broken. This will naturally lead to doing the enrollment in this tool.

lnussel commented 5 months ago

We can't get rid of the bits in jeos-firstboot though. I was actually experimenting with reusing the same code. The dialog aliases are more or less compatible.

aplanas commented 5 months ago

IMO sdbootutil should not contain any dialog, and the [re-]enrollment should be redesigned for a CLI-only approach. I am not sure how to do that for FIDO2, as ideally it should be detected and enrolled automatically.

TobiPeterG commented 5 months ago

Should the "install" command also cause a re-enroll? This way, a new GUI entry isn't required.

aplanas commented 5 months ago

No, "install" is about the boot loader (selecting the shim.efi, the systemd or grub EFI binary and renaming it, etc.). It is a copy of bootctl install that is aware of shim (and grub).

TobiPeterG commented 5 months ago

No the "install" is about the boot loader (selecting the shim.efi, the systemd or grub efi and renaming it, etc). Is a copy of bootctl install aware of shim (and grub)

Hmm that is currently the case. Is there a reason why it has to stay that way in the future?

aplanas commented 5 months ago

Is there a reason why it has to stay that way in the future?

I think so. Installing and updating the boot loader is not related to the enrollment. Every time the boot loader gets updated, this function is called, but there is no need to do an enrollment (only to update the predictions).

Also, you can install the boot loader without any enrollment, as that can be done at a different point.

TobiPeterG commented 3 months ago

I experimented with TPM2 a bit today and found a way to get predictions working with fde-tools. However, pcr-oracle doesn't work for me on Tumbleweed, and I'm not quite sure why. What I tried with pcr-oracle, which didn't work:

  1. pcr-oracle --rsa-generate-key --private-key /etc/systemd/tpm2-pcr-private-key.pem --public-key /etc/systemd/tpm2-pcr-public-key.pem store-public-key
  2. systemd-cryptenroll --wipe-slot=tpm2 "$dev" (replacing $dev with my actual device)
  3. systemd-cryptenroll --tpm2-device=auto --tpm2-public-key=/etc/systemd/tpm2-pcr-public-key.pem --tpm2-public-key-pcrs="0,2,7,9" "$dev" (replacing $dev with my actual device)
  4. sdbootutil mkinitrd

I could see with luksDump that the token had been created. However, on the next boot, journalctl always showed me this: https://pastebin.com/1iz2qVgh

I don't know what the issue is here. I tried different things to solve it, but nothing worked. So I tried option 2: pcrlock. And this worked:

  1. systemd-cryptenroll --wipe-slot=tpm2 "$dev" (replacing $dev with my actual device)
  2. sdbootutil mkinitrd
  3. systemd-cryptenroll --tpm2-device=auto --tpm2-pcrlock=/var/lib/systemd/pcrlock.json "$dev" (replacing $dev with my actual device)

And that was actually it. Now, my drive auto unlocks on boot without asking for a password. I don't know if the pcr-oracle issue is a bug in systemd or pcr-oracle and I don't know which of these procedures is more secure. I guess these 3 steps could be baked into sdbootutil, though I had to find out that I can't enroll the pcr-oracle json when a pcrlock file exists. Everything else should be easily doable :)
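For reference, the three pcrlock steps above could be wrapped roughly like this. It is only a sketch, not how sdbootutil does or will do anything: the helper names `pcrlock_enroll_commands` and `run` are made up for illustration, the commands and the /var/lib/systemd/pcrlock.json path are copied from the steps above, and `/dev/sdX2` is a placeholder device. It defaults to a dry run so no key slot is wiped by accident.

```python
import shlex
import subprocess


def pcrlock_enroll_commands(dev, pcrlock_json="/var/lib/systemd/pcrlock.json"):
    """Return the three commands from the steps above, in order (hypothetical helper)."""
    return [
        # 1. Remove any existing TPM2 token/slot from the LUKS2 device.
        ["systemd-cryptenroll", "--wipe-slot=tpm2", dev],
        # 2. Regenerate the initrd (sdbootutil updates its predictions afterwards).
        ["sdbootutil", "mkinitrd"],
        # 3. Enroll the TPM2 against the pcrlock policy.
        ["systemd-cryptenroll", "--tpm2-device=auto",
         f"--tpm2-pcrlock={pcrlock_json}", dev],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Placeholder device: replace with the actual LUKS2 device.
    run(pcrlock_enroll_commands("/dev/sdX2"))
```

Keeping the command list separate from the code that executes it makes it possible to review exactly what would run before touching the LUKS2 header.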

What is your opinion on that?

EDIT: Which components are measured? It seems that the command line isn't, is that right?

Btw. Are predicted values removed at some point, or could initrds and kernels from a year ago still be accepted?

TobiPeterG commented 3 months ago

Also, systemd-pcrlock now auto-encrypts the JSON and drops it in a special folder on the ESP that gets automatically copied to the initrd: https://www.freedesktop.org/software/systemd/man/latest/systemd-pcrlock.html (see make-policy). Could this potentially replace the dracut-pcr-signature module? Also, this would encrypt the JSON, so that's pretty convenient, I guess.

aplanas commented 3 months ago

I experimented with TPM2 a bit today and found a way to get predictions working with fde-tools.

We should not use fde-tools. It is specific to pcr-oracle and grub2. I tried to update it, but it is not as trivial as I expected. But maybe we can revisit it.

However, pcr-oracle doesn't work for me on Tumbleweed, I'm not quite sure why. What I did and didn't work: pcr-oracle:

1. pcr-oracle --rsa-generate-key --private-key /etc/systemd/tpm2-pcr-private-key.pem --public-key /etc/systemd/tpm2-pcr-public-key.pem store-public-key

2. systemd-cryptenroll --wipe-slot=tpm2 "$dev" (replacing $dev with my actual device)

3. systemd-cryptenroll --tpm2-device=auto --tpm2-public-key=/etc/systemd/tpm2-pcr-public-key.pem --tpm2-public-key-pcrs="0,2,7,9" "$dev" (replacing $dev with my actual device)

4. sdbootutil mkinitrd

You cannot include it in the initrd. Note that you are predicting the measurement of the initrd (pcr#9), but you change it later when creating a new initrd to include the file.

To resolve that, we have dracut-pcr-signature, which loads the file from the ESP at boot time and places it in the initramfs /etc before systemd-cryptsetup requires it.
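To see why the order matters, here is a toy model of how a PCR is extended. It assumes the SHA-256 bank and a PCR that starts at all zeros. The byte strings are stand-ins for real initrd images, and the real pcr#9 usually receives more than one measurement, so this only illustrates the principle.

```python
import hashlib

PCR_INIT = bytes(32)  # assumption: a SHA-256 PCR that starts as 32 zero bytes


def extend(pcr: bytes, data: bytes) -> bytes:
    """TPM-style extend: new_pcr = SHA256(old_pcr || SHA256(data))."""
    return hashlib.sha256(pcr + hashlib.sha256(data).digest()).digest()


# Stand-ins for real images: any change to the initrd changes its hash.
initrd_at_prediction = b"initrd contents"
initrd_after_rebuild = initrd_at_prediction + b" + extra file"

predicted = extend(PCR_INIT, initrd_at_prediction)  # value the policy was built for
measured = extend(PCR_INIT, initrd_after_rebuild)   # value the firmware will see

print("prediction still valid:", predicted == measured)
```

The prediction is only valid for the exact bytes that were measured, which is why a file that must be readable at boot cannot be added to the initrd after the prediction has been made.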

The process is described here: https://en.opensuse.org/Systemd-fde and here: https://news.opensuse.org/2023/12/20/systemd-fde/

Everything else should be easily doable :)

Yes, IMHO the enrollment in sdbootutil should copy this: https://github.com/openSUSE/disk-encryption-tool/blob/master/jeos-firstboot-enroll#L146 and this: https://github.com/openSUSE/disk-encryption-tool/blob/master/jeos-firstboot-enroll#L165

But it should also include FIDO2, as in the disk-encryption-tool.

EDIT: Which components are all measured? It seems that the command line doesn't seem to be, is that right?

cmdline is measured in both models, pcrlock and pcr-oracle.

Btw. Are predicted values removed at one point or could initrds and kernels from a year ago still be accepted?

As long as nothing changes, old predictions still work. This is a problem, as you can make a backup of the ESP in the system, wait until a CVE appears, restore the ESP and exploit it. That is why pcrlock is better: the real policy is stored in the TPM2, and you cannot make a backup of it.

aplanas commented 3 months ago

Also, systemd-pcrlock now auto encrypts the json and drops it in a special folder on the esp that gets automatically copied to the initrd: https://www.freedesktop.org/software/systemd/man/latest/systemd-pcrlock.html (see make-policy Could this potentially replace the dracut-pcr-signature module? Also, this would encrypt the json, so pretty convenient I guess

Sadly, the credentials are delivered to the kernel only if systemd-stub is used (UKI). We can research it, as I would like to get rid of dracut-pcr-signature. Using systemd-stub requires the use of UKIs (type#2 entries), but we are using type#1. How to attach the stub to the kernel (which already has another stub) is not clear to me.

TobiPeterG commented 3 months ago

I experimented with TPM2 a bit today and found a way to get predictions working with fde-tools.

We should not use fde-tools. It is specific to pcr-oracle and grub2. I try to update it but it is not as trivial as I expected. But maybe we can revisit it.

Oh alright. :) Since it was in sdbootutil and the jeos-firstboot module, I thought it was alright to use.

However, pcr-oracle doesn't work for me on Tumbleweed, I'm not quite sure why. What I did and didn't work: pcr-oracle:

1. pcr-oracle --rsa-generate-key --private-key /etc/systemd/tpm2-pcr-private-key.pem --public-key /etc/systemd/tpm2-pcr-public-key.pem store-public-key

2. systemd-cryptenroll --wipe-slot=tpm2 "$dev" (replacing $dev with my actual device)

3. systemd-cryptenroll --tpm2-device=auto --tpm2-public-key=/etc/systemd/tpm2-pcr-public-key.pem --tpm2-public-key-pcrs="0,2,7,9" "$dev" (replacing $dev with my actual device)

4. sdbootutil mkinitrd

You cannot include it in the initrd, note that you are predicting the measurement of initrd (pcr#9) but you change it later when creating a new initrd to include the file.

To resolve that we have dracut-pcr-signature that will load the file from the ESP during boot time, and place it in the initram /etc before systemd-cryptsetup required it.

Oh in which step did I include it in the initrd? I just copied the commands from the jeos module

The process is described here: https://en.opensuse.org/Systemd-fde and here: https://news.opensuse.org/2023/12/20/systemd-fde/

Everything else should be easily doable :)

Yes, IMHO the enrollment in sdbootutil should copy this: https://github.com/openSUSE/disk-encryption-tool/blob/master/jeos-firstboot-enroll#L146 and this: https://github.com/openSUSE/disk-encryption-tool/blob/master/jeos-firstboot-enroll#L165

But also should include FIDO2, as in the disk-encryption-tool.

Yes, that would be useful :) Do we want to support setting a custom password and PIN like in the module? If yes, we would have to rewrite the Dialogs for a cli approach.

EDIT: Which components are all measured? It seems that the command line doesn't seem to be, is that right?

cmdline is measured in both models, pcrlock and pcr-oracle.

Hmm interesting. I also saw the command line in the logs, but I could change it and my system still unlocked.

Btw. Are predicted values removed at one point or could initrds and kernels from a year ago still be accepted?

As far as nothing change, old predictions still work. This is a problem, as you can make a backup of the ESP in the system, wait until a CVE appears, restore the ESP and exploit it. That is why pcrlock is better, as the real policy is stored in the TPM2, and you cannot make a backup of it.

Good to know. :) However, I don't quite get why that wouldn't allow this attack vector tbh. The policy is stored in the tpm, however, a backup of the esp could still be made. How would restoring the backup not allow the old files to be unlocked?

TobiPeterG commented 3 months ago

Also, systemd-pcrlock now auto encrypts the json and drops it in a special folder on the esp that gets automatically copied to the initrd: https://www.freedesktop.org/software/systemd/man/latest/systemd-pcrlock.html (see make-policy Could this potentially replace the dracut-pcr-signature module? Also, this would encrypt the json, so pretty convenient I guess

Sadly the credentials are delivered to the kernel only if systemd-stub is used (UKI). We can research it, as I would like to get rid of dracut-pcr-signature. Use the systemd-stub requires the use of UKIs (type#2 entries), but we are using type#1. Attaching the stub in the kernel (that already has another stub) is not clear for me.

Oh I read that the credentials are either derived from the tpm or an extra credentials file is created, but maybe I misunderstood that.

Also, do you have experience with NixOS? In one of their projects, Lanzaboote, they use UKIs, but they just include the kernel, the initrd is still an extra file. They can't use systemd-stub for that reason, so they have their own stub but afaik, it already has most of the systemd-stub features and they are looking to upstream it. Would this solution also be suitable for openSUSE, or do we want to stick to upstream here?

aplanas commented 3 months ago

We should not use fde-tools. It is specific to pcr-oracle and grub2. I try to update it but it is not as trivial as I expected. But maybe we can revisit it.

Oh alright. :) Since it was in sdbootutil and the jeos-firstboot module, I thought it was alright to use.

You mean the variable name? It is indeed confusing. I should change it.

Oh in which step did I include it in the initrd? I just copied the commands from the jeos module

Calling sdbootutil mkinitrd? At the very least generating a new initrd will invalidate its measurement.

But also should include FIDO2, as in the disk-encryption-tool.

Yes, that would be useful :) Do we want to support setting a custom password and PIN like in the module? If yes, we would have to rewrite the Dialogs for a cli approach.

I would love to get rid of the dialogs and use a full CLI approach. The features (PIN, FIDO2, recovery ...) should be the same that we have in disk-encryption-tool, because ideally

cmdline is measured in both models, pcrlock and pcr-oracle.

Hmm interesting. I also saw the command line in the logs, but I could change it and my system still unlocked.

I wonder if the enrollment failed then. Note that systemd-cryptenroll will not complain much when a PCR cannot be aligned with the event log when using pcrlock. It is very tricky, and you can only see this issue when using SYSTEMD_LOG_LEVEL=debug. If the PCR is not aligned, it will be discarded but the enrollment will continue. Another way of seeing this is to run systemd-pcrlock and make sure that every event of the tracked PCRs has a corresponding component file.

To alleviate this, there is a PR (not sure whether it is merged, I will add it to my TODO) to --force the tracking of the enumerated PCRs and, if one fails, complain and abort the enrollment.

Good to know. :) However, I don't quite get why that wouldn't allow this attack vector tbh. The policy is stored in the tpm, however, a backup of the esp could still be made. How would restoring the backup not allow the old files to be unlocked?

Right. I think it is called a "rollback attack". You need physical access to the system, but imagine that I want to get into your big server, and I have an unprivileged account there. I can make a copy of the ESP; this includes the boot loader, the kernel, the initrd and the signed policy (tpm2-prediction.json).

After some time, you keep updating your server, and the old kernel is no longer in the ESP. Then I read that a new CVE affects that old kernel, which you are not using anymore. With my backup I could restore the bad kernel, together with its initrd and the policy, and still successfully unlock the LUKS2 device. The system is now running a kernel with a known CVE that I can exploit with my unprivileged account.

With pcrlock, the policy is in the TPM and my backup cannot include it, so after restoring the old kernel the automatic unlock of the LUKS2 device will not succeed.
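The difference between the two models can be sketched with a toy example. Nothing here talks to a TPM: the `Esp` class and both functions are hypothetical, a kernel name stands in for the PCR state that a boot chain produces, and signature checking is assumed to always succeed for the signed file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Esp:
    kernel: str                    # stand-in for the PCR state this boot chain produces
    signed_predictions: frozenset  # states covered by the signed file stored on the ESP


def unlocks_with_signed_policy(esp: Esp) -> bool:
    # The signature never expires, so the file travels along with an ESP backup.
    return esp.kernel in esp.signed_predictions


def unlocks_with_tpm_policy(esp: Esp, tpm_policy: frozenset) -> bool:
    # The accepted states live in the TPM; restoring the ESP cannot bring them back.
    return esp.kernel in tpm_policy


backup = Esp("kernel-old", frozenset({"kernel-old"}))  # attacker's old copy of the ESP
tpm_policy = frozenset({"kernel-new"})                 # policy after the system update

print("signed policy, restored backup:", unlocks_with_signed_policy(backup))
print("TPM policy, restored backup:", unlocks_with_tpm_policy(backup, tpm_policy))
```

In the first case everything needed to unlock is part of the backup; in the second, the deciding piece of state stays in the TPM and moves forward with every update.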

aplanas commented 3 months ago

Oh I read that the credentials are either derived from the tpm or an extra credentials file is created, but maybe I misunderstood that.

Credentials can be encrypted using the host key (a long 4kb random file), a TPM2, a combination of both or none. In the case of pcrlock.json, I think it is using the null key.

But the relevant part is the delivery mechanism, as you noted. dracut-pcr-signature is the delivery mechanism that mounts the ESP. Type#1 entries cannot use a delivery mechanism based on systemd-creds, because the component that moves the credential into the in-memory initrd is systemd-stub, which is used in the UKI case.

Also, do you have experience with NixOS? In one of their projects, Lanzaboote, they use UKIs, but they just include the kernel, the initrd is still an extra file. They can't use systemd-stub for that reason,

Interesting, but in this case it is not a UKI ... The kernel has its own stub when you are using EFI. This makes the kernel a normal EFI PE binary that the boot loader can execute.

so they have their own stub but afaik, it already has most of the systemd-stub features and they are looking to upstream it. Would this solution also be suitable for openSUSE, or do we want to stick to upstream here?

I need to check it.

But I wonder if it makes more sense to go for UKIs in the end, I do not know. The initrd can be separated into two components, one that is general and another that is host-specific. There is some work here: https://github.com/openSUSE/sdbootutil/pull/63

Also @lnussel managed to generate UKIs in OBS to experiment.

TobiPeterG commented 3 months ago

We should not use fde-tools. It is specific to pcr-oracle and grub2. I try to update it but it is not as trivial as I expected. But maybe we can revisit it.

Oh alright. :) Since it was in sdbootutil and the jeos-firstboot module, I thought it was alright to use.

You mean the variable name? It is indeed confusing. I should change it.

Oh in which step did I include it in the initrd? I just copied the commands from the jeos module

Calling sdbootutil mkinitrd? At the very least generating a new initrd will invalidate its measurement.

But sdbootutil first creates the initrd and then measures it, so the measurements should be correct or am I missing something?

But also should include FIDO2, as in the disk-encryption-tool.

Yes, that would be useful :) Do we want to support setting a custom password and PIN like in the module? If yes, we would have to rewrite the Dialogs for a cli approach.

I would love to get rid off the dialogs and use a full cli approach. The features (PIN, FIDO2, recovery ...) should be the same that we have in disk-encryption-tool, because ideally

cmdline is measured in both models, pcrlock and pcr-oracle.

Hmm interesting. I also saw the command line in the logs, but I could change it and my system still unlocked.

I wonder if the enrollment failed then. Note that system-cryptenroll will not complain too much when a PCR cannot be aligned with the event log when using pcrlock. It is very tricky, and you can see this issue when using SYSTEMD_LOG_LEVEL=debug. If the PCR is not aligned, it will be discarded but the enrollment will continue. Another way of seeing this is doing a systemd-pcrlock, and be sure that all the events of the tracked PCR has a corresponding component file.

Oh that's good to know. :)

To alleviate this there is a PR (not sure that it is merged, I will add it in my TODO), to --force the tracking of the enumerated PCRs, and if one fails, complain and abort the enrollment.

Good to know. :) However, I don't quite get why that wouldn't allow this attack vector tbh. The policy is stored in the tpm, however, a backup of the esp could still be made. How would restoring the backup not allow the old files to be unlocked?

Right. It is the "rollback attack", I think that is named like that. You need to have physical access to system, but imagine that I want to access to your big server, and I have non privilege account there. I can make a copy of the ESP, this include the boot loader, the kernel, initrd and the signed policy (tpm2-prediction.json).

After some time you keep updating your server, the old kernel is not there anymore in the ESP, but I read that a new CVE is present in original old kernel that you are not using anymore. With my backup I could restore the bad kernel, with the initrd and the policy in the system, and still successfully unlock the LUKS2 device. The system now is running a kernel with a known CVE that I can exploit with my unprivileged account.

Then the policy is in the TPM, my backup cannot include this policy, so when restoring the kernel the automatic unlock of the LUKS2 device will not succeed.

Ah alright, thanks for the clarification :)

aplanas commented 3 months ago

But sdbootutil first creates the initrd and then measures it, so the measurements should be correct or am I missing something?

Maybe I am wrong here. In the steps that you wrote, the last one is generating the initrd. When are you making the predictions?

TobiPeterG commented 3 months ago

But sdbootutil first creates the initrd and then measures it, so the measurements should be correct or am I missing something?

Maybe I am in the wrong here. In the steps that you wrote the last one is generate the initrd. When are you making the predictions?

sdbootutil automatically makes the prediction after generating an initrd/installing a kernel

aplanas commented 3 months ago

Oh my ... I see it now, sorry.

If you run pcr-oracle -d ...., it will do the predictions and show a lot of debug output. One of the things it shows is the value used to extend the PCR. You can compare it with the actual value from the event log to locate where it is mispredicting.
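Once the extend values from both sides are collected, finding the first divergence is mechanical. A minimal sketch, assuming the digests were already copied by hand into two lists of `(pcr_index, hex_digest)` tuples in event order. It does not parse pcr-oracle output or the event log, the function name is made up, and the sample digests are invented.

```python
def first_mismatch(predicted, actual):
    """Return (position, predicted_entry, actual_entry) for the first differing event, or None."""
    for pos, (p, a) in enumerate(zip(predicted, actual)):
        if p != a:
            return pos, p, a
    if len(predicted) != len(actual):
        # One side has extra events: report the first event missing on the other side.
        pos = min(len(predicted), len(actual))
        p = predicted[pos] if pos < len(predicted) else None
        a = actual[pos] if pos < len(actual) else None
        return pos, p, a
    return None


# Invented sample data: the third extend of PCR 9 differs.
predicted = [(9, "aa11"), (9, "bb22"), (9, "cc33")]
actual = [(9, "aa11"), (9, "bb22"), (9, "dd44")]
print(first_mismatch(predicted, actual))
```

Everything extended before the reported position matched, so the component measured at that position is the one to look at first.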