toradex / meta-toradex-security

MIT License
3 stars 8 forks source link

Encrypted partition key fails after blowing SEC_BOOT fuse on iMX8MP #39

Open MMG-DG opened 4 months ago

MMG-DG commented 4 months ago

I am using an image based off the tdx-reference-minimal-image. I am using a pinned revision of this repo at c3123e14a2a01106b3c46a836211533efb380e17. I am setting TDX_ENC_PRESERVE_DATA = "1" in my local.conf. Once the image is flashed onto a SoM (iMX8MP v1.1A or v1.1B) everything looks fine. I blow the SRK HASH fuses and reboot.... still working fine I blow the following fuses to set boot config and disable some debug ports:

# FORCE_BOOT_FROM_FUSE, ROM_NO_LOG
fuse prog -y 2 0 0x00500000
# SDP
fuse prog -y 2 0 0x00200000
# BT_FUSE_SEL, JTAG, SJC, BOOT_MODE
fuse prog -y 1 3 0x10E02000

Reboot... everything is still good. Finally I blow the SEC_BOOT fuse:

# SEC_BOOT
fuse prog -y 1 3 0x02000000

Reboot... everything starts up, but my encryption key is now invalid:

Jan 01 00:00:03 verdin-imx8mp tdx-enc.sh[272]: CAAM: Encrypted key exists. Importing it...
Jan 01 00:00:03 verdin-imx8mp tdx-enc.sh[277]: add_key: Bad message
Jan 01 00:00:03 verdin-imx8mp tdx-enc.sh[272]: CAAM: ERROR: Error adding key to kernel keyring!

I have managed to restore the encrypted partition, but is this something that should be considered / noted before blowing that fuse? Maybe there should be a decrypt script to enable reversing the process before blowing that fuse such that the next boot will have an unencrypted partition to re-encrypt correctly?

sergioprado commented 3 months ago

@MMG-DG this is expected.

When the device is not closed, a test key from the CAAM backend is used to encrypt the encryption key used to encrypt/decrypt the partition. When the device is closed, the OTPMK key is used (a never-disclosed 256-bit key randomly generated and fused into each SoC at manufacturing time). This is documented in https://www.kernel.org/doc/html/latest/security/keys/trusted-encrypted.html.

Because you encrypted the key with a test key, that key cannot be decrypted with the OTPMK key after the device is closed. So a production device should always create the encrypted partition from an unencrypted partition at first boot.

I am about to extend the documentation about encryption to add information about the TPM support we are introducing, and I will make sure to state that in the docs.

Thanks for reporting.

MMG-DG commented 3 months ago

I wonder if it might be useful to include another option that allows for this to be delayed?

Based on other documentation, one should apply certificate fuses and then check everything boots without warnings and issues before blowing the final fuse. If this is the case then to do a first-boot encryption is probably not the wanted thing?

Maybe if a delayed-encryption setting is configured then an extra file is deployed into the image to block the encryption? This would instead just mount the raw drive. Once a user is ready and has blown the required fuse they can then delete this file from the rootfs (or other specified writable partition) and then the handler service can perform the encryption at the next boot.

I would guess this just needs to be a setting to specify a location of the block file (so it can be places in a custom writable location). If this value is blank then no delay is required and things just work as they do now, but if the value is set, then a file is written to that location and the handler script will check if this file exists before running the encryption (after mounting the partition initially).

sergioprado commented 3 months ago

I don't see this as a real problem.

Thinking about the development workflow, you start by booting an image in a not-closed device, then the test key is used for encryption. Then you decide to write to the fuses and close the device, you need to be aware that the next boot the OTPMK will be used for encryption, then you remove the previous key blob (/var/local/private/.keys/tdx-enc-key.blob) and a new key is generated and used to encrypt the partition.

In a production (and secure) environment you should, in this especific order 1) flash the image, fuse the keys, close the device and boot the new image. Then you don't have a problem as well.

Does that make sense?

Not sure if @jsrc27 has any insights on this as well.

jsrc27 commented 3 months ago

What @sergioprado has described makes sense to me.

@MMG-DG Is there some kind of use-case or requirement you're trying to fulfill that can't be accomplished with the current workflow? I'm trying to understand since I'm not sure if I fully follow what you're trying to achieve.

MMG-DG commented 3 months ago

I don't see this as a real problem.

Thinking about the development workflow, you start by booting an image in a not-closed device, then the test key is used for encryption. Then you decide to write to the fuses and close the device, you need to be aware that the next boot the OTPMK will be used for encryption, then you remove the previous key blob (/var/local/private/.keys/tdx-enc-key.blob) and a new key is generated and used to encrypt the partition.

In a production (and secure) environment you should, in this especific order 1) flash the image, fuse the keys, close the device and boot the new image. Then you don't have a problem as well.

Does that make sense?

Not sure if @jsrc27 has any insights on this as well.

I partially disagree with this assessment.

In development you would indeed not close the device and would perform any development required. The problem then comes when you have data on the partition (which is encrypted with the "test" key) and on the next boot everything fails because you have now blown the SEC_BOOT fuse. If you delete the key-blob before this "reboot" and fuse blowing, then you are back to the issue of data being overwritten as the original partition (that now cannot be decrypted) will be flattened and encrypted with a new key.

What might be useful is to have the script decrypt the partition back to a raw ext4 one if on shutdown the key.blob doesn't exist (i.e. I have deleted that because the next boot should re-encrypt the partition). But it should also preserve any existing data (if the original PRESERVE_DATA flag is set maybe?).

My use-case would be what I understand our production process to be:

  1. Image burned to SoM
  2. SoM fitted into device (custom carrier board)
  3. Initial Factory Testing is performed
    • This is where at present the partition will be encrypted using the test key
  4. Integration Testing is performed (for other external systems that are involved; usually provided by a simulator box)
  5. Device configurations are then set
    • This is where relevant fuses are blown (including any General Purpose ones used for custom carrier configuration and the SEC_BOOT fuses)
  6. Final Factory Tests are completed
    • This is now where everything goes wrong because we cannot decrypt the original data in the encrypted partition
  7. Final Integration Testing (done before shipping a complete set of devices that are part of a kit)
  8. Production Complete

Another option would be to add a config variable to set the service to be disabled by default? However, as in my original post, maybe an additional file being present could short-circuit the service so it never tries to encrypt the partition while in "Factory" mode but still mounts it to the right location for "insecure" setup processes to be applied.

I hope that all makes sense.

sergioprado commented 3 months ago

@MMG-DG thanks for describing better your use case.

The objective of this layer is to provide a framework so users can easily leverage security features from the SoC, but we don't pretend to provide an all-in-one solution and cover all use cases. That would complicate a lot the code of the layer, and also make it harder to maintain.

Doing the reverse operation (creating an unencrypted partition from an encrypted partition) is probably not something we want to cover here.

But what is nice about Yocto is that you can always (and easily) build on top of another layer.

So for your specific use case, you can create the image with the encryption handler disabled. This can be done by creating an append file in your layer (tdx-enc-handler_%.bbappend) and add the following line:

SYSTEMD_AUTO_ENABLE = "disable"

Then after factory tests are completed and before fusing the keys and closing the device, you can enable the systemd service.

Would that help in your use case?

MMG-DG commented 3 months ago

@sergioprado That solution is pretty much what I have in place at the moment. Just disable the service until factory testing is complete, then enable it before the reboot that will be when the SEC_BOOT fuse gets blown.

Would be good if this expectation and workaround could be added to the documentation. The only thing it does not solve is that now I would need a temporary solution to ensure the partition is mounted by default to the correct location as this will no longer happen while the tdx-enc-handler is disabled.

Question: is it possible to have a second service defined in the bbappend to do the initial partition mounting, then have tdx-enc-handler depend on that service in some way? Not totally clear in my head what I mean at the moment, but I may look to create and extra "layer" that adds this ability and see how it goes. Maybe then run it by you guys here if it is useful.

sergioprado commented 3 months ago

Would be good if this expectation and workaround could be added to the documentation.

Noted.

The only thing it does not solve is that now I would need a temporary solution to ensure the partition is mounted by default to the correct location as this will no longer happen while the tdx-enc-handler is disabled.

That is a good point. You could add a line at /etc/fstab to automount the partition unencrypted, and then remove it atfer the SEC_BOOT fuse gets blown and the encryption service is enabled.

Question: is it possible to have a second service defined in the bbappend to do the initial partition mounting, then have tdx-enc-handler depend on that service in some way?

Yes, it is possible, but I think it would complicate a lot the solution. The solution I recommended in the previous question is simpler and easier to implement.

thepinkmile commented 2 months ago

Hi, FYI: this is MMG_DG, this is one of my home accounts. I have been playing with being able to read the fuse values. Will publish a PR when I get to test it in the week (hopefully).

The basic idea is to add a new variable ‘TDX_ENC_USE_SAFE_CAAM_KEY’ By default this is set to 1, but can be overridden if desired. If this is set, then the SEC_BOOT fuse is checked and the process stops after mounting the raw disk. I have also put the backup function 1 step higher so that the key is not generated.

I then have a second branch, which checks for a ‘${BLOB_FULLPATH}.del’ file in the stop process, which reverses the encryption and deletes the blob. This process only runs when the new variable is 0.

this way it is configurable if you want to encrypt the partition and allows for future process to reset the state to enable blowing the fuse and re-encrypting the data.

Does this sound like a useful addition?

Also, I was wondering if the data partition is enabled, if the path to the blob file should be changed to be on that data partition automagically?

sergioprado commented 2 months ago

It seems like a good addition. I have a few comments:

  1. I would name the variable something like TDX_ENC_CAAM_USE_ONLY_OTPMK_KEY, and by default it should be disabled.
  2. If the variable is 1, then encryption would only be enabled if the device is closed (secure boot is enabled).
  3. The solution to check is the device is closed (i.e. reading the fuses) should be supported upstream.

@jsrc27 @rborn-tx feel free to add your own thoughts.

MMG-DG commented 3 days ago

Happy Halloween,

Just an update on this thread for now. I have my final production v1 builds being delivered over the next few weeks, but I will be looking to re-implement them with the latest commits of this repo (as we are currently pinned to a known working commit due to all the changes that were happening during our release cycle) and looking to create a PR with the changes.

However, I have a few questions....

  1. Would it be preferable to have the fuse state detection code in the tdx-enc script? Or (as I think would probably be better) Should this be a separate service that the tdx-enc-handler would depend on (only in the case of TDX_ENC_CAAM_USE_ONLY_OTPMK_KEY=1?
  2. This alternative service is looking to do fuse detection as early as possible and then create files in /run/???. The filenames would be the FUSE name (as labelled in the documentation), with the value of the fuse in HEX format. This script would have a small config file to enable setting other custom fuse values to be mapped externally to the tdx-enc script. This way the tdx-enc script just needs to depend on the SEC_BOOT.fuse file being present and can check its value easily.

I hope that all makes sense, but would be happy for any input on the idea. Ideally I don't want to be adding too much "cruft" into the tdx-enc script as that is not it's job. I expect if useful, this could be something added to the BSP at a later date, but would just want to get it working first.

Regards, MMG

sergioprado commented 3 days ago

I think it can be integrated into tdx-enc-handler. In case you want to upstream the implementation, it needs to be generic enough to work on all iMX based SoCs. Feel free to send an RFC implementation so we can have a look at it nd provide some feedback.