Open MMG-DG opened 4 months ago
@MMG-DG this is expected.
When the device is not closed, a test key from the CAAM backend is used to encrypt the encryption key used to encrypt/decrypt the partition. When the device is closed, the OTPMK key is used (a never-disclosed 256-bit key randomly generated and fused into each SoC at manufacturing time). This is documented in https://www.kernel.org/doc/html/latest/security/keys/trusted-encrypted.html.
Because you encrypted the key with a test key, that key cannot be decrypted with the OTPMK key after the device is closed. So a production device should always create the encrypted partition from an unencrypted partition at first boot.
I am about to extend the documentation about encryption to add information about the TPM support we are introducing, and I will make sure to state that in the docs.
Thanks for reporting.
I wonder if it might be useful to include another option that allows for this to be delayed?
Based on other documentation, one should apply certificate fuses and then check everything boots without warnings and issues before blowing the final fuse. If this is the case then to do a first-boot encryption is probably not the wanted thing?
Maybe if a delayed-encryption setting is configured then an extra file is deployed into the image to block the encryption? This would instead just mount the raw drive. Once a user is ready and has blown the required fuse they can then delete this file from the rootfs (or other specified writable partition) and then the handler service can perform the encryption at the next boot.
I would guess this just needs to be a setting to specify a location of the block file (so it can be places in a custom writable location). If this value is blank then no delay is required and things just work as they do now, but if the value is set, then a file is written to that location and the handler script will check if this file exists before running the encryption (after mounting the partition initially).
I don't see this as a real problem.
Thinking about the development workflow, you start by booting an image in a not-closed device, then the test key is used for encryption. Then you decide to write to the fuses and close the device, you need to be aware that the next boot the OTPMK will be used for encryption, then you remove the previous key blob (/var/local/private/.keys/tdx-enc-key.blob
) and a new key is generated and used to encrypt the partition.
In a production (and secure) environment you should, in this especific order 1) flash the image, fuse the keys, close the device and boot the new image. Then you don't have a problem as well.
Does that make sense?
Not sure if @jsrc27 has any insights on this as well.
What @sergioprado has described makes sense to me.
@MMG-DG Is there some kind of use-case or requirement you're trying to fulfill that can't be accomplished with the current workflow? I'm trying to understand since I'm not sure if I fully follow what you're trying to achieve.
I don't see this as a real problem.
Thinking about the development workflow, you start by booting an image in a not-closed device, then the test key is used for encryption. Then you decide to write to the fuses and close the device, you need to be aware that the next boot the OTPMK will be used for encryption, then you remove the previous key blob (
/var/local/private/.keys/tdx-enc-key.blob
) and a new key is generated and used to encrypt the partition.In a production (and secure) environment you should, in this especific order 1) flash the image, fuse the keys, close the device and boot the new image. Then you don't have a problem as well.
Does that make sense?
Not sure if @jsrc27 has any insights on this as well.
I partially disagree with this assessment.
In development you would indeed not close the device and would perform any development required. The problem then comes when you have data on the partition (which is encrypted with the "test" key) and on the next boot everything fails because you have now blown the SEC_BOOT fuse. If you delete the key-blob before this "reboot" and fuse blowing, then you are back to the issue of data being overwritten as the original partition (that now cannot be decrypted) will be flattened and encrypted with a new key.
What might be useful is to have the script decrypt the partition back to a raw ext4 one if on shutdown the key.blob doesn't exist (i.e. I have deleted that because the next boot should re-encrypt the partition). But it should also preserve any existing data (if the original PRESERVE_DATA flag is set maybe?).
My use-case would be what I understand our production process to be:
Another option would be to add a config variable to set the service to be disabled by default? However, as in my original post, maybe an additional file being present could short-circuit the service so it never tries to encrypt the partition while in "Factory" mode but still mounts it to the right location for "insecure" setup processes to be applied.
I hope that all makes sense.
@MMG-DG thanks for describing better your use case.
The objective of this layer is to provide a framework so users can easily leverage security features from the SoC, but we don't pretend to provide an all-in-one solution and cover all use cases. That would complicate a lot the code of the layer, and also make it harder to maintain.
Doing the reverse operation (creating an unencrypted partition from an encrypted partition) is probably not something we want to cover here.
But what is nice about Yocto is that you can always (and easily) build on top of another layer.
So for your specific use case, you can create the image with the encryption handler disabled. This can be done by creating an append file in your layer (tdx-enc-handler_%.bbappend
) and add the following line:
SYSTEMD_AUTO_ENABLE = "disable"
Then after factory tests are completed and before fusing the keys and closing the device, you can enable the systemd service.
Would that help in your use case?
@sergioprado That solution is pretty much what I have in place at the moment. Just disable the service until factory testing is complete, then enable it before the reboot that will be when the SEC_BOOT
fuse gets blown.
Would be good if this expectation and workaround could be added to the documentation. The only thing it does not solve is that now I would need a temporary solution to ensure the partition is mounted by default to the correct location as this will no longer happen while the tdx-enc-handler is disabled.
Question: is it possible to have a second service defined in the bbappend to do the initial partition mounting, then have tdx-enc-handler depend on that service in some way? Not totally clear in my head what I mean at the moment, but I may look to create and extra "layer" that adds this ability and see how it goes. Maybe then run it by you guys here if it is useful.
Would be good if this expectation and workaround could be added to the documentation.
Noted.
The only thing it does not solve is that now I would need a temporary solution to ensure the partition is mounted by default to the correct location as this will no longer happen while the tdx-enc-handler is disabled.
That is a good point. You could add a line at /etc/fstab to automount the partition unencrypted, and then remove it atfer the SEC_BOOT fuse gets blown and the encryption service is enabled.
Question: is it possible to have a second service defined in the bbappend to do the initial partition mounting, then have tdx-enc-handler depend on that service in some way?
Yes, it is possible, but I think it would complicate a lot the solution. The solution I recommended in the previous question is simpler and easier to implement.
Hi, FYI: this is MMG_DG, this is one of my home accounts. I have been playing with being able to read the fuse values. Will publish a PR when I get to test it in the week (hopefully).
The basic idea is to add a new variable ‘TDX_ENC_USE_SAFE_CAAM_KEY’ By default this is set to 1, but can be overridden if desired. If this is set, then the SEC_BOOT fuse is checked and the process stops after mounting the raw disk. I have also put the backup function 1 step higher so that the key is not generated.
I then have a second branch, which checks for a ‘${BLOB_FULLPATH}.del’ file in the stop process, which reverses the encryption and deletes the blob. This process only runs when the new variable is 0.
this way it is configurable if you want to encrypt the partition and allows for future process to reset the state to enable blowing the fuse and re-encrypting the data.
Does this sound like a useful addition?
Also, I was wondering if the data partition is enabled, if the path to the blob file should be changed to be on that data partition automagically?
It seems like a good addition. I have a few comments:
TDX_ENC_CAAM_USE_ONLY_OTPMK_KEY
, and by default it should be disabled.1
, then encryption would only be enabled if the device is closed (secure boot is enabled).@jsrc27 @rborn-tx feel free to add your own thoughts.
Happy Halloween,
Just an update on this thread for now. I have my final production v1 builds being delivered over the next few weeks, but I will be looking to re-implement them with the latest commits of this repo (as we are currently pinned to a known working commit due to all the changes that were happening during our release cycle) and looking to create a PR with the changes.
However, I have a few questions....
TDX_ENC_CAAM_USE_ONLY_OTPMK_KEY=1
?/run/???
. The filenames would be the FUSE name (as labelled in the documentation), with the value of the fuse in HEX format. This script would have a small config file to enable setting other custom fuse values to be mapped externally to the tdx-enc script. This way the tdx-enc script just needs to depend on the SEC_BOOT.fuse
file being present and can check its value easily.I hope that all makes sense, but would be happy for any input on the idea. Ideally I don't want to be adding too much "cruft" into the tdx-enc script as that is not it's job. I expect if useful, this could be something added to the BSP at a later date, but would just want to get it working first.
Regards, MMG
I think it can be integrated into tdx-enc-handler
. In case you want to upstream the implementation, it needs to be generic enough to work on all iMX based SoCs. Feel free to send an RFC implementation so we can have a look at it nd provide some feedback.
I am using an image based off the tdx-reference-minimal-image. I am using a pinned revision of this repo at
c3123e14a2a01106b3c46a836211533efb380e17
. I am settingTDX_ENC_PRESERVE_DATA = "1"
in my local.conf. Once the image is flashed onto a SoM (iMX8MP v1.1A or v1.1B) everything looks fine. I blow the SRK HASH fuses and reboot.... still working fine I blow the following fuses to set boot config and disable some debug ports:Reboot... everything is still good. Finally I blow the SEC_BOOT fuse:
Reboot... everything starts up, but my encryption key is now invalid:
I have managed to restore the encrypted partition, but is this something that should be considered / noted before blowing that fuse? Maybe there should be a decrypt script to enable reversing the process before blowing that fuse such that the next boot will have an unencrypted partition to re-encrypt correctly?