gramineproject / gramine

A library OS for Linux multi-process applications, with Intel SGX support
GNU Lesser General Public License v3.0
607 stars 201 forks source link

RFC: Support for filesystem migration after SGX SVN update #855

Open DL8 opened 2 years ago

DL8 commented 2 years ago

Description of the problem

SGX key derivation gets SVN fields in the request (ISVSVN, CPUSVN and CONFIGSVN). This mechanism provides backwards compatibility and anti-rollback protection as follows:

  1. Each combination of SVNs generates a different key
  2. If all SVNs are less or equal the values of the currently running enclave, a key is returned
  3. If at least one SVN value is greater than the currently running enclave, key derivation fails

Current implementation of sgx_get_seal_key() uses the current enclave's SVNs (ISV, CPU and CONFIG) in a hardcoded manner. Consequently, Gramine does not provide backwards compatibility mechanism: if the enclave or the platform were updated, previously sealed blob will no longer be accessible.

This is problematic not only for user code, but could also break backwards compatibility of protected filesystem: if a file was written with one combinations of SVNs, it will no longer be accessible after they were changed (scenario and expected vs. actual results are below).

I think the following changes will be required:

  1. Protected file format: the file's header will have to include either the entire key request or selected parameters (SVNs are required, but other fields may be needed as well)
  2. Adding some more parameters to sgx_get_seal_key() may be needed (either all or some of the key request parameters). PalGetSpecialKey() may have to be modified as well, but I'm not sure if and how
  3. What should happen if data is written into a file that was protected with older SVNs? Writing with the old SVNs is a security concern, because in that case potentially vulnerable enclave/platform may disclose the new data. Re-encrypting the entire file may be a significant performance impact, but it happens only once per SVN upgrade

Steps to reproduce

Setup:

  1. Create two enclaves: enclave A with ISVSVN=1 and enclave B with ISVSVN=2, such that both enclave access the same protected files and both are signed with the same key
  2. In both enclaves, mount the protected filesystem with MRSIGNER-based key
  3. Write file X with enclave A
  4. Write file Y with enclave B

Test flow:

  1. Enclave A: read file X
  2. Enclave A: read file Y
  3. Enclave B: Read file X
  4. Enclave B: Read file Y

Expected results

  1. Success
  2. Fail
  3. Success
  4. Success

Actual results

  1. Success
  2. Fail
  3. Fail
  4. Success
dimakuv commented 2 years ago

For context, the sgx_get_seal_key() function looks like this currently: https://github.com/gramineproject/gramine/blob/90ea5a52609a872c992135f9311a47aeacd5f2b1/pal/src/host/linux-sgx/enclave_framework.c#L241-L270

Note the memcpy() operations -- this is what @DL8 means by "hardcoded manner". In other words, the seal key will be generated based on the current-enclave's CPU SVN, ISV SVN, CONFIG SVN. (This also means that the EGETKEY instruction currently never fails due to scenario 3 described by @DL8).

I currently have no good ideas what to do about it.

DL8 commented 2 years ago

As I see it, sgx_getseal_key() signature should be changed to get the entire key request or just relevant parameters. It may be reasonable to copy it locally to enforce alignment and to perform some sanity checks prior to actual EGETKEY invocation (my hunch is that only inputs that lead to an exception and obvious errors should be ruled out). This is the easier part. The harder part is:

  1. Should PalGetSpecialKey() signature and semantics be modified? And if yes, how should it be done?
  2. What about the pseudo-file interface? How should it be updated to support the required flexibility?
  3. What's the impact on cross-TEE support? Does it matter here?
dimakuv commented 2 years ago

Yeah, exactly. Changing sgx_get_seal_key() signature is a no-brainer. The real question is: how do it expose it to end users (in the manifest, in the format of Protected/Encrypted Files, in the pseudo-file interface /dev/attestation/...). And this is hard to design right, and requires a lot of discussions...

mkow commented 2 years ago

And this is hard to design right, and requires a lot of discussions...

+1. This seems pretty hard to get right and easy to accidentally worsen security of the whole Gramine. We currently have a lot of pending design discussion and this one is only a quality-of-life improvement (not a security fix, and not really a new functionality), so IMO it should have low priority for now. Currently, if user's enclave gets compromised somehow then they need to redeploy them in the new version without data migration, or do the migration explicitly by themselves (pull the state of the old enclave to the trusted server, deploy the new one, push the old state).

Also, this feature isn't as straightforward as it seems and is IMO a bit controversial. If a specific version of an enclave was vulnerable, then using its state (i.e. its filesystem, in our case) in the patched version propagates the potentially attacker-controller state to the new one. What users should do is to redeploy the system from a clean state or from a trusted-server-verified state.

DL8 commented 2 years ago

Currently, if user's enclave gets compromised somehow then they need to redeploy them in the new version without data migration, or do the migration explicitly by themselves (pull the state of the old enclave to the trusted server, deploy the new one, push the old state).

How do you expect migration to work? If I am not mistaken, sealing keys depend on platform-specific secrets, so copying the protected filesystem data won't help you so much. Do you mean to let the developer define some logic that will be invoked in case a file with older SVN was opened? I think this is too complicated, because there may be too many possible cases and it might become too complicated very fast.

Also, this feature isn't as straightforward as it seems and is IMO a bit controversial. If a specific version of an enclave was vulnerable, then using its state (i.e. its filesystem, in our case) in the patched version propagates the potentially attacker-controller state to the new one.

This is a problem, but I would expect the application to validate all inputs from filesystem (regardless of SGX). I don't think protected files are an exception to this rule.

What users should do is to redeploy the system from a clean state or from a trusted-server-verified state.

Starting from scratch with the updated enclave sounds like the most secure option, but I think it's a very bad idea to tell developers they lose access to all their data when an SVN is incremented: it may lead to developers not incrementing their ISVSVNs even if vulnerabilities were fixed, which puts data created by the fixed enclave at risk. And to make things worse, note that CPU SVNs also affect the derived keys: the same enclave may be launched on the same platform, but it might lose access to its protected filesystem because of a platform update.

In conclusion, it may be a controversial feature, but I think it might be just as controversial not to support loading keys with older SVNs, because access to protected file may be lost also in case of platform updates (which may be out of the enclave's control). I don't know if allowing explicit support for migration is the right way, because it may significantly complicate the developer's life. Therefore, I believe the most pragmatic approach is to support reading files with older SVNs, but to write data using the current enclave's SVNs (this prevents platform/enclave with older SVNs from reading the new data).

lejunzhu commented 2 years ago

Yeah, exactly. Changing sgx_get_seal_key() signature is a no-brainer. The real question is: how do it expose it to end users (in the manifest, in the format of Protected/Encrypted Files, in the pseudo-file interface /dev/attestation/...). And this is hard to design right, and requires a lot of discussions...

I think we should, by default, support opening existing files, when writing we should always use the latest SVN. Especially for CPU SVN, as there is probably no way to go back. If there is no such mechanism, the user data stored in encrypted files may be lost forever after CPU SVN changes.

Also in SGX SDK, the protected file format does record the cpu_svn, isv_svn and key_id in the plain part, and use them upon opening. So unless we explicitly document the opposite, users familiar with SGX may be surprised when they can't open existing files.

DL8 commented 2 years ago

@mkow - I think we have mixed two topics here:

  1. More flexible key derivation APIs
  2. Protected file system backwards compatibility/migration support

I think it will be better to create a separate issue for key derivation APIs

dimakuv commented 2 years ago

Also in SGX SDK, the protected file format does record the cpu_svn, isv_svn and key_id in the plain part, and use them upon opening.

This is new to me. @lejunzhu Was it added rather recently (less than 2 years ago)? Because I don't see anything like this in the current Gramine format, and it was taken directly from SGX SDK:

https://github.com/gramineproject/gramine/blob/f7995b60c7b49e075adefa18573767c6e24a4cc9/common/src/protected_files/protected_files_format.h#L31-L37

DL8 commented 2 years ago

I created a separate issue for possible key derivation API changes: #857

lejunzhu commented 2 years ago

This is new to me. @lejunzhu Was it added rather recently (less than 2 years ago)? Because I don't see anything like this in the current Gramine format, and it was taken directly from SGX SDK:

https://github.com/gramineproject/gramine/blob/f7995b60c7b49e075adefa18573767c6e24a4cc9/common/src/protected_files/protected_files_format.h#L31-L37

I only read the code today and am not sure about the history. It looks like this now:

https://github.com/intel/linux-sgx/blob/70e1535caa96c315189760a3ad4663eded9e8a2c/sdk/protected_fs/sgx_tprotected_fs/protected_fs_nodes.h#L56-L71

And the usage of stored SVN:

https://github.com/intel/linux-sgx/blob/70e1535caa96c315189760a3ad4663eded9e8a2c/sdk/protected_fs/sgx_tprotected_fs/file_crypto.cpp#L273-L275

dimakuv commented 2 years ago

Thanks @lejunzhu. This was introduced very long ago, 5 years ago. So we at Gramine just decided to remove these fields? Hm... I don't remember why these fields were removed.

This should be changed actually. @DL8 Is that what you expected to see in Gramine's Protected/Encrypted Files?

DL8 commented 2 years ago

Yes, and new key generation should look something like this: https://github.com/intel/linux-sgx/blob/70e1535caa96c315189760a3ad4663eded9e8a2c/sdk/protected_fs/sgx_tprotected_fs/file_crypto.cpp#L197

Note that:

  1. Current SVNs are taken
  2. A random key ID is generated

In addition to that, I think CONFIGSVN field should be with the same semantics as other SVNs (I wonder why I don't see it in SGX SDK). It will ease implementing KSS support in the future (perhaps the details should be discussed in #800 ).

dimakuv commented 1 year ago

Reviving this 6-months-old thread. We hit this problem again, and need to solve it sooner than later.

Let me summarize the problem:

The last two bullet items constitute the problem: there are only two "derive-from" sealing keys available in Gramine for any Encrypted File, and each of these sealing keys is tied to the current CPU, ISV and CONFIG numbers (as reported by EREPORT instruction).

This problem manifests in the following scenario:

  1. App in Gramine starts on an SGX platform with CPU SVN = 1.
  2. App does some work and then seals its data to myapp.encrypted.data file, encrypted using Encrypted Files feature of Gramine, with e.g. _sgx_mrenclave key.
  3. App terminates.
  4. SGX platform turns out to be vulnerable to some side-channel attack. The admin upgrades the SGX platform to new firmware, and the TCB version is bumped. This leads to CPU SVN = 2.
  5. App in Gramine starts again, now on an SGX platform with CPU SVN = 2.
  6. App tries to unseal the data using the _sgx_mrenclave key. However, CPU SVN is now different, thus the MRENCLAVE-based sealing key created on startup is different from the one in step 2. Loading the file is not possible.
  7. Fail.

SGX SDK solves this problem in a sub-optimal way, trading usability for security (see @mkow and @DL8 discussion on loading data from a previously-vulnerable enclave version):

This allows SGX SDK to solve the problematic scenario above: each Encrypted File simple saves all "version metadata" in its header, and SGX SDK reads this metadata first, generates the sealing key based on this metadata, and now the key can decrypt the file.

Gramine can't do this easily, because Gramine doesn't support such generation of sealing keys on a per-file basis.


I currently don't have a good proposal to allow old-version Encrypted Files in a new-version SGX enclave. It seems that the following properties are desired at a minimum:

  1. The old-version file may be downgraded to read-only (even if requested to be opened for RW). This could probably simplify changes in Gramine.
  2. Out of ISV SVN, CPU SVN and CONFIG SVN, at least CPU SVN should be allowed to be old -- because users of Gramine enclaves might have no power over CPU updates.
  3. Gramine should be modified to either:
    • create SGX sealing keys on the fly, getting the SVNs from the file metadata (similar to SGX SDK),
    • or expose several versions of sealing keys and e.g. try them one by one on a to-be-opened file...
lejunzhu commented 1 year ago

But if the application is not specifically modified for Gramine, it may refuse to run if the file is r/o. Then the data will be lost if it is encrypted with MRENCLAVE, as the application cannot be modified when the user notice the upgrade.

dimakuv commented 1 year ago

But if the application is not specifically modified for Gramine, it may refuse to run if the file is r/o. Then the data will be lost if it is encrypted with MRENCLAVE, as the application cannot be modified when the user notice the upgrade.

That is true. The alternative will be to allow writes as well, and print a warning (or have a manifest option like sgx.allow_old_files which is false by default).

lejunzhu commented 1 year ago

That is true. The alternative will be to allow writes as well, and print a warning (or have a manifest option like sgx.allow_old_files which is false by default).

But changing the manifest will change MRENCLAVE as well...

mkow commented 1 year ago

The alternative will be to allow writes as well, and print a warning (or have a manifest option like sgx.allow_old_files which is false by default).

Warnings here are pointless IMO, if they appear then it's already too late to react to them, because in this scenario they appear only after running on the updated host and the only thing they'll do is hinting the untrusted host owner that the enclave is using old, potentially compromised files.

DL8 commented 1 year ago

Then the data will be lost if it is encrypted with MRENCLAVE, as the application cannot be modified when the user notice the upgrade.

MRENCLAVE is subject to change per build regardless of this issue. In general, if you need data to persist across different versions, you should use MRSIGNER-based keys

I currently don't have a good proposal to allow old-version Encrypted Files in a new-version SGX enclave. It seems that the following properties are desired at a minimum:

  1. The old-version file may be downgraded to read-only (even if requested to be opened for RW). This could probably simplify changes in Gramine.

The easiest way I can see to support read-write files is to use the SVNs from the file to protect its data, but then new data has no rollback protection, which is insecure. Therefore, I think there is a missing requirement that writes always use the current SVNs.

If I understand correctly, the motivation behind this requirement is to prevent a scenario where chunks with different SVNs need to written into the same file. If that's indeed the case:

  1. Appending into an existing file ("a" mode) is also problematic
  2. I see no problem with truncation ("w" mode)
  3. How are problematic writes going to be prevented?
  1. Out of ISV SVN, CPU SVN and CONFIG SVN, at least CPU SVN should be allowed to be old -- because users of Gramine enclaves might have no power over CPU updates.

Not allowing old user-controlled SVNs means users lose access to old files when ISV SVN is updated. This may lead developers not to increment ISV SVN when needed, which practically disables SGX's anti-rollback mechanism

  1. Gramine should be modified to either:

    • create SGX sealing keys on the fly, getting the SVNs from the file metadata (similar to SGX SDK),
    • or expose several versions of sealing keys and e.g. try them one by one on a to-be-opened file...

Note that CPU SVN is a vector with several SVNs, so it may be quite cumbersome to handle (even without enclave SVNs). Unfortunately, even if SVNs are added to file metadata, such a guessing mechanism may be necessary to maintain compatibility with current protected filesystem format

lejunzhu commented 1 year ago

MRENCLAVE is subject to change per build regardless of this issue. In general, if you need data to persist across different versions, you should use MRSIGNER-based keys

I'm thinking of the situation that the application user doesn't intend to change the software, but the underlying hardware (BIOS) upgrade makes the encrypted file unaccessible. Therefore it is important for the user to get the same access to the file, so that the availability of the software is not affected. That gives user time to evaluate the hardware upgrade and decide what to do next.

Of course, sometimes the file integrity is so important that the user would rather lose the data than open a possibly compromised file, therefore we probably need an option in the manifest for the user to choose between these behaviors.

dimakuv commented 1 year ago

Looking at this thread again, my current idea is this:

With this construction, Gramine will start the life of the application with some meaningful CPUSVN and ISV SVN numbers. After reboots, Gramine will pick up this metadata file and continue using these CPUSVN and ISV SVN numbers, thus solving our problem of migration/upgrade.

Why don't we hard-code e.g. CPUSVN in the manifest file? This is because CPUSVN is an opaque 128-bit byte sequence; there is no spec on CPUSVN format. So we need to bootstrap CPUSVN from the processor itself, e.g. on the first graminized-app run.

Why don't we introduce CPUSVN per-file, as Intel SGX SDK does? Because I believe this flexibility comes at a huge price of code complexity and worse usability. E.g., we currently have a single /dev/attestation/keys/_sgx_mrenclave pseudo-file, but this file will become meaningless if we have per-file keys... Keeping a single CPUSVN per Gramine app seems like a reasonable compromise.

Why do we need HMAC? Because we want integrity protection of the metadata file, otherwise the attacker could simply replace CPUSVN with all-zeros, and then the file could be read by very old platform version.

Why don't we encrypt the file? Because this is not necessary -- none of the fields need to be secret. We only need a guarantee that the read file was indeed produced by the application enclave.

What is this specially-for-this-purpose MRENCLAVE key? It's a sealing key generated similarly to how we currently do it in Gramine, but with CPUSVN, ISVSVN, CONFIGID set to all-zeros, so that this file can be always read-and-verified on any version of the platform and software. This MRENCLAVE key will also have KEYID == nonce, for wear-out protection.

We probably need to add a reserved field of e.g. 128B in size, for future expansions.

This whole new logic should be hidden behind a new manifest option, for backwards compatibility. Maybe sgx.seal_key.versioning = ["strict"|"memorize_versions"], where "strict" is what we currently have (problematic, doesn't allow upgrades) and "memorize_versions" is my proposed scheme.

With this scheme, the attacker can (1) delete the metadata file at will, and (2) replace the newer version of the metadata file with the older version. The delete action will lead to Gramine not being able to decrypt the old encrypted files, because e.g. the CPUSVN version will be newer than that of the files. The replace action may lead to Gramine decrypting some really old files, but at least it will be limited by the very-first metadata file contents.

1 Maybe I should use the MRSIGNER key instead?

dimakuv commented 1 year ago

@DL8 @lejunzhu @kailun-qin Does the above implementation make sense to you?

lejunzhu commented 1 year ago

A single, oldest CPUSVN is probably not desirable. When there is a security update, the old keys are potentially compromised. If we keep using the old keys to encrypt new data after upgrade, it will allow this kind of attack: 1. upgrade the platform, 2. run the application and get some new data with remote attestation, the verification result will be up-to-date. 3. make the application write the data to encrypted file, using the old key. 4. rollback the upgrade somehow and use the known exploit to retrieve the new data from the same app running on the CPU with old microcode. I think we should always use the latest CPUSVN everywhere in Gramine, except during fopen(). For example, if the application allows migration on an encrypted file "history.dat". we can introduce a new attribute in fs.mount: {path="history.dat", uri="...". type="encrypted", key_name="_sgx_mrenclave", allow_migration=true} Also, we store per-file CPUSVN either in the same file or in a separate metadata file. When CPUSVN changes, upon fopen() the file will fail to decrypt, then we try the old CPUSVN.

if (current logic of file open fails && file.allow_migration) {
    old_cpusvn = metadata.cpusvn;
    if (key_name == "_sgx_mrenclave") {
        old_key = egetkey(mrenclave, old_cpusvn, ...);
        if (decrypt using old_key fails) {
            /* existing error handling code */
        } else {
            /* migrate the file to new CPUSVN */
            re-encrypt root key with current CPUSVN
            update block 0 with the new root key
            update the metadata with current CPUSVN
        }
    } else if (key_name == "_sgx_mrsigner") { /* do the same */ }
}
kailun-qin commented 1 year ago

I think @lejunzhu raised a good point (that we need the latest CPUSVN and some kind of migration process for the potential downgrade attacks) but it seems that the per-file record has to be introduced then (but as argued by @dimakuv, it comes at a huge price of code complexity and worse usability).

Otherwise, as some random thoughts, will an offline migration tool (or a migration mode of Gramine, that only conduct the file migration for all encrypted files from a certain older CPUSVN to a newer one) help?

dimakuv commented 1 year ago

Ok, looks like there is strong opposition to allowing older CPUSVNs. Then I will not pursue that approach.

I think there is nothing else we can do but go with @lejunzhu's proposal:

  1. Keep the CPUSVN with which the file was encrypted in per-file metadata.
  2. If the file is MRENCLAVE- or MRSIGNER-based (i.e. using SGX sealing key, not user-supplied key), and we notice a mismatch between current CPUSVN and the file CPUSVN, then we trigger migration: a. Call EGETKEY to get a file-CPUSVN sealing key. b. Decrypt file contents with this old key into memory. c. Overwrite this file with newly encrypted (with a new key) file contents. d. Update file metadata with a new CPUSVN. e. Throw away the old key.

There are two differences from what @lejunzhu proposed:

As @lejunzhu mentioned, this new migration logic can be gated by a new manifest sub-option: {path="...", uri="...", type="encrypted", key_name="_sgx_mrenclave", allow_migration=true}.

This has the benefit that Gramine still maintains in itself only a single sealing key (based on the latest CPUSVN), and generates "old" sealing keys only on demand when migration is required.

I also believe that Gramine codebase has all the building blocks to implement such a feature. Should be doable with not-too-drastic modifications to Gramine.

lejunzhu commented 1 year ago
  • Migration of the file to the new key is done in a dumb way -- simply re-encrypt the whole file contents by first decrypting with the old key, and then encrypting with the new key. No need to do the low-level Protected Files logic of "update block 0 ...".

Re-encrypting the whole file is slower if the file is big. I think we only need to re-encrypt the encrypted metadata, which contains the mht_key. That is much faster. If the old subkeys are (potentially) compromised, an attacker wanting to snoop or modify data can always do so using the old file. So there is no advantage(?) to re-encrypt the same file with a set of new keys.

Any new data written to the file after CPUSVN upgrade will re-create the chain of subkeys until the mht_key, and the attacker can't touch the new data because at this point we will use the new CPUSVN.

mkow commented 1 year ago

Otherwise, as some random thoughts, will an offline migration tool (or a migration mode of Gramine, that only conduct the file migration for all encrypted files from a certain older CPUSVN to a newer one) help?

I like this idea, as it doesn't induce any complexity into Gramine until there's a need for migration. One potential footgun here is that people may just fire this tool on an untrusted machine passing the encryption key into it. So, we need to think carefully how we name and describe everything.

Ok, looks like there is strong opposition to allowing older CPUSVNs.

Same from my side, the whole idea here is to migrate out from insecure TCB and thus we need to prevent downgrade attacks.

I think there is nothing else we can do but go with @lejunzhu's proposal: [...]

Looks good from my side, although I only did a quick read. One important problem here is that it doesn't give the user any practical way to upgrade the whole TCB, the lazy way will be the only possibility. Which will leave some files malleable by the attacker (because the old keys may be compromised and Gramine will trust the old file contents when re-encrypting it for migration, which may happen long after SGX/Gramine was updated, on the first use of that file by the app).

Or actually... Isn't this completely insecure? Two cases:

I think we only need to re-encrypt the encrypted metadata, which contains the mht_key. That is much faster.

But mht_key is also potentially compromised at this point.

Any new data written to the file after CPUSVN upgrade will re-create the chain of subkeys until the mht_key, and the attacker can't touch the new data because at this point we will use the new CPUSVN.

What do you mean by subkeys here? I don't remember the PF format exactly now, but I think there was no support for multiple keys for different parts of the file.

lejunzhu commented 1 year ago

Looks good from my side, although I only did a quick read. One important problem here is that it doesn't give the user any practical way to upgrade the whole TCB, the lazy way will be the only possibility. Which will leave some files malleable by the attacker (because the old keys may be compromised and Gramine will trust the old file contents when re-encrypting it for migration, which may happen long after SGX/Gramine was updated, on the first use of that file by the app).

For files encrypted using mrenclave, there's probably nothing we can do. If the files are encrypted by mrsigner, the user can either deploy a migration tool as suggested by @kailun-qin or do the following: 1. temporarily add allow_migration to the manifest, which changes the value of mrenclave and invalidates remote attestation 2. run the app once and lazy migrate the files by accessing them, 3. delete the temporary binary and continue use the old manifest (without allow_migration), and its mrenclave is the valid remote attestation value.

Or actually... Isn't this completely insecure? Two cases:

  • Sealed files: Let's say the attacker managed to leak some old sealing key. They can now plant any file by encrypting it properly and just specifying the old SVN inside the header. Gramine will happily decrypt and use the forged file contents (after migrating it to the new key). It doesn't allow leaking any new data though, but modifying app files is usually enough to get RCE inside it (e.g. by modifying configs).
  • Encrypted with remotely-deployed key: I think in this proposal this key is never upgraded, so if the attacker leaks it once then it can still be used after TCB upgrade.

This is a problem only if it's a zero-day attack. Once the new TCB appears in the remote attestation service, the attacker can't retrieve the key from remote server, unless the server allows out-of-date proof. This is different from CPU generated keys, which can always be generated. Also, we can make the migration tool support some kind of key rotation operation, so existing files can switch to a new encryption key. But that's more of a generic security practice than specific solution for this issue.

I think we only need to re-encrypt the encrypted metadata, which contains the mht_key. That is much faster.

But mht_key is also potentially compromised at this point.

Yes. Keep using this key is equivalent to the "sealed files" type of risk you mentioned above. This is inevitable and why the application developer will have to decide whether to allow migration for a certain file. We only keep using the old mht_key as long as the file is unmodified. Once it's modified, this key will change.

What do you mean by subkeys here? I don't remember the PF format exactly now, but I think there was no support for multiple keys for different parts of the file.

I mean the key used to encrypt data blocks, and the key used to encrypt the mht block which contains the data block key, and so on, until the mht_key. When we write even one byte to the file, existing code in Gramine will re-generate this chain of keys (including mht_key). This should prevent the attacker from accessing the new data block, even if he knows everything of the old file.

dimakuv commented 1 year ago

Otherwise, as some random thoughts, will an offline migration tool (or a migration mode of Gramine, that only conduct the file migration for all encrypted files from a certain older CPUSVN to a newer one) help?

I like this idea, as it doesn't induce any complexity into Gramine until there's a need for migration. One potential footgun here is that people may just fire this tool on an untrusted machine passing the encryption key into it. So, we need to think carefully how we name and describe everything.

Wait, I think you're forgetting some properties of SGX sealing keys:

  1. SGX sealing keys can only be retrieved inside the SGX enclave. Also, SGX sealing keys are bound to the particular platform. Thus, it is not possible to retrieve the same sealing keys on two different platforms. Also, SGX sealing keys cannot be retrieved on non-SGX platforms at all.
  2. MRENCLAVE-based sealing key is bound to the exact identity of the SGX enclave. Thus, it is not possible to retrieve the same MRENCLAVE sealing key in any other SGX enclave other than the original SGX enclave. In other words, if Gramine app uses MRENCLAVE-based sealing key, then no other tool/enclave could retrieve the same key. Thus, an "offline migration tool" -- even if we run it inside the SGX enclave on top of Gramine -- doesn't make any sense in this case.
  3. MRSIGNER-based sealing key is bound to the identity of the SGX signer. This more relaxed policy allows the signer person to use the "offline migration tool" -- if this tool is put into the SGX enclave and is signed by the signer person (with the same signing key as the Gramine app).

So, as you can see, MRSIGNER-based keys are malleable to this "offline migration tool" approach, though with some quirks (needs to run inside the SGX enclave, needs to be signed with the same signing key). And MRENCLAVE-based keys simply too restrictive to use the "offline migration tool" approach.

To support both MRSIGNER- and MRENCLAVE-based keys, the only common denominator is to introduce the "migration" code path in Gramine. I'm strongly against other ideas which would either break MRENCLAVE-based keys migration completely, or would result in two different ways of handling MRSIGNER- and MRENCLAVE-based keys.

[...] Or actually... Isn't this completely insecure? Two cases:

  • Sealed files: Let's say the attacker managed to leak some old sealing key. They can now plant any file by encrypting it properly and just specifying the old SVN inside the header. Gramine will happily decrypt and use the forged file contents (after migrating it to the new key). It doesn't allow leaking any new data though, but modifying app files is usually enough to get RCE inside it (e.g. by modifying configs).

First of all, that's why we'll introduce the new manifest sub-option allow_migration = true, optional for each encrypted FS mount. This will give some flexibility for the manifest developer.

Second of all, if the old sealing key is leaked, then can we do any better than the proposed scheme? To me, the only way to protect from this attack is to disallow completely the migration...

By the way, what do you mean by "leaking the old sealing key"? You mean the derived (final) sealing key, not the secret in the SGX CPU fuses? Because in the latter case, the whole CPU is totally vulnerable iiuc. But if you mean the final sealing key, then it looks like the app developer will need to assess the risk and re-sign and re-deploy the application (in this case, a definitely different sealing key will be derived, even if the CPUSVN is specified as an old one).

  • Encrypted with remotely-deployed key: I think in this proposal this key is never upgraded, so if the attacker leaks it once then it can still be used after TCB upgrade.

See @lejunzhu reply. If the remotely-deployed key was leaked, then the key-releasing remote party must mark this key as unusable, and never release this key again. Also, this seems to be a completely different problem from what we're discussing in this issue.

mkow commented 1 year ago
  1. run the app once and lazy migrate the files by accessing them

But the problem here is that only files accessed in that particular run will be upgraded, potentially missing some.

  1. delete the temporary binary and continue use the old manifest (without allow_migration), and its mrenclave is the valid remote attestation value.

The attacker might have backed up the files. It's an untrusted host, after all.

This is a problem only if it's a zero-day attack. Once the new TCB appears in the remote attestation service, the attacker can't retrieve the key from remote server, unless the server allows out-of-date proof. This is different from CPU generated keys, which can always be generated.

There's always a window between the attack being public and everyone applying patches. Also, with the current super-long embargoes for side-channels, bugs can easily leak, like it was with Spectre and Meltdown.

Also, we can make the migration tool support some kind of key rotation operation, so existing files can switch to a new encryption key. But that's more of a generic security practice than specific solution for this issue.

That's what we IMO should do, a tool like Kailun proposed which changes encryption key + recommendation for users how to use it. But see below, I missed that it has to be a part of Gramine itself.

Wait, I think you're forgetting some properties of SGX sealing keys:

1. SGX sealing keys can only be retrieved inside the SGX enclave. Also, SGX sealing keys are bound to the particular platform. Thus, it is not possible to retrieve **the same sealing keys** on two different platforms. Also, SGX sealing keys cannot be retrieved on non-SGX platforms at all.

2. MRENCLAVE-based sealing key is bound to the exact identity of the SGX enclave. Thus, it is not possible to retrieve **the same MRENCLAVE sealing key** in any other SGX enclave other than the original SGX enclave. In other words, if Gramine app uses MRENCLAVE-based sealing key, then no other tool/enclave could retrieve the same key. Thus, an "offline migration tool" -- even if we run it inside the SGX enclave on top of Gramine -- doesn't make any sense in this case.

3. MRSIGNER-based sealing key is bound to the identity of the SGX signer. This more relaxed policy allows the signer person to use the "offline migration tool" -- if this tool is put into the SGX enclave and is signed by the signer person (with the same signing key as the Gramine app).

So, as you can see, MRSIGNER-based keys are malleable to this "offline migration tool" approach, though with some quirks (needs to run inside the SGX enclave, needs to be signed with the same signing key). And MRENCLAVE-based keys simply too restrictive to use the "offline migration tool" approach.

To support both MRSIGNER- and MRENCLAVE-based keys, the only common denominator is to introduce the "migration" code path in Gramine. I'm strongly against other ideas which would either break MRENCLAVE-based keys migration completely, or would result in two different ways of handling MRSIGNER- and MRENCLAVE-based keys.

@dimakuv: I meant this particular point only for RA-deployed keys, sorry, I wasn't clear here, there are actually 3 different cases we need to consider (mrsigned-sealed, mrenclave-sealed, ra-deployed). But yeah, I missed the problem with MRENCLAVE-sealed migration. The migration logic has to be part of Gramine then... But how to force the upgrade of all the files? Maybe a special startup flag? And then after migration you upgrade the enclave to a version which would refuse to use the old SVN keys.

Second of all, if the old sealing key is leaked, then can we do any better than the proposed scheme? To me, the only way to protect from this attack is to disallow completely the migration...

Yes, the enclave author can migrate all the files eagerly and then ship an enclave which refuses to use the old-SVN-based keys.

By the way, what do you mean by "leaking the old sealing key"? You mean the derived (final) sealing key, not the secret in the SGX CPU fuses?

Yes.

See @lejunzhu reply. If the remotely-deployed key was leaked, then the key-releasing remote party must mark this key as unusable, and never release this key again. Also, this seems to be a completely different problem from what we're discussing in this issue.

But I think it requires the same solution - a tool which allows to reencrypt all the files with another key. Or maybe actually not, because this has to happen on the developer's trusted machine, and sealed keys has to be reencrypted on the untrusted machine... This will be very confusing for the users...

DL8 commented 1 year ago

[...] Or actually... Isn't this completely insecure? Two cases:

  • Sealed files: Let's say the attacker managed to leak some old sealing key. They can now plant any file by encrypting it properly and just specifying the old SVN inside the header. Gramine will happily decrypt and use the forged file contents (after migrating it to the new key). It doesn't allow leaking any new data though, but modifying app files is usually enough to get RCE inside it (e.g. by modifying configs).

This scenario sounds like an app issue, not a Gramine issue. In general data from files is untrusted and must be properly validated. I believe the same applies for protected files and perhaps it should be clarified in the documentation[^1].

It may be useful to add an indication that a file is protected by keys with older SVNs, but I'm not sure what's the right way to do it.

[^1]: I believe that's the case regardless of migration: for example, what if the file we are currently reading is modified under our feet by another instance of the same enclave?

First of all, that's why we'll introduce the new manifest sub-option allow_migration = true, optional for each encrypted FS mount. This will give some flexibility for the manifest developer.

Second of all, if the old sealing key is leaked, then can we do any better than the proposed scheme? To me, the only way to protect from this attack is to disallow completely the migration...

While I am not against configurable migration support, I don't think it's very useful to disable it: since microcode updates are not necessarily under the user's control (e.g. in the cloud), allow_migration = false practically means "every time the enclave starts, access to this protected filesystem is not guaranteed".

I think there is nothing else we can do but go with @lejunzhu's proposal:

  1. Keep the CPUSVN with which the file was encrypted in per-file metadata.
  2. If the file is MRENCLAVE- or MRSIGNER-based (i.e. using SGX sealing key, not user-supplied key), and we notice a mismatch between current CPUSVN and the file CPUSVN, then we trigger migration: a. Call EGETKEY to get a file-CPUSVN sealing key. b. Decrypt file contents with this old key into memory. c. Overwrite this file with newly encrypted (with a new key) file contents. d. Update file metadata with a new CPUSVN. e. Throw away the old key.

Why not add ISV SVN to the per-file metadata and apply the same logic to it?

mkow commented 1 year ago

This scenario sounds like an app issue, not a Gramine issue. In general data from files is untrusted and must be properly validated. I believe the same applies for protected files and perhaps it should be clarified in the documentation1.

Hard disagree, it's just not true in real world for any serious app, they trust project files and configs. The whole point of protected and trusted files in Gramine is to provide a way to deploy filesystem which is trusted and can't be modified by the untrusted host. Some examples:

So, we can't just say that the app should not trust files on disk.

lejunzhu commented 1 year ago
  1. run the app once and lazy migrate the files by accessing them

But the problem here is that only files accessed in that particular run will be upgraded, potentially missing some.

That's right. Maybe Gramine can walk through the encrypted mount points and try to migrate every file when in migration mode?

Yes, the enclave author can migrate all the files eagerly and then ship an enclave which refuses to use the old-SVN-based keys.

This is especially hard if the file is encrypted using MRENCLAVE based keys. I don't know if it's even possible.

But I think it requires the same solution - a tool which allows to reencrypt all the files with another key. Or maybe actually not, because this has to happen on the developer's trusted machine, and sealed keys has to be reencrypted on the untrusted machine... This will be very confusing for the users...

For ra-deployed keys, the re-encryption (or simply changing the key) can also happen on the untrusted machine, if the tool can run inside an enclave, and can temporarily retrieve both the old and new key from remote key server(s).

dimakuv commented 1 year ago

@dimakuv: I meant this particular point only for RA-deployed keys, sorry, I wasn't clear here, there are actually 3 different cases we need to consider (mrsigned-sealed, mrenclave-sealed, ra-deployed). But yeah, I missed the problem with MRENCLAVE-sealed migration. The migration logic has to be part of Gramine then... But how to force the upgrade of all the files? Maybe a special startup flag? And then after migration you upgrade the enclave to a version which would refuse to use the old SVN keys.

What do you mean by "after migration you upgrade the enclave to a version ..."? In which sense you upgrade the enclave, how exactly? Also, if we upgrade the enclave, then the MRENCLAVE measurement is changed, and the new enclave will refuse to read MRENCLAVE-encrypted files (no matter which CPU SVN).

Second of all, if the old sealing key is leaked, then can we do any better than the proposed scheme? To me, the only way to protect from this attack is to disallow completely the migration... Yes, the enclave author can migrate all the files eagerly and then ship an enclave which refuses to use the old-SVN-based keys.

Same question -- what do you mean by "ship an enclave ..."? Any new enclave makes old MRENCLAVE-encrypted files broken (even if they were migrated to a newer CPU SVN).


In general, I'm struggling to see how MRENCLAVE-encrypted files can be safely migrated to a new CPU SVN, and still protect against the outlined attack of "the attacker managed to leak some old sealing key."

mkow commented 1 year ago

Hmm, right, MRENCLAVE-sealed files are tough. Seems like the users just need to delete everything and redeploy the updated enclave in that case...

dimakuv commented 1 year ago

Hmm, right, MRENCLAVE-sealed files are tough. Seems like the users just need to delete everything and redeploy the updated enclave in that case...

Oh-oh, are we back to square one? Any other ideas how to fix the MRENCLAVE problem? (Reminder: MRENCLAVE-sealed CPUSVN-out-of-date files may be triggered for update by an attacker who previously stole the old-CPUSVN sealing key, and thus the attacker can inject random data using this to-be-updated file.)

dimakuv commented 1 year ago

Bumping this thread again, as there is another customer who hit this problem.

szymek156 commented 2 months ago

Bumping for myself, since I also faced it...

dimakuv commented 2 months ago

@szymek156 Very quick answer: we discussed this problem several times, and each time we came to a conclusion that there is no simple and secure way of performing migration of files in this case.

If you can contribute to our discussions, please propose your solution.

szymek156 commented 1 month ago

@dimakuv What about this solution:

(assuming all SVNs information is in the metadata of a file, as it is in SGX SDK)

  1. Gramine during startup reads protected files (has mounts in the manifest so knows where to look)
  2. Gramine reads the metadata of a file e.g. state.txt
  3. Notices CPUSVN bumped or any other SVN of current enclave bumped
  4. Prepares sealed key old_encryption_key from SVNs from metadata of the file, and opens a file state.txt
  5. Prepares sealed keys current_encryption_key from current SVNs and opens file state.txt.new with that key
  6. Reads content of state.txt and writes to state.txt.new
  7. Moves state.txt.new into state.txt
  8. Then app starts up and works as usual

Where old_encryption_key is mr_signer, but with old SVNs And current_encryption_key is mr_signer, but with current SVNs

Add to manifest: allow_to_upgrade_encrypted_files = true (default false if you feel that's a better option for whatever reason)

Add to documentation: Explanation why this is needed (if CPUSVN or any other SVN gets bumped you lost your data) starting enclave may be slow at first (migration), but then will be faster.

Alternatively: do this, but lazy, when enclave wants to open a file