dimakuv opened 5 months ago
A few additional thoughts
Besides the basic check that, if the Gramine instance has already seen a particular file, the re-opened file is consistent with the last version seen (by comparing the file's root hash against the hash stored in the hashmap), one might also want to enforce (maybe gated by a configuration parameter?) that on an `open()` without the `O_CREAT` flag we abort (rather than just return an error to the application) if the hashmap shows the file should exist but the (untrusted) host doesn't provide one. Relatedly, a delete operation should also remove the corresponding record from the hashmap.
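The decision logic sketched above could look roughly like the following; this is a hypothetical sketch, all names (`verify_reopen`, `open_verdict`, the strict-mode flag) are illustrative and not existing Gramine APIs:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define HASH_SIZE 32  /* e.g. the SHA-256 root hash of the file's Merkle tree */

enum open_verdict { OPEN_OK, OPEN_ERROR, OPEN_ABORT_INSTANCE };

/* `recorded` is the root hash remembered from the last close(), or NULL if
 * this instance never saw the file; `current` is NULL if the untrusted host
 * reports the file as missing. Called on open() without O_CREAT. */
static enum open_verdict verify_reopen(const unsigned char* recorded,
                                       const unsigned char* current,
                                       bool strict) {
    if (!recorded)
        return OPEN_OK;                 /* first time we see this path */
    if (!current)                       /* map says it exists, host says no */
        return strict ? OPEN_ABORT_INSTANCE : OPEN_ERROR;
    if (memcmp(recorded, current, HASH_SIZE) != 0)
        return OPEN_ERROR;              /* rolled-back or swapped file */
    return OPEN_OK;
}
```

The key point is the middle case: in strict mode, a file that the hashmap says should exist but that the host fails to provide is treated as an active attack and aborts the instance, not just the syscall.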
If one keeps the hashmap on a per-mount basis, one would also lay the foundation for a quite useful "volume" concept that bundles together multiple files and their versions. This is irrelevant for the immediate step of firming up rollback protection during the lifetime of an instance, but as part of the next step of making rollback protection persistent it would be useful for use cases such as file-based workflow systems (e.g., WDL- or CWL-based ones) where each step in the flow runs as a Gramine instance and the file(-set)s generated by one step are input to the subsequent step. Similarly, such a structure could eventually also be used to persist file metadata such as owners, permissions and the like, and might eventually be useful to support hard links?
I think we don't do that because it's not trivial with multiple processes - the file may be modified by another process under the same Gramine instance/namespace, which won't refresh the map in other processes. We could add IPC for that, but it will be pretty complex, same as it is for fixing all the other shortcomings we took to avoid complex IPC interactions in other subsystems.
> I think we don't do that because it's not trivial with multiple processes - the file may be modified by another process under the same Gramine instance/namespace, which won't refresh the map in other processes. We could add IPC for that, but it will be pretty complex, same as it is for fixing all the other shortcomings we took to avoid complex IPC interactions in other subsystems.
Ideally, of course, rollback protection covering multi-process would be best. However, even a single-process implementation will improve security, so I think there is value in it as a first step. (E.g., persistent rollback protection would also be great but clearly also asks for staging?)
To go to multiple processes, I think there are two cases: (a) the different processes access the files purely serialized, likely controlled by some "whole-file lock", and (b) different processes access files concurrently, with conflicts prevented using file-region-based locking. (a) should at least conceptually be similar in complexity to other IPC-based handshakes? The really tough case is (b), but I'm not sure whether that is even currently supported. At least from staring at the code, I don't see how two processes would reconcile concurrent metadata changes (Merkle tree).
> However, even a single-process implementation will improve security, so I think there is value in it as a first step.
But how do you want to design it for single process to not break multi-process scenarios? By disabling it on the first fork?
> To go to multiple processes, I think there are two cases: (a) the different processes access the files purely serialized, likely controlled by some "whole-file lock", and (b) different processes access files concurrently, with conflicts prevented using file-region-based locking. (a) should at least conceptually be similar in complexity to other IPC-based handshakes? The really tough case is (b), but I'm not sure whether that is even currently supported. At least from staring at the code, I don't see how two processes would reconcile concurrent metadata changes (Merkle tree).
I think only (a) is viable, but it's still tricky: complex IPC in C is not trivial to write, especially in the Gramine environment. And just a global lock isn't enough, you also need this global hash map. It's possible, but we'd really need a more global rethink/rewrite to centralize more state in the main process and keep that process alive. See e.g. issues like https://github.com/gramineproject/gramine/issues/21.
> However, even a single-process implementation will improve security, so I think there is value in it as a first step.

> But how do you want to design it for single process to not break multi-process scenarios? By disabling it on the first fork?
I was planning to add a configuration parameter which allows choosing between a strict and a non-strict version (the former essentially starting from an empty volume), so adding a third "disable" option would address your case where a multi-process application really changes a single file in multiple processes. Given how brittle the multi-process support is with protected files (e.g., no case (b) even now), I think that should do for an initial step, and then the multi-process case might require further, broader discussion?
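For illustration, such a knob could look like the following in the manifest; the option name and values are purely hypothetical, no such manifest option exists today:

```toml
# Hypothetical manifest option; name and values are made up for illustration:
#   "strict"     - start from an empty volume; reject unknown pre-existing files
#   "non-strict" - track files from first sight, detect only later rollbacks
#   "disabled"   - current behavior, e.g. for multi-process file sharing
fs.rollback_protection = "non-strict"
```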
I agree with @g2flyer that single-process support will be a good first step. Indeed, this may be sufficient for 99% of current workloads.
As for how to disable Encrypted Files, I suggest the following (we use it in other places too):

- On fork, in a "checkpoint" function of an encrypted-file object, mark the object as tainted.
- It is enough to set `inode->data = NULL`; we already have code that handles this case as "encrypted file has no correct representation".

This will allow forking in the general case, but if some encrypted file was opened during fork, it will become tainted and invalid. The child won't be able to use it.
I don't think it works if the child re-opens the file by itself instead of inheriting a description?
> I don't think it works if the child re-opens the file by itself instead of inheriting a description?
You are right. We need some `filepath -> is_opened` shared state.

I think a simple workaround would be: on encrypted-file open, create a file under `/tmp/gramine/` with a name that is e.g. `sha256(filepath)`. Maybe there's something better than such a temp file, but I think that's sufficiently simple to implement.

UPDATE: How will this work with symlinks to the same file?
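The marker-file workaround could be sketched as follows. This is a hypothetical illustration: for brevity it uses FNV-1a where the comment above suggests `sha256(filepath)`, and as the UPDATE notes, real code would have to canonicalize the path first, since symlinks to the same file would otherwise produce different markers:

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Placeholder hash; real code would use SHA-256 of the canonicalized path. */
static uint64_t hash_path(const char* path) {
    uint64_t h = 1469598103934665603ull;  /* FNV-1a */
    for (; *path; path++) {
        h ^= (unsigned char)*path;
        h *= 1099511628211ull;
    }
    return h;
}

/* Returns 0 if this process is the first opener, -1 if the marker already
 * exists (some process in this instance already has the file open).
 * O_CREAT|O_EXCL makes creation atomic across processes. */
static int mark_opened(const char* dir, const char* filepath) {
    char marker[512];
    snprintf(marker, sizeof(marker), "%s/%016llx", dir,
             (unsigned long long)hash_path(filepath));
    int fd = open(marker, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0)
        return -1;   /* EEXIST: opened elsewhere */
    close(fd);
    return 0;
}

static void mark_closed(const char* dir, const char* filepath) {
    char marker[512];
    snprintf(marker, sizeof(marker), "%s/%016llx", dir,
             (unsigned long long)hash_path(filepath));
    unlink(marker);
}
```

Note that the marker lives on the untrusted host, so it can only make the honest case work; a malicious host deleting markers is a separate concern.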
> As for how to disable Encrypted Files, I suggest the following (we use it in other places too):
>
> - On fork, in a "checkpoint" function of an encrypted-file object, mark the object as tainted.
> - It is enough to set `inode->data = NULL`; we already have code that handles this case as "encrypted file has no correct representation".
>
> This will allow forking in the general case, but if some encrypted file was opened during fork, it will become tainted and invalid. The child won't be able to use it.
Is there really value in the case where we protect the leader but not the child (regardless of whether the above even technically works)? My approach would have been to replicate the state on clone/fork by default, which should work unless leader and/or children read (and at least one writes) files after fork/clone (and do so with non-overlapping open-file periods, as otherwise this doesn't work even right now!). For that particular case, I would for now just suggest to completely disable the feature in the manifest, and in all processes: such a case should be easily detectable during testing, and completely disabling the feature doesn't seem to have any real security drawback over disabling it only for children?
Maybe we should discuss it during the next Tuesday's meeting? @g2flyer Could you attend this meeting? It is at 7am Portland time, if I'm not mistaken (it's definitely 4pm German time).
> Maybe we should discuss it during the next Tuesday's meeting? @g2flyer Could you attend this meeting? It is at 7am Portland time, if I'm not mistaken (it's definitely 4pm German time).
Not necessarily my favorite time of the day, but I can definitely make it. You would have to send me the invite though, as I can't find information about this meeting on GitHub and the linked resources.
BTW: from perusing the SIGY paper released yesterday on arXiv, it seems to me that their specific nginx reset-to-old-config-file attack wouldn't have worked with the feature discussed here?
> BTW: from perusing the SIGY paper released yesterday on arXiv, it seems to me that their specific nginx reset-to-old-config-file attack wouldn't have worked with the feature discussed here?
Did it work on Gramine? I thought it was never possible to inject `SIGHUP`? (But we're getting rather off-topic here.)
> BTW: from perusing the SIGY paper released yesterday on arXiv, it seems to me that their specific nginx reset-to-old-config-file attack wouldn't have worked with the feature discussed here?
I think this is off-topic, so if wanted, please move the discussion to a new GitHub Discussion.
From my side:
> Did it work on Gramine? I thought it was never possible to inject `SIGHUP`? (But we're getting rather off-topic here.)
Sorry, I did not read the paper carefully enough: the nginx attack worked only on SCONE, not on Gramine. I got confused perusing the paper, as they initially mention Gramine in the context of the nginx attack; it was a rollback attack, and they also mention later that Gramine does not have rollback protection.
BTW: the issue and mitigation design were discussed in the April 23, 2024 Community Call.
Description of the feature
Currently, Encrypted Files are protected against rollback attacks only during the timeframe in which the encrypted file is opened. After all FDs to the encrypted file are closed, the handle itself is closed and all metadata about the encrypted file is dropped. At that point a rollback attack can be applied, even though the Gramine instance is still running: the next `open()` of the same file would read the stale version.

An easy fix is to have a hashmap with `<enc-file-path: root hash>` entries, update this map on `close()`, and validate against it on `open()`. There are corner cases to take into account, like `rename()` (need to delete the old hashmap entry and create a new one) or `unlink()`.

Then the next step (if ever) would be to store this hashmap in some persistent storage like an external CCF database. Not for the near future, but maybe we should think about it for the roadmap.
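The proposed map and its `rename()`/`unlink()` corner cases could be sketched as follows; this is an illustrative single-process sketch with made-up names, not the actual implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_FILES 16
#define ROOT_HASH_SIZE 32   /* e.g. SHA-256 root of the file's Merkle tree */

struct entry {
    char path[256];
    unsigned char root_hash[ROOT_HASH_SIZE];
    bool used;
};

static struct entry g_map[MAX_FILES];

static struct entry* find_entry(const char* path) {
    for (int i = 0; i < MAX_FILES; i++)
        if (g_map[i].used && !strcmp(g_map[i].path, path))
            return &g_map[i];
    return NULL;
}

/* On close(): remember the file's current root hash. */
static void on_close(const char* path, const unsigned char* hash) {
    struct entry* e = find_entry(path);
    if (!e) {
        for (int i = 0; i < MAX_FILES; i++)
            if (!g_map[i].used) {
                e = &g_map[i];
                e->used = true;
                strncpy(e->path, path, sizeof(e->path) - 1);
                break;
            }
    }
    if (e)   /* real code would grow the map instead of dropping entries */
        memcpy(e->root_hash, hash, ROOT_HASH_SIZE);
}

/* On open(): valid if we never saw the path, or if the hash still matches. */
static bool on_open(const char* path, const unsigned char* hash) {
    struct entry* e = find_entry(path);
    return !e || memcmp(e->root_hash, hash, ROOT_HASH_SIZE) == 0;
}

/* On rename(): delete the old entry and create a new one, so the recorded
 * hash follows the file and can't be replayed under the old name. */
static void on_rename(const char* old_path, const char* new_path) {
    struct entry* e = find_entry(old_path);
    if (!e)
        return;
    struct entry* dst = find_entry(new_path);
    if (dst)
        dst->used = false;   /* rename() silently replaces the target */
    memset(e->path, 0, sizeof(e->path));
    strncpy(e->path, new_path, sizeof(e->path) - 1);
}

/* On unlink(): the file is legitimately gone, drop its record. */
static void on_unlink(const char* path) {
    struct entry* e = find_entry(path);
    if (e)
        e->used = false;
}
```

A real implementation would additionally need locking (and, for multi-process, the IPC discussed above), but the bookkeeping itself is this simple.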
NOTE: We also don't mention rollback/replay anywhere in the documentation.
Why should Gramine implement it?
The current rollback protection is unintuitive for end users. The expectation is at least to detect stale replicas during Gramine execution.