dimakuv opened 5 months ago
A few additional thoughts
Besides the basic check that, if the Gramine instance has already seen a particular file, the re-opened file is consistent with the last version seen (by comparing the file's root hash against the hash stored in the hashmap), one might also want to enforce (maybe gated by a configuration parameter?) that on an `open()` without the `O_CREAT` flag we abort (rather than just return an error to the application) if the hashmap shows the file should exist but the (untrusted) host doesn't provide one. Relatedly, a delete operation should also remove the corresponding record from the hashmap.
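The decision logic sketched above could look roughly like the following; this is a hypothetical sketch, all names (`verify_reopen`, `open_verdict`, the strict-mode flag) are illustrative and not existing Gramine APIs:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define HASH_SIZE 32  /* e.g. the SHA-256 root hash of the file's Merkle tree */

enum open_verdict { OPEN_OK, OPEN_ERROR, OPEN_ABORT_INSTANCE };

/* `recorded` is the root hash remembered from the last close(), or NULL if
 * this instance never saw the file; `current` is NULL if the untrusted host
 * reports the file as missing. Called on open() without O_CREAT. */
static enum open_verdict verify_reopen(const unsigned char* recorded,
                                       const unsigned char* current,
                                       bool strict) {
    if (!recorded)
        return OPEN_OK;                 /* first time we see this path */
    if (!current)                       /* map says it exists, host says no */
        return strict ? OPEN_ABORT_INSTANCE : OPEN_ERROR;
    if (memcmp(recorded, current, HASH_SIZE) != 0)
        return OPEN_ERROR;              /* rolled-back or swapped file */
    return OPEN_OK;
}
```

The key point is the middle case: in strict mode, a file that the hashmap says should exist but that the host fails to provide is treated as an active attack and aborts the instance, not just the syscall.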
If one keeps the hashmap on a per-mount basis, one would also lay the foundation for a quite useful "volume" concept that bundles together multiple files and their versions. This is irrelevant for the immediate step of firming up rollback protection during the lifetime of an instance, but as part of the next step of making rollback protection persistent it would be useful for use cases such as file-based workflow systems (e.g., WDL- or CWL-based ones) where each step in the flow runs as a Gramine instance and the file(-set)s generated by one step are input to the subsequent step. Similarly, such a structure could eventually also be used to persist file metadata such as owners, permissions and the like, and might eventually be useful to support hard links?
I think we don't do that because it's not trivial with multiple processes - the file may be modified by another process under the same Gramine instance/namespace, which won't refresh the map in other processes. We could add IPC for that, but it will be pretty complex, same as it is for fixing all the other shortcomings we took to avoid complex IPC interactions in other subsystems.
> I think we don't do that because it's not trivial with multiple processes - the file may be modified by another process under the same Gramine instance/namespace, which won't refresh the map in other processes. We could add IPC for that, but it will be pretty complex, same as it is for fixing all the other shortcomings we took to avoid complex IPC interactions in other subsystems.
Ideally, of course, rollback protection covering multi-process would be best. However, even a single-process implementation will improve security, so I think there is value in it as a first step. (E.g., persistent rollback protection would also be great but clearly also asks for staging?)
To go to multiple processes, I think there are two cases: (a) the different processes access the files purely serialized, likely controlled by some "whole-file lock", and (b) different processes access files concurrently, with conflicts prevented using file-region-based locking. (a) should at least conceptually be similar in complexity to other IPC-based handshakes? The really tough case is (b), but I'm not sure whether that is even currently supported. At least from staring at the code, I don't see how two processes would reconcile concurrent metadata changes (Merkle tree).
> However, even a single-process implementation will improve security, so I think there is value in it as a first step.
But how do you want to design it for single process to not break multi-process scenarios? By disabling it on the first fork?
> To go to multiple processes, I think there are two cases: (a) the different processes access the files purely serialized, likely controlled by some "whole-file lock", and (b) different processes access files concurrently, with conflicts prevented using file-region-based locking. (a) should at least conceptually be similar in complexity to other IPC-based handshakes? The really tough case is (b), but I'm not sure whether that is even currently supported. At least from staring at the code, I don't see how two processes would reconcile concurrent metadata changes (Merkle tree).
I think only (a) is viable, but it's still tricky: complex IPC in C is not trivial to write, especially in the Gramine environment. And just a global lock isn't enough, you also need this global hash map. It's possible, but we'd really need a more global rethink/rewrite to centralize more state in the main process and keep that process alive. See e.g. issues like https://github.com/gramineproject/gramine/issues/21.
> However, even a single-process implementation will improve security, so I think there is value in it as a first step.

> But how do you want to design it for single process to not break multi-process scenarios? By disabling it on the first fork?
I was planning to add a configuration parameter which allows choosing between a strict and a non-strict version (the former essentially starting from an empty volume), so adding a third "disable" option would address your case where a multi-process application really changes a single file in multiple processes. Given how brittle the multi-process support is with protected files (e.g., no case (b) even now), I think that should do for an initial step, and then the multi-process case might require further, broader discussion?
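For illustration, such a knob could look like the following in the manifest; the option name and values are purely hypothetical, no such manifest option exists today:

```toml
# Hypothetical manifest option; name and values are made up for illustration:
#   "strict"     - start from an empty volume; reject unknown pre-existing files
#   "non-strict" - track files from first sight, detect only later rollbacks
#   "disabled"   - current behavior, e.g. for multi-process file sharing
fs.rollback_protection = "non-strict"
```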
I agree with @g2flyer that single-process support will be a good first step. Indeed, this may be sufficient for 99% of current workloads.
As for how to disable Encrypted Files, I suggest the following (we use it in other places too):

- On fork, in a "checkpoint" function of an encrypted-file object, mark the object as tainted.
- It is enough to set `inode->data = NULL`; we already have code that handles this case as "encrypted file has no correct representation".

This will allow forking in the general case, but if some encrypted file was opened during fork, it will become tainted and invalid. The child won't be able to use it.
I don't think it works if the child re-opens the file by itself instead of inheriting a description?
> I don't think it works if the child re-opens the file by itself instead of inheriting a description?
You are right. We need some `filepath -> is_opened` shared state.

I think a simple workaround would be: on encrypted-file open, create a file under `/tmp/gramine/` with a name that is e.g. `sha256(filepath)`. Maybe there's something better than such a temp file, but I think that's sufficiently simple to implement.

UPDATE: How will this work with symlinks to the same file?
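The marker-file workaround could be sketched as follows. This is a hypothetical illustration: for brevity it uses FNV-1a where the comment above suggests `sha256(filepath)`, and as the UPDATE notes, real code would have to canonicalize the path first, since symlinks to the same file would otherwise produce different markers:

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Placeholder hash; real code would use SHA-256 of the canonicalized path. */
static uint64_t hash_path(const char* path) {
    uint64_t h = 1469598103934665603ull;  /* FNV-1a */
    for (; *path; path++) {
        h ^= (unsigned char)*path;
        h *= 1099511628211ull;
    }
    return h;
}

/* Returns 0 if this process is the first opener, -1 if the marker already
 * exists (some process in this instance already has the file open).
 * O_CREAT|O_EXCL makes creation atomic across processes. */
static int mark_opened(const char* dir, const char* filepath) {
    char marker[512];
    snprintf(marker, sizeof(marker), "%s/%016llx", dir,
             (unsigned long long)hash_path(filepath));
    int fd = open(marker, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0)
        return -1;   /* EEXIST: opened elsewhere */
    close(fd);
    return 0;
}

static void mark_closed(const char* dir, const char* filepath) {
    char marker[512];
    snprintf(marker, sizeof(marker), "%s/%016llx", dir,
             (unsigned long long)hash_path(filepath));
    unlink(marker);
}
```

Note that the marker lives on the untrusted host, so it can only make the honest case work; a malicious host deleting markers is a separate concern.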
> As for how to disable Encrypted Files, I suggest the following (we use it in other places too):
>
> - On fork, in a "checkpoint" function of an encrypted-file object, mark the object as tainted.
> - It is enough to set `inode->data = NULL`; we already have code that handles this case as "encrypted file has no correct representation".
>
> This will allow forking in the general case, but if some encrypted file was opened during fork, it will become tainted and invalid. The child won't be able to use it.
Is there really value in the case where we protect the leader but not the child (regardless of whether the above even technically works)? My approach would have been to replicate the state on clone/fork by default, which should work unless leader and/or children read (and at least one writes) files after fork/clone (and do so with non-overlapping open-file periods, as otherwise this doesn't work even right now!). For that particular case, I would for now just suggest to completely disable the feature in the manifest, and in all processes: such a case should be easily detectable during testing, and completely disabling the feature doesn't seem to have any real security drawback over disabling it only for children?
Maybe we should discuss it during the next Tuesday's meeting? @g2flyer Could you attend this meeting? It is at 7am Portland time, if I'm not mistaken (it's definitely 4pm German time).
> Maybe we should discuss it during the next Tuesday's meeting? @g2flyer Could you attend this meeting? It is at 7am Portland time, if I'm not mistaken (it's definitely 4pm German time).
Not necessarily my favorite time of the day, but I can definitely make it. You would have to send me the invite though, as I can't find information about this meeting on GitHub and the linked resources.
BTW: from perusing the SIGY paper released yesterday on arXiv, it seems to me that their specific nginx reset-to-old-config-file attack wouldn't have worked with the feature discussed here?
> BTW: from perusing the SIGY paper released yesterday on arXiv, it seems to me that their specific nginx reset-to-old-config-file attack wouldn't have worked with the feature discussed here?
Did it work on Gramine? I thought it was never possible to inject `SIGHUP`? (But we're getting rather off-topic here.)
> BTW: from perusing the SIGY paper released yesterday on arXiv, it seems to me that their specific nginx reset-to-old-config-file attack wouldn't have worked with the feature discussed here?
I think this is off-topic, so if wanted, please move the discussion to a new GitHub Discussion.
From my side:
> Did it work on Gramine? I thought it was never possible to inject `SIGHUP`? (But we're getting rather off-topic here.)
Sorry, I did not read the paper carefully enough: the nginx attack worked only on SCONE, not on Gramine. I got confused perusing the paper, as they initially mention Gramine in the context of the nginx attack; it was a rollback attack, and they also mention later that Gramine does not have rollback protection.
BTW: the issue and mitigation design were discussed in the April 23, 2024 Community Call.
Description of the feature
Currently, Encrypted Files are protected against rollback attacks only during the timeframe in which the encrypted file is opened. After all FDs to the encrypted file are closed, the handle itself is closed and all metadata about the encrypted file is dropped. At that point a rollback attack can be applied, even though the Gramine instance is still running: the next `open()` of the same file would read the stale version.

An easy fix is to have a hashmap with `<enc-file-path: root hash>` entries, update this map on `close()`, and validate against it on `open()`. There are corner cases to take into account, like `rename()` (need to delete the old hashmap entry and create a new one) or `unlink()`.

Then the next step (if ever) would be to store this hashmap in some persistent storage like an external CCF database. Not for the near future, but maybe we should think about it for the roadmap.
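The proposed map and its `rename()`/`unlink()` corner cases could be sketched as follows; this is an illustrative single-process sketch with made-up names, not the actual implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_FILES 16
#define ROOT_HASH_SIZE 32   /* e.g. SHA-256 root of the file's Merkle tree */

struct entry {
    char path[256];
    unsigned char root_hash[ROOT_HASH_SIZE];
    bool used;
};

static struct entry g_map[MAX_FILES];

static struct entry* find_entry(const char* path) {
    for (int i = 0; i < MAX_FILES; i++)
        if (g_map[i].used && !strcmp(g_map[i].path, path))
            return &g_map[i];
    return NULL;
}

/* On close(): remember the file's current root hash. */
static void on_close(const char* path, const unsigned char* hash) {
    struct entry* e = find_entry(path);
    if (!e) {
        for (int i = 0; i < MAX_FILES; i++)
            if (!g_map[i].used) {
                e = &g_map[i];
                e->used = true;
                strncpy(e->path, path, sizeof(e->path) - 1);
                break;
            }
    }
    if (e)   /* real code would grow the map instead of dropping entries */
        memcpy(e->root_hash, hash, ROOT_HASH_SIZE);
}

/* On open(): valid if we never saw the path, or if the hash still matches. */
static bool on_open(const char* path, const unsigned char* hash) {
    struct entry* e = find_entry(path);
    return !e || memcmp(e->root_hash, hash, ROOT_HASH_SIZE) == 0;
}

/* On rename(): delete the old entry and create a new one, so the recorded
 * hash follows the file and can't be replayed under the old name. */
static void on_rename(const char* old_path, const char* new_path) {
    struct entry* e = find_entry(old_path);
    if (!e)
        return;
    struct entry* dst = find_entry(new_path);
    if (dst)
        dst->used = false;   /* rename() silently replaces the target */
    memset(e->path, 0, sizeof(e->path));
    strncpy(e->path, new_path, sizeof(e->path) - 1);
}

/* On unlink(): the file is legitimately gone, drop its record. */
static void on_unlink(const char* path) {
    struct entry* e = find_entry(path);
    if (e)
        e->used = false;
}
```

A real implementation would additionally need locking (and, for multi-process, the IPC discussed above), but the bookkeeping itself is this simple.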
NOTE: We also don't mention rollback/replay anywhere in the documentation.
Why should Gramine implement it?
The current rollback protection is unintuitive for end users. The expectation is at least to detect stale replicas during Gramine execution.