Closed: daviddrysdale closed this issue 2 years ago.
Note we already have https://github.com/project-oak/oak/issues/815, which in turn is blocked on https://github.com/project-oak/oak/issues/822, which I think should be a prerequisite to this issue (feel free to re-prioritize them accordingly).
It feels like it could be implemented using the existing gRPC client pseudo-node and not much more, although storing data in that way would require a sufficiently relaxed label on the data being processed, i.e. it must "flow to" the TLS domain of the storage endpoint.
I was assuming that storage functionality would still require some kind of in-Oak encryption / decryption to allow labelled (plaintext) data to be stored (as unlabelled ciphertext), as was done in the previous C++ implementation, and that doesn't seem to be covered by #815 or #822 (but was mentioned in #727).
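A minimal sketch of that "labelled plaintext in, unlabelled ciphertext out" shape, with a toy XOR keystream standing in for a real AEAD cipher such as AES-GCM (all names and the label handling here are invented for illustration, not the actual Oak API):

```rust
// Hypothetical types: a value carrying an IFC label inside the trusted
// boundary, and the unlabelled ciphertext that leaves it.
struct Labelled {
    label: &'static str, // e.g. a confidentiality label
    plaintext: Vec<u8>,
}

// Toy keystream; a real implementation would use an authenticated cipher.
fn xor_keystream(data: &[u8], key: u8) -> Vec<u8> {
    data.iter().map(|b| b ^ key).collect()
}

/// Encrypt before the data leaves the trusted boundary: the storage
/// backend only ever sees ciphertext, which can be treated as public.
fn seal_for_storage(item: &Labelled, key: u8) -> Vec<u8> {
    xor_keystream(&item.plaintext, key)
}

/// On the way back in, decrypt and re-attach the label.
fn unseal(ciphertext: &[u8], key: u8, label: &'static str) -> Labelled {
    Labelled { label, plaintext: xor_keystream(ciphertext, key) }
}

fn main() {
    let secret = Labelled { label: "user_secrets", plaintext: b"hello".to_vec() };
    let ct = seal_for_storage(&secret, 0x5a);
    assert_ne!(ct, secret.plaintext); // backend sees only ciphertext
    let back = unseal(&ct, 0x5a, "user_secrets");
    assert_eq!(back.plaintext, b"hello".to_vec());
}
```

Note this only addresses confidentiality of the stored bytes; as the rest of the thread discusses, it says nothing about what the act of storing itself leaks.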
Of course, with cryptography comes keys, and with keys comes key management and a PKI, as per #745.
I used to think that doing the encryption from Oak Runtime would be sufficient, but while thinking about IFC, I convinced myself otherwise.
For instance, say a Node has a high (non-bottom) confidentiality label, and has observed a bit of confidential information; if we allow it to store anything, even if encrypted, it can choose whether or not to store it, which will cause some network activity, which can be used as a covert channel to exfiltrate the confidential bit. And I have similar reservations regarding the distributed runtime. Basically I think anything that relies on network outside of Oak can only ever carry public data. The alternative is to generate constant cover traffic to the storage server (or to a remote runtime) so that it becomes hard / impossible to detect covert channels.
In short, I think we need to think more about this before jumping into the implementation of the encryption logic.
@aferr is this a problem that is already studied in IFC?
In fact, even the gRPC client approach presumably already has the same problem: even if we constrain it to a specific TLS domain, the fact of whether or not there is data being sent over the network should be considered a public channel.
Just to get the conversation back on track: let's generalize it to a distributed runtime in which different instances are expected to exchange non-public data over network, which is public. I suspect that even with encryption, the mere presence of network activity would leak.
On the other hand, perhaps a mechanism similar to a call gate could be used, so that a node needs to specify upfront that it is going to send out some labelled data, and then it is forced to do so, or the runtime will send a dummy packet instead if it fails to do it.
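One way to picture the "declare upfront, runtime pads" idea (this is a sketch with invented names, not a proposed API): a node must reserve an outbound slot before it observes any secret; when the slot comes due, the runtime transmits either the node's real message or a same-sized dummy, so the wire looks identical either way.

```rust
/// What actually goes on the wire; an external observer can measure
/// only size and timing, which are identical for both variants.
#[derive(Debug, PartialEq)]
enum Packet {
    Real([u8; 16]),
    Dummy([u8; 16]),
}

/// A slot the node must declare *before* seeing confidential data.
struct OutboundSlot {
    queued: Option<[u8; 16]>,
}

impl OutboundSlot {
    fn reserve() -> Self {
        OutboundSlot { queued: None }
    }

    fn queue(&mut self, msg: [u8; 16]) {
        self.queued = Some(msg);
    }

    /// Called by the runtime at the scheduled time: something of the
    /// declared size always goes out, regardless of the node's choice,
    /// so "send vs. stay silent" no longer encodes a bit.
    fn flush(self) -> Packet {
        match self.queued {
            Some(msg) => Packet::Real(msg),
            None => Packet::Dummy([0u8; 16]),
        }
    }
}

fn wire_size(p: &Packet) -> usize {
    match p {
        Packet::Real(b) | Packet::Dummy(b) => b.len(),
    }
}

fn main() {
    // Node A declared a slot and actually sends.
    let mut a = OutboundSlot::reserve();
    a.queue(*b"0123456789abcdef");
    let pa = a.flush();
    // Node B declared a slot but stayed silent: the runtime pads.
    let pb = OutboundSlot::reserve().flush();
    // On the wire, both look like 16-byte packets.
    assert_eq!(wire_size(&pa), wire_size(&pb));
    assert!(matches!(pa, Packet::Real(_)));
    assert!(matches!(pb, Packet::Dummy(_)));
}
```

As the later comments point out, this only helps if the *decision to reserve a slot* is itself made before the node observes any secret.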
(I still plan to respond, but haven't been able to get to it yet :) sorry )
@tiziano88 isn't network just one of many observable variables that might leak information? e.g. cpu use could also be a function of a secret, even if the response over the network is public (or secret, but fixed size), and it might be observed directly at the host system level (how much CPU did the enclave consume) or via network latency.
that's not a reason to dismiss it, but it seems to me that we want to model threats from these kinds of side channels separately, and storage has aspects of both: e.g. do we want to protect against someone observing storage requests, or just someone with access to the disk after the fact?
one reason to treat those side channels separately is that we'll likely have to resort to statistical means and maybe even assuming mixed loads on the system (i.e. real and maybe artificial noise).
@seefeldb I agree with you, but I think we have to draw the line somewhere, and my suggestion is to draw the line around things that happen inside the CPU / Memory; specifically, allowing network as an exploitable covert channel seems too dangerous to me, since it's trivially exploitable, not only locally ("make the network card LED blink if you see this bit of private data"), but also remotely ("send a TCP packet to this IP address if you see this bit of private data").
To be clear, open network access to any IP address is also very different from say observing the network between two enclaves within a datacenter. The latter is still more observable than cpu and memory, but those are all increasing circles of concerns. My suggestion is to make such a differentiation within storage as well.
@tiziano88 Responding in regard to the network side channels you brought up, and also some thoughts about storage.
tldr
I think this discussion conflates two issues that can be thought about and solved separately, though they are useful issues to solve.
Implicitly in the setup you described, the persistent storage is on a separate machine from the high-labeled worker node. This is indeed a covert channel because even if the worker node and storage are both secret, any public node can see whether or not there is a message in the network. So the secret worker can modulate a message by choosing to send or not send something over the network (at all, to specific nodes, probably other options here).
However, you have this same covert channel even without adding storage. It just requires Oak to be a distributed system.
At the same time, there are also potential security issues if we add storage, regardless of whether or not Oak is distributed.
The main difference between Oak with and without persistent storage is that now data can be stored on disk and not just in memory. For this reason, it is useful to separate between two different threat models: one in which we care about physical attacks and one in which we do not.
Aside from addressing the issues introduced by moving data off-chip, compatibility with Oak's labels requires associating labels with storage locations. There are interesting tradeoffs here, but it does not seem to be the most pertinent topic to discuss just yet, so I won't go into it.
Storage can be considered separately from whether Oak is a distributed system or runs entirely on one machine. Here I assume everything runs on one machine, though the problems and solutions still apply when Oak is distributed.
(Here Physical Attacks = Attacks that can only be done by attackers in the same room as the machine running an Oak Node)
Clearly physical attacks are more difficult than remote ones. There are also use-cases we can meet even by assuming the machine running Oak and everyone in the room with it is trusted. Namely, if Oak is running in a trusted datacenter.
By only storing node data on disk if it is encrypted, and only allowing Oak to have access to the encryption keys, the node data on disk can only be accessed by whoever can obtain those keys.
If the operating system is compromised, it really does not matter if the data on disk is encrypted, because the OS can read the node data from memory. The same is true if another app compromises the OS.
At the same time, if the OS is trusted (assumed perfect!), we actually don't need encryption. We just need process isolation and isolation by the file system. If these work correctly, it does not matter if the data is encrypted or not. Conventional OSes are not to be trusted, but we are already undergoing the monumental effort to establish trust in low-level software. Proving correctness, noninterference, etc. would be one way to establish trust in the operating system.
Enclaves are almost an "alternative" here in the sense that they offer a smaller TCB than the OS (so it is more reasonable to trust an enclave manager axiomatically), but I think the filesystem also needs to be trusted.
At a high level, I don't think encrypting the data stored on disk actually changes the threat model you defend against. The thing you actually need is to trust low-level software, and you are already trusting the same low-level software even without storage. I think Oak could be extended with persistent storage while keeping parity with the current threat model (i.e. without degrading it).
We are thinking about using Oak for IoT, and in that case attackers are certainly in the room with Oak devices. There are a ton of new attacks to consider here, and this comment will not address all of them. But it is still useful to think about the easiest and most obvious attacks (e.g. ones that can be carried out by attackers who do not have specialized equipment like focused ion beams, and who are not going to infiltrate the fabrication facility where your hardware was made).
Encrypting data on disk does stop the most obvious physical attack: pulling out the disk and reading its contents directly.
The second easiest physical attack is to have a malicious DRAM device. Clearly we need memory encryption.
Memory encryption is still not enough, because attackers can still gain information from the memory access pattern. Oblivious RAM may be the answer: it's expensive, but if I recall correctly Path ORAM is a significant advancement in terms of cost (there is also related work building on it; I don't know this space extremely well either).
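To see why encrypting memory contents is not enough on its own, here is a small invented example: the *address* a computation touches can itself depend on a secret, and that address is visible on the memory bus regardless of whether the cell contents are encrypted. Hiding this access pattern is exactly what ORAM schemes like Path ORAM do.

```rust
/// A lookup whose accessed index depends on a secret. The `bus_trace`
/// models what an observer on the memory bus records: addresses, not
/// contents.
fn secret_dependent_lookup(table: &[u8], secret: usize, bus_trace: &mut Vec<usize>) -> u8 {
    let idx = secret % table.len();
    bus_trace.push(idx); // the observer sees this address go by
    table[idx]
}

fn main() {
    // The table contents could be encrypted; it makes no difference here.
    let table = [10u8, 20, 30, 40];
    let mut trace = Vec::new();
    let secret = 3;
    let _ = secret_dependent_lookup(&table, secret, &mut trace);
    // The observer recovers the secret (mod the table size) without
    // decrypting a single byte.
    assert_eq!(trace[0], secret % table.len());
}
```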
Addressing this network covert channel will also be important for the case where Oak is a distributed system. Having dummy transactions is probably the start to one good way of addressing this problem.
However, I don't think tying the dummy transactions to a call gate (a function coupled with the security label at which it executes) will help: the node can still modulate secrets by deciding whether or not to invoke the call gate.
From the perspective of nodes that are not receiving messages, the outgoing network traffic looks the same.
*As a slight variation to help with bootstrapping, nodes could have a sealed/unsealed bit:
- Unsealed: write channels can be added; messages cannot be sent. Can be sealed.
- Sealed: write channels can no longer be added; messages can be sent as above. Cannot be unsealed.
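The sealed/unsealed bit described above can be encoded as a typestate, so the compiler itself enforces that sealing is one-way and that each operation is only available in the right state (the types and channel names here are invented for illustration):

```rust
/// Unsealed state: channels may be added, nothing may be sent yet.
struct Unsealed {
    write_channels: Vec<&'static str>,
}

/// Sealed state: sending is allowed; there is deliberately no method
/// for adding channels, and no way back to `Unsealed`.
struct Sealed {
    write_channels: Vec<&'static str>,
    sent: Vec<(&'static str, Vec<u8>)>,
}

impl Unsealed {
    fn new() -> Self {
        Unsealed { write_channels: Vec::new() }
    }

    fn add_write_channel(&mut self, ch: &'static str) {
        self.write_channels.push(ch);
    }

    /// Sealing consumes the unsealed state: it cannot be undone.
    fn seal(self) -> Sealed {
        Sealed { write_channels: self.write_channels, sent: Vec::new() }
    }
}

impl Sealed {
    fn send(&mut self, ch: &'static str, msg: Vec<u8>) -> Result<(), &'static str> {
        if self.write_channels.contains(&ch) {
            self.sent.push((ch, msg));
            Ok(())
        } else {
            Err("no such write channel")
        }
    }
}

fn main() {
    let mut node = Unsealed::new();
    node.add_write_channel("storage");
    let mut sealed = node.seal(); // `node` is consumed: no way to unseal
    assert!(sealed.send("storage", vec![1, 2, 3]).is_ok());
    assert!(sealed.send("other", vec![4]).is_err());
    // sealed.add_write_channel(...) does not compile: the type forbids it.
}
```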
The existing storage pseudo-Node implementation in C++ was removed by #1016 as part of the move to a Rust-only Runtime. However, we are still going to need some kind of storage functionality.