michwill opened this issue 8 years ago
The additional overhead of ORAM compared to the current access scheme is significant enough that enabling it by default may hinder performance too much.
When I was researching this, in addition to investigating other ways to persist data, I realised that the first step to making this possible is to abstract the storage layer so that it interacts with a simpler key/value store interface.
At the moment I don't think there's a clean enough boundary between the DB and storage layers, which makes it harder to verify wire contents or to change the backend storage persistence service.
By there not being a clean enough distinction between the database query engine and the storage layer, I mean that I've still not found a clean point of separation where I can say "everything up to this point is ZeroDB-specific query logic, and everything beyond this point is merely an interface for persisting and retrieving data". Finding that point of separation would let me log all operations performed over the network, and very quickly write additional storage adapters, e.g. for Redis, MongoDB, CouchDB, or my own ZeroDB project which provides a ZeroMQ interface to several key/value storage engines.
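To make the separation I'm after concrete, here's a rough sketch of the kind of key/value interface I'd want the query logic to talk to. This is purely illustrative; the class and method names are mine, not part of ZeroDB's actual API:

```python
# Purely illustrative sketch of the separation I have in mind; the class and
# method names are hypothetical, not part of ZeroDB's actual API.

from abc import ABC, abstractmethod
from typing import Dict, Optional


class BlockStore(ABC):
    """Everything below this interface is 'just persistence';
    everything above it is ZeroDB-specific query logic."""

    @abstractmethod
    def get(self, key: bytes) -> Optional[bytes]:
        """Return the (encrypted) block stored under `key`, or None."""

    @abstractmethod
    def put(self, key: bytes, value: bytes) -> None:
        """Persist the (encrypted) block `value` under `key`."""


class InMemoryStore(BlockStore):
    """Trivial dict-backed adapter; a Redis, MongoDB or CouchDB adapter
    would implement the same two methods."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value


class LoggingStore(BlockStore):
    """Wrapper that records every operation crossing the wire boundary,
    which is exactly the point where access patterns could be inspected."""

    def __init__(self, inner: BlockStore) -> None:
        self._inner = inner

    def get(self, key: bytes) -> Optional[bytes]:
        print("GET", key.hex())
        return self._inner.get(key)

    def put(self, key: bytes, value: bytes) -> None:
        print("PUT", key.hex(), len(value), "bytes")
        self._inner.put(key, value)
```

The `LoggingStore` wrapper is only there to show that once such a seam exists, logging or verifying everything that crosses the wire, or swapping in another backend, becomes a small amount of work.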
I'd say, there is a separation. ZeroDB uses ZEO storage (can use neoppod instead).
One difference compared to a usual key/value store, though, is the local cache and the ways of invalidating it when somebody commits new data. There are ways to use ZEO with any storage by writing a rather thin adapter layer (e.g. you can plug a relational database in as a storage through RelStorage; it's not hard to plug in any block storage the same way, starting with DemoStorage as an example).
So, there is still some extra code on the server for invalidations (something we currently inherit from ZEO). Another extra bit we need is the ability to load many blocks in one request (see the loadBulk method), but that is something many storages already have.
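For illustration, here's a minimal sketch of what batching block loads into a single request can look like over a plain key/value backend. The class and method names here are made up; only the idea of one round trip per batch mirrors the actual loadBulk method:

```python
# Hedged sketch: batching many block loads into a single request, the idea
# behind loadBulk.  The class and method names here are hypothetical, not
# ZEO's actual implementation.

from typing import Dict, Iterable, List, Tuple


class SimpleBlockServer:
    def __init__(self) -> None:
        self._blocks: Dict[bytes, bytes] = {}

    def store(self, oid: bytes, data: bytes) -> None:
        self._blocks[oid] = data

    def load(self, oid: bytes) -> bytes:
        # One network round trip per object: fine for a handful of objects,
        # expensive when a query touches hundreds of B-Tree buckets.
        return self._blocks[oid]

    def load_bulk(self, oids: Iterable[bytes]) -> List[Tuple[bytes, bytes]]:
        # One round trip for the whole batch; many key/value stores already
        # offer an equivalent (e.g. MGET in Redis).
        return [(oid, self._blocks[oid]) for oid in oids]
```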
Another (completely different) way of doing things (not what we currently do!) would be to have the query logic on the server, but sometimes ask the client to perform critical simple operations (e.g. the server sends an encrypted bucket plus an encrypted keyword to the client; the client decrypts them, figures out which bucket id to request next, and tells the server). This way, the client would be much thinner, and the server would consist of a "DB engine" which consults the client when necessary, plus the block storage.
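Not something we implement today, but to make that interaction concrete, here's a toy sketch of one round of such a protocol. Everything in it is illustrative: Fernet is used only as a stand-in cipher, the bucket layout is simplified, and the keyword is shown in plaintext on the client side (the client is the one querying anyway):

```python
# Toy sketch of a "server-driven traversal, client-assisted decisions" scheme.
# Everything here is illustrative; Fernet is only a stand-in cipher.

import json
from bisect import bisect_right
from cryptography.fernet import Fernet

key = Fernet.generate_key()
cipher = Fernet(key)            # only the client holds this key

# --- server side: stores encrypted buckets and never sees plaintext --------
# Only the buckets touched by the example below are populated here.
buckets = {
    "root": cipher.encrypt(json.dumps(
        {"keys": [10, 20], "children": ["leaf0", "leaf1", "leaf2"]}).encode()),
    "leaf1": cipher.encrypt(json.dumps(
        {"keys": [12, 15, 17], "values": ["a", "b", "c"]}).encode()),
}

# --- client side: the "critical simple operations" -------------------------
def choose_child(encrypted_bucket: bytes, keyword: int) -> str:
    """Decrypt an interior bucket and decide which child to descend into.
    The server only ever learns the returned bucket id."""
    bucket = json.loads(cipher.decrypt(encrypted_bucket))
    return bucket["children"][bisect_right(bucket["keys"], keyword)]

def lookup_in_leaf(encrypted_bucket: bytes, keyword: int) -> str:
    """Decrypt a leaf bucket and pull out the value for the keyword."""
    bucket = json.loads(cipher.decrypt(encrypted_bucket))
    return bucket["values"][bucket["keys"].index(keyword)]

# --- one round of the protocol ----------------------------------------------
keyword = 15
next_id = choose_child(buckets["root"], keyword)    # server -> client -> server
print(lookup_in_leaf(buckets[next_id], keyword))    # prints "b"
```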
In ZeroDB, we do leak access patterns, which could reduce security to that of CryptDB for an observer who watches access patterns over an unbounded period of time. Here are some proposals for how to mitigate that.
- Jamey Sharp from Twitter
- Ling Ren and others from MIT
- Dave Evans from the University of Virginia
The last one assumes that it is, at least, important to have an estimate of the leakage first. Maybe some minor pattern leakage can be permitted (until the data are re-shuffled). The rate at which leakage is naturally mitigated by, effectively, splitting B-Tree buckets while using the DB is also interesting to know.
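As a starting point for such an estimate, a toy simulation like the one below (entirely illustrative: fixed bucket capacity, uniform random inserts, leaf splits only) could give a first feel for how often buckets split, and thus how quickly whatever an observer has learned about a given bucket goes stale:

```python
# Entirely illustrative toy model: how often do B-Tree-like buckets split
# (and thereby invalidate what an observer has learned about them) as data
# is inserted?  Fixed capacity, uniform random keys, leaf splits only.

import random
from bisect import bisect_right, insort

CAPACITY = 32                 # max keys per bucket before it splits
N_INSERTS = 100_000

buckets = [[]]                # list of sorted key-buckets, ordered by range
mins = [float("-inf")]        # lower bound of each bucket's key range
splits_per_10k, splits = [], 0

for i in range(1, N_INSERTS + 1):
    k = random.random()
    idx = bisect_right(mins, k) - 1        # bucket whose range covers k
    insort(buckets[idx], k)
    if len(buckets[idx]) > CAPACITY:       # overflow, so split at the median
        b = buckets[idx]
        mid = len(b) // 2
        buckets[idx:idx + 1] = [b[:mid], b[mid:]]
        mins[idx + 1:idx + 1] = [b[mid]]   # new bucket starts at the median
        splits += 1
    if i % 10_000 == 0:
        splits_per_10k.append(splits)
        splits = 0

print("splits per 10k inserts over time:", splits_per_10k)
```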