nucypher / zerodb

*This project is no longer actively maintained. If you'd like to become the maintainer, please let us know.* ZeroDB is an end-to-end encrypted database. Data can be stored and queried on untrusted database servers without ever exposing the encryption key. Clients can execute remote queries against the encrypted data without downloading all of it or suffering an excessive performance hit.
GNU Affero General Public License v3.0

Methods of mitigating access pattern attacks #9

Open michwill opened 8 years ago

michwill commented 8 years ago

In ZeroDB, we do leak access patterns, which could reduce security to that of CryptDB for an observer who watches access patterns over an infinite time. Here are some proposals for how to mitigate that.

FWIW, here are a couple relevant papers from a quick search

Jamey Sharp from Twitter

What are the most efficient ORAMs now? If you assume non-colluding servers, you may find "Multi-Cloud Oblivious Storage" by Emil Stefanov and Elaine Shi interesting. If you have only one server, you may want to look at our work on Onion ORAM and a follow-up work called C-ORAM. Be careful with the latter, though: it improved upon our construction, but it has bugs and omits many optimizations and details.

Ling Ren and others from MIT

If you have a good way to measure the leakage, then the solution is to just re-shuffle the database when the leakage approaches some limit (with traditional ORAM, the reshuffling is done in conjunction with other design elements to eliminate any leakage; but, I think there is an interesting design point to look for solutions that are permitted to leak some limited amount of information).

Dave Evans from University of Virginia

The last one assumes that it is, at least, important to have an estimate of the leakage first. Maybe some minor pattern leakage can be permitted (until data are re-shuffled). The rate at which leakage is naturally mitigated by, effectively, splitting B-Tree buckets while using the DB would also be interesting to know.
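As a rough illustration of the "re-shuffle when the leakage approaches some limit" idea quoted above, here is a minimal sketch of a leakage budget tracked from observed bucket accesses. The class name, the entropy-deficit estimate, and the threshold are illustrative assumptions, not anything ZeroDB implements:

```python
import math
from collections import Counter

class LeakageBudget:
    """Toy estimate of access-pattern leakage: the entropy deficit of the
    observed bucket-request distribution relative to uniform access.
    Illustrative only; not a ZeroDB API."""

    def __init__(self, num_buckets, max_leak_bits):
        self.num_buckets = num_buckets
        self.max_leak_bits = max_leak_bits
        self.counts = Counter()
        self.total = 0

    def record_access(self, bucket_id):
        self.counts[bucket_id] += 1
        self.total += 1

    def leaked_bits(self):
        # Uniform accesses leak ~0 bits; skew toward a few buckets leaks more.
        if self.total == 0:
            return 0.0
        entropy = -sum((c / self.total) * math.log2(c / self.total)
                       for c in self.counts.values())
        return math.log2(self.num_buckets) - entropy

    def should_reshuffle(self):
        return self.leaked_bits() >= self.max_leak_bits
```

Usage would be something like: call `record_access(bucket_id)` after every bucket fetch, and when `should_reshuffle()` returns True, re-encrypt and re-shuffle the buckets and reset the counters.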

HarryR commented 8 years ago

The additional overhead of ORAM is significant enough that enabling it by default may hurt performance too much compared to the current access scheme.

When I was researching this, in addition to investigating other ways to persist data, I realised that the first step to making this possible is to abstract the storage layer so that it sits behind a simpler key/value store interface.

At the moment I don't think there's a clean enough boundary between the DB and storage layers, which makes verifying wire contents or swapping the backend storage service harder than it needs to be.

By there not being a clean enough distinction between the database query engine and the storage layer, I mean that I've still not found a clean point of separation where I can say 'everything up until this point is the ZeroDB-specific query logic, and everything beyond this point is merely an interface for persisting and retrieving data'. Finding that point of separation would let me log all operations performed over the network and very quickly write additional storage adapters, e.g. for Redis, MongoDB, CouchDB, or my own ZeroDB project which provides a ZeroMQ interface to several key/value storage engines.
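To make that point of separation concrete, here is a hypothetical sketch of such a key/value boundary. None of these class names or methods are ZeroDB's actual API; the Redis adapter just assumes a redis-py style client with get/set:

```python
from abc import ABC, abstractmethod

class BlockStore(ABC):
    """Hypothetical 'everything beyond this point' interface: opaque
    encrypted blocks in, opaque encrypted blocks out. The query engine
    above it never needs to know which backend is used."""

    @abstractmethod
    def load(self, oid: bytes) -> bytes: ...

    @abstractmethod
    def store(self, oid: bytes, data: bytes) -> None: ...

class RedisBlockStore(BlockStore):
    """Example adapter; assumes a redis-py style client."""

    def __init__(self, client, prefix=b"zerodb:"):
        self.client = client
        self.prefix = prefix

    def load(self, oid: bytes) -> bytes:
        data = self.client.get(self.prefix + oid)
        if data is None:
            raise KeyError(oid)
        return data

    def store(self, oid: bytes, data: bytes) -> None:
        self.client.set(self.prefix + oid, data)

class LoggingBlockStore(BlockStore):
    """Wraps another store to log every operation that crosses the wire."""

    def __init__(self, inner, log):
        self.inner, self.log = inner, log

    def load(self, oid: bytes) -> bytes:
        self.log(("load", oid))
        return self.inner.load(oid)

    def store(self, oid: bytes, data: bytes) -> None:
        self.log(("store", oid, len(data)))
        self.inner.store(oid, data)
```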

michwill commented 8 years ago

I'd say there is a separation. ZeroDB uses ZEO storage (it can use neoppod instead).

One difference compared to a usual key-value store, though, is the local cache and the way it is invalidated when somebody commits new data. ZEO can be used with any storage through a rather thin adapter layer (e.g. you can plug in a relational database as a storage through relstorage; it is not hard to plug in any block storage the same way, starting from DemoStorage as an example).

So there is still some extra code on the server for invalidations (code we currently inherit from ZEO). Another extra bit we need is the ability to load many blocks in one request (see the loadBulk method), but that is something many storages already have.
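As a sketch of those two extras (bulk loading and invalidations) layered over a plain key/value backend, something like the following could sit between the two. The backend methods used here are assumptions, not the real ZEO/relstorage interface:

```python
class BulkLoadingStore:
    """Sketch of the two extras mentioned above: loadBulk (many blocks per
    round trip) and invalidation notifications for the client-side cache.
    The backend methods (load/loadMany) are illustrative assumptions."""

    def __init__(self, backend):
        self.backend = backend
        self._invalidation_handlers = []

    def load(self, oid):
        return self.backend.load(oid)

    def loadBulk(self, oids):
        # One request for many blocks if the backend supports it;
        # otherwise fall back to one round trip per block.
        if hasattr(self.backend, "loadMany"):
            return self.backend.loadMany(oids)
        return [self.backend.load(oid) for oid in oids]

    def on_invalidate(self, handler):
        # Register a callback so the local cache can drop stale blocks
        # when another client commits new data.
        self._invalidation_handlers.append(handler)

    def notify_committed(self, oids):
        # Called (e.g. by the server's push channel) after a commit.
        for handler in self._invalidation_handlers:
            handler(oids)
```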

Another (completely different) way of doing things (not what we currently do!) would be to have the query logic on the server, but sometimes ask the client to do critical simple operations (e.g. the server sends an encrypted bucket plus an encrypted keyword to the client; the client decrypts them, figures out which bucket id to request next, and tells the server). This way the client would be much thinner, and the server would consist of a "DB engine" which consults the client when necessary, plus the block storage.
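A minimal sketch of that client-assisted lookup loop, assuming a hypothetical client_channel.ask RPC (again, this is not what ZeroDB currently does):

```python
def server_side_lookup(storage, client_channel, root_bucket_id, encrypted_keyword):
    """Server walks the encrypted B-Tree it cannot read: at each step it
    ships the encrypted bucket plus the encrypted keyword to the client,
    which decrypts locally and replies with the next bucket id (or the
    final result). client_channel.ask(...) is an assumed RPC call."""
    bucket_id = root_bucket_id
    while True:
        encrypted_bucket = storage.load(bucket_id)
        reply = client_channel.ask(encrypted_bucket, encrypted_keyword)
        if reply["done"]:
            # Leaf reached: the client has decrypted the value it wanted.
            return reply["result"]
        # Internal node: the client tells us which child to fetch next.
        bucket_id = reply["next_bucket_id"]
```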

michwill commented 8 years ago

@HarryR I also think that most of your comment applies to this issue. Linking it for my own reference :-)