share / sharedb

Realtime database backend based on Operational Transformation (OT)

can the documents be encrypted somehow? #211

Open LukasMeine opened 6 years ago

LukasMeine commented 6 years ago

Is there a way to encrypt the documents used by sharedb? thank you.

gkubisa commented 6 years ago

AFAIK ShareDB itself does not support encryption, however, you can still encrypt its database (usually MongoDB, https://www.mongodb.com/blog/post/at-rest-encryption-in-mongodb-3-2-features-and-performance) or the entire file system.

brainkim commented 6 years ago

I got to thinking about this problem today while walking the dog. I am not a cryptography expert, so be warned. Here’s a dump of my thoughts.

If by encrypting documents you mean encrypting the snapshots and ops at rest, there are plenty of simple built-in database solutions you can use. If you mean end-to-end encryption, where only clients with a certain key can read the contents of a snapshot or understand incoming transformations, and where neither the server nor database admins can see the data in snapshots and operations, the problem is that OT servers need to know the contents of snapshots in order to apply and transform operations. For instance, you couldn’t apply a text operation to a ciphertext because you wouldn’t know where the operation’s indices point to in the ciphertext. So it’s fundamentally a problem with the client-server architecture of ShareDB.
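
To make that concrete, here is a rough sketch using Node's built-in `crypto` module. The helpers `encryptText` and `decryptText`, the `{ index, insert }` op shape, and the choice of AES-256-GCM are all assumptions for illustration; none of this is ShareDB code. It shows that a server holding only ciphertext cannot usefully apply a plaintext-indexed insert: the spliced blob no longer decrypts.

```javascript
const crypto = require('node:crypto');

// Hypothetical key and helpers, for illustration only.
const key = crypto.randomBytes(32);

// Returns iv (12 bytes) + auth tag (16 bytes) + ciphertext body.
function encryptText(plaintext) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const body = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), body]);
}

function decryptText(blob) {
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, blob.subarray(0, 12));
  decipher.setAuthTag(blob.subarray(12, 28));
  const body = blob.subarray(28);
  return Buffer.concat([decipher.update(body), decipher.final()]).toString('utf8');
}

// A made-up text op: "insert ' there' at index 5 of the plaintext".
const op = { index: 5, insert: ' there' };
const blob = encryptText('hello world');

// A server that only sees ciphertext tries to splice at the same index.
const body = blob.subarray(28);
const tampered = Buffer.concat([
  blob.subarray(0, 28),
  body.subarray(0, op.index),
  Buffer.from(op.insert, 'utf8'),
  body.subarray(op.index),
]);

// Decryption of the spliced blob fails authentication.
let serverSideApplyWorked = true;
try {
  decryptText(tampered);
} catch (err) {
  serverSideApplyWorked = false;
}
```

The untouched blob still decrypts to the original text, while the spliced one is rejected, which is why the transform step has to happen somewhere that can see plaintext.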

I think there are two solutions that you can try.

  1. You could use a substitution cipher to encrypt ops and snapshots, so that indices are preserved in the ciphertext and in the operations. This is pre-Jesus technology which is definitely not secure in this era, but it could be used on top of encryption at rest, mainly to dissuade a nosy db admin from reading select * from "snapshot", or to keep private documents from being read/logged by accident. You could also implement a fun UI on top of this where, until a user decrypts the document on the client, they see the ciphertext, with the garbled text updating as concurrent edits happen elsewhere, and then do that Sneakers-style dissolve of encrypted text into plaintext. (https://github.com/bartobri/no-more-secrets)
  2. You could create a sharedb server instance on a single client and have that client be responsible for transforming concurrent edits, and have the server primarily reroute/broadcast encrypted operations/snapshots from the client. I don’t really know if this is possible.
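
Here is a toy sketch of the first idea, assuming a monoalphabetic substitution over lowercase letters and a hand-rolled `{ index, insert }` op (not a real ShareDB OT type). The key string and all function names are made up for illustration, and, as noted above, this is not secure. The point is only that a per-character substitution keeps indices aligned, so an op enciphered on the client can be applied to the enciphered document.

```javascript
// Toy substitution cipher: each lowercase letter maps to exactly one other letter.
const ALPHABET = 'abcdefghijklmnopqrstuvwxyz';
const SHUFFLED = 'qwertyuiopasdfghjklzxcvbnm'; // hypothetical key (a permutation of ALPHABET)

function substitute(text, from, to) {
  return Array.from(text, (ch) => {
    const i = from.indexOf(ch);
    return i === -1 ? ch : to[i]; // characters outside the alphabet pass through
  }).join('');
}

const encipher = (text) => substitute(text, ALPHABET, SHUFFLED);
const decipher = (text) => substitute(text, SHUFFLED, ALPHABET);

// Minimal "insert at index" op, standing in for a real text OT type.
function applyInsert(text, op) {
  return text.slice(0, op.index) + op.insert + text.slice(op.index);
}

const plainDoc = 'hello world';
const plainOp = { index: 5, insert: ' there' };

// The client enciphers both the document and the op payload; the index is unchanged.
const cipherDoc = encipher(plainDoc);
const cipherOp = { index: plainOp.index, insert: encipher(plainOp.insert) };

// The "server" applies the op without ever seeing plaintext.
const serverResult = applyInsert(cipherDoc, cipherOp);

// A client holding the key recovers the same text as a plaintext apply would give.
const clientView = decipher(serverResult);
```

Because the cipher is length-preserving and position-independent, `clientView` matches what applying `plainOp` directly to `plainDoc` produces. Those same two properties are what make it trivially breakable by frequency analysis.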

alecgibson commented 1 year ago

Just because @curran has mentioned this again, here are my thoughts (for whatever they're worth):

This could potentially be handled by middleware? I've not tried this at all, but something along these lines might work?

You could encrypt snapshots in the commit hook, just before they're written to the database, and decrypt them in readSnapshots:

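// `encrypt` and `decrypt` are placeholders for your own helpers; ShareDB does not provide them.
backend.use('commit', (request, next) => {
  // Encrypt only the contents, so the snapshot metadata stays readable.
  request.snapshot.data = encrypt(request.snapshot.data);
  next();
});

backend.use('readSnapshots', (request, next) => {
  // Decrypt in place, so ShareDB works with plaintext in memory.
  request.snapshots.forEach((snapshot) => {
    snapshot.data = decrypt(snapshot.data);
  });
  next();
});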

You could do something similar for ops in the commit and op hooks, and probably also for milestone snapshots.

NB: This is NOT end-to-end encryption. For OT to work, ShareDB will always need the ops/snapshots to be decrypted in-memory (as discussed in above comments).

If the above code works (which is a big "if", given that I've not tried it), it would keep information decrypted in-memory, where ShareDB can do its thing, but encrypted in the DB. It would even have the benefit of keeping ops encrypted over Pub/Sub (I think this should "just work"...?!).

The snapshot and op metadata "structure" would remain intact and queryable, so all of ShareDB's machinery would still work, but the actual "contents" (ie snapshot.data and op.op) would be scrambled. Note that if any other bits of your app rely on querying the actual data field, this would break (and would also probably break ShareDB queries). For actual ShareDB operation, I think scrambling these fields should be fine (apart from queries), since ShareDB should be type-agnostic, and therefore it shouldn't make any assumptions about the shape or contents of these fields anyway.
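
For what it's worth, here is one possible shape for the `encrypt`/`decrypt` helpers that the middleware sketch leaves undefined. It assumes Node's built-in `crypto` module with AES-256-GCM, a randomly generated in-process key, and a made-up `{ iv, tag, ct }` envelope; the snapshot object is a hand-written stand-in, not something fetched from ShareDB. Treat it as an untested-in-ShareDB sketch, not a recommendation.

```javascript
const crypto = require('node:crypto');

// Assumption: a real deployment would load this from a secret store instead.
const KEY = crypto.randomBytes(32);

// Serialises any JSON value and wraps the ciphertext in a small envelope.
function encrypt(value) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv('aes-256-gcm', KEY, iv);
  const ct = Buffer.concat([cipher.update(JSON.stringify(value), 'utf8'), cipher.final()]);
  return {
    iv: iv.toString('base64'),
    tag: cipher.getAuthTag().toString('base64'),
    ct: ct.toString('base64'),
  };
}

// Reverses encrypt(); throws if the envelope was modified.
function decrypt(envelope) {
  const decipher = crypto.createDecipheriv('aes-256-gcm', KEY, Buffer.from(envelope.iv, 'base64'));
  decipher.setAuthTag(Buffer.from(envelope.tag, 'base64'));
  const pt = Buffer.concat([decipher.update(Buffer.from(envelope.ct, 'base64')), decipher.final()]);
  return JSON.parse(pt.toString('utf8'));
}

// Illustrative snapshot-like object: metadata plus contents.
const snapshot = { id: 'doc1', v: 3, type: 'json0', data: { title: 'secret' } };

// Only `data` is scrambled; id, v and type stay readable and queryable.
const stored = { ...snapshot, data: encrypt(snapshot.data) };
const loaded = { ...stored, data: decrypt(stored.data) };
```

A database query on `stored.v` or `stored.id` would still work, while a query on `data.title` would find nothing, which matches the trade-off described above.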

One slightly fiddly bit might be key cycling: your decryption method might need to handle shims for an old key, so that you can re-encrypt your snapshot, and still fetch ops encrypted with the old key in order to catch up old clients. This shim would need to be live for as long as you want clients to fetch back to. If you're using document history (historic snapshot fetches), again you would only be able to cycle your keys as frequently as you want users to fetch back to. If that's "all time", you can maybe never cycle your key, which slightly defeats the security aspect of this all.
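
One way the old-key shim could look, as a sketch: tag every encrypted value with the id of the key that produced it, and keep a small key ring for decryption. The `keyRing` map, the `kid` field, the key ids and the sample op are all hypothetical, and the AES-256-GCM envelope is an assumption, not anything ShareDB defines.

```javascript
const crypto = require('node:crypto');

// Hypothetical key ring: the old key stays available while old ops must remain readable.
const keyRing = new Map([
  ['k1', crypto.randomBytes(32)], // old key
  ['k2', crypto.randomBytes(32)], // current key
]);
const currentKeyId = 'k2';

// Encrypts with the current key by default and records which key was used.
function encrypt(value, keyId = currentKeyId) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv('aes-256-gcm', keyRing.get(keyId), iv);
  const ct = Buffer.concat([cipher.update(JSON.stringify(value), 'utf8'), cipher.final()]);
  return {
    kid: keyId,
    iv: iv.toString('base64'),
    tag: cipher.getAuthTag().toString('base64'),
    ct: ct.toString('base64'),
  };
}

// Looks up the key by id, so values written under an old key still decrypt.
function decrypt(envelope) {
  const key = keyRing.get(envelope.kid);
  if (!key) throw new Error(`No key available for id ${envelope.kid}`);
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, Buffer.from(envelope.iv, 'base64'));
  decipher.setAuthTag(Buffer.from(envelope.tag, 'base64'));
  const pt = Buffer.concat([decipher.update(Buffer.from(envelope.ct, 'base64')), decipher.final()]);
  return JSON.parse(pt.toString('utf8'));
}

// An op written before rotation, and snapshot data re-encrypted after it.
const oldOp = encrypt([{ p: ['title'], oi: 'draft' }], 'k1');
const newSnapshotData = encrypt({ title: 'draft' });

const recoveredOp = decrypt(oldOp); // still readable through the shim
const recoveredData = decrypt(newSnapshotData);

// Retiring the old key makes everything written under it unreadable.
keyRing.delete('k1');
let retiredKeyFails = false;
try {
  decrypt(oldOp);
} catch (err) {
  retiredKeyFails = true;
}
```

The last few lines show the constraint from the paragraph above: once `k1` is dropped, old ops can no longer be fetched, so the old key has to live for as long as clients need to catch up from that history.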

michael-brade commented 1 year ago

Hi, I just was notified of this discussion. I have not spent any time with the theory, I just want to mention that there is cryptpad (https://github.com/xwiki-labs/cryptpad/blob/main/docs/ARCHITECTURE.md), they do end-to-end encryption but maybe they are not using OT. Maybe chainpad could be used/integrated...?