mafintosh / hyperdb

Distributed scalable database
MIT License

Key revocation and management feasibility? #55

Open mjp0 opened 6 years ago

mjp0 commented 6 years ago

I’ve been researching hyperdb/core/drive for a project, and the last thing I’m stuck on is key management. Currently hyperdb doesn’t support any sort of key management. Is this because nobody (aka @mafintosh or @noffle ;) ) has had the time or need to put it in place, or is it because there’s some inherent problem with append-only logs and keys that I just don’t see right now?

The two things I’m looking for are the ability to limit who can authorize others and the ability to revoke read/write rights. Without these two, it’s quite hard to build anything on top of hyperdb that’s not either fully public or fully trusted.

I only understand hyperdb’s architecture at a high level right now, so I’m hoping that somebody with more knowledge can find a few minutes to shed some light on this :)

hackergrrl commented 6 years ago

limiting the ability to authorize

Maybe it would be useful if hyperdb offered an async predicate function for testing whether a given key's feed is to be considered allowed? If the predicate returns false, the local hyperdb ignores their content or doesn't issue an authorize.
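A minimal sketch of such a hook, assuming a simplified entry shape. `Entry`, `visible_entries`, `should_authorize`, and `allowlist_predicate` are hypothetical names for illustration, not hyperdb API; the point is only that the security policy lives entirely in the caller-supplied predicate, while the database just asks it.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, List


# Hypothetical entry shape; hyperdb's real nodes carry more fields.
@dataclass
class Entry:
    feed_key: str  # public key of the writer's feed
    key: str       # database key being written
    value: str


# The hook: an async predicate that says whether a feed is allowed.
IsAllowed = Callable[[str], Awaitable[bool]]


async def visible_entries(entries: List[Entry], is_allowed: IsAllowed) -> List[Entry]:
    """Keep only entries from allowed feeds, asking the predicate once per feed."""
    verdicts: Dict[str, bool] = {}
    for entry in entries:
        if entry.feed_key not in verdicts:
            verdicts[entry.feed_key] = await is_allowed(entry.feed_key)
    return [e for e in entries if verdicts[e.feed_key]]


async def should_authorize(feed_key: str, is_allowed: IsAllowed) -> bool:
    """Only issue an authorize for feeds the predicate accepts."""
    return await is_allowed(feed_key)


# Example policy: a static allowlist. A real hook could query a service instead.
ALLOWLIST = {"alice", "bob"}


async def allowlist_predicate(feed_key: str) -> bool:
    return feed_key in ALLOWLIST


if __name__ == "__main__":
    log = [Entry("alice", "/a", "1"), Entry("mallory", "/a", "spam"), Entry("bob", "/b", "2")]
    kept = asyncio.run(visible_entries(log, allowlist_predicate))
    print([e.value for e in kept])  # ['1', '2']
```

Swapping `allowlist_predicate` for a web-of-trust or authority-based check would not require any change to the filtering code.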

and ability to revoke right to read/write

This isn't completely possible: if I give you some encrypted data and the private key to read it, I can never forcibly remove that data and key from your hard drive. You could use a new key for future data that the revoked party doesn't know about, though. This sounds like a useful mechanism for hyperdb.
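The "new key for future data" idea can be sketched as epoch-based group keys. `GroupKeyring` is a hypothetical helper, and the keys here are only generated and handed out (no actual encryption is shown); it illustrates that revocation rotates forward rather than erasing anything.

```python
import secrets
from typing import Dict, Set


class GroupKeyring:
    """Toy epoch-based group keys (hypothetical; not part of hyperdb).

    Revocation cannot take back keys a member already holds; it only starts
    a new epoch whose key is shared with the remaining members.
    """

    def __init__(self, members: Set[str]) -> None:
        self.members = set(members)
        self.epoch = 0
        # member -> {epoch: key}: everything each member has ever received
        self.held: Dict[str, Dict[int, bytes]] = {m: {} for m in self.members}
        self._distribute(secrets.token_bytes(32))

    def _distribute(self, key: bytes) -> None:
        for m in self.members:
            self.held.setdefault(m, {})[self.epoch] = key

    def revoke(self, member: str) -> None:
        self.members.discard(member)
        self.epoch += 1
        # Fresh random key for future data; the revoked member never gets it.
        self._distribute(secrets.token_bytes(32))

    def can_read(self, member: str, epoch: int) -> bool:
        return epoch in self.held.get(member, {})


if __name__ == "__main__":
    ring = GroupKeyring({"alice", "bob", "carol"})
    ring.revoke("carol")
    print(ring.can_read("carol", 0))  # True: data from epoch 0 stays readable
    print(ring.can_read("carol", 1))  # False: carol never received the epoch 1 key
```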

My personal preference is to keep the auth mechanism in hyperdb very simple, but offer some hooks (like the authorization predicate function above) that would let modules be written on top of hyperdb that implement their own security models. That keeps hyperdb focused on doing one thing well (p2p key/value data storage) instead of trying to accommodate various security models.

pfrazee commented 6 years ago

@noffle I'm fairly certain hyperdb needs to be opinionated about the management of writers though, because it's a core part of its protocol

emilbayes commented 6 years ago

I think the predicate function is the only sane way to go. Then hyperdb doesn't make any decisions about whether you want to use a web-of-trust model or an authority model for trusting writers. Readers can only be controlled by choosing which peers you allow to connect to you, but as @noffle said, what has been seen cannot be unseen.

Key revocation and key management will depend on how peers establish trust in each other. For protocols like these, a compromised key is a doomsday scenario.

One simple thought experiment that makes key rotation really tricky is this: I add a new writer and tell everyone that this is my new identity. How can others be sure that this is true, and not simply a hacker who is trying to steal my db / trying to ruin my reputation?

pfrazee commented 6 years ago

Feeds will need to reference other feeds specifically in their vector-clock timestamps. You'll be fairly heavily constrained by that requirement. Any kind of runtime indirection is a bad idea because it suggests a feed could be remapped without recording the change in the log or preserving the history, and that would break the CRDT. Therefore the key management system can only add and remove writers within the history of the DB, and within that I question whether there's much opportunity for experimentation, because you'll need to maintain the convergence of the CRDT. I wouldn't say it's impossible, but I'd like to see a description of some of those systems before we make key management pluggable. Plugins create a situation where hyperdbs can only be consumed if the plugin used can be identified and loaded by the reader, and I see that as a non-zero cost.
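To make the constraint concrete, here is a minimal sketch, assuming membership changes are ordinary log entries stamped with vector clocks (modeled as feed key to number of entries seen, including the author's own). `MembershipOp` and `writer_set` are hypothetical. Because the writer set is computed only from what is recorded in the logs, every peer that has seen the same entries derives the same set, whatever order they arrived in.

```python
import random
from dataclasses import dataclass
from typing import List, Set, Tuple


# Hypothetical membership entry, written into the author's own feed.
@dataclass(frozen=True)
class MembershipOp:
    author: str                         # feed key of the writer issuing the op
    seq: int                            # position of the op in the author's feed
    clock: Tuple[Tuple[str, int], ...]  # vector clock: (feed key, entries seen)
    kind: str                           # "add" or "remove"
    target: str                         # feed key being added or removed


def writer_set(root: str, ops: List[MembershipOp]) -> Set[str]:
    """Replay membership ops in a deterministic order consistent with causality.

    If op a happened before op b, a's clock is <= b's in every component and
    smaller in at least one, so a's clock sum is strictly smaller. Sorting by
    (clock sum, author, seq) is therefore a linear extension of causal order,
    and concurrent ops are tie-broken identically on every peer.
    """
    writers = {root}
    for op in sorted(ops, key=lambda o: (sum(n for _, n in o.clock), o.author, o.seq)):
        if op.author not in writers:
            continue  # ops from a non-writer (at this point in the replay) are ignored
        if op.kind == "add":
            writers.add(op.target)
        elif op.kind == "remove":
            writers.discard(op.target)
    return writers


if __name__ == "__main__":
    ops = [
        MembershipOp("root", 1, (("root", 1),), "add", "alice"),
        MembershipOp("alice", 1, (("alice", 1), ("root", 1)), "add", "bob"),
        MembershipOp("root", 2, (("root", 2),), "remove", "alice"),  # concurrent with alice's op
    ]
    shuffled = ops[:]
    random.shuffle(shuffled)
    print(writer_set("root", ops) == writer_set("root", shuffled))  # True
```

In this example the tie-break lets alice's concurrent authorization of bob survive her removal. That choice is arbitrary; what matters for convergence is that every peer makes the same one, and that no membership change happens outside the recorded history.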

mjp0 commented 6 years ago

Maybe it would be useful if hyperdb offered an async predicate function for testing whether a given key's feed is to be considered allowed? If the predicate returns false, the local hyperdb ignores their content or doesn't issue an authorize.

This was actually pretty much what I had in mind, but I wanted to hear if I'm being dumb and not realizing something really obvious that would prevent this in the data structure itself (as @pfrazee highlights) ;)

This isn't completely possible: if I give you some encrypted data and the private key to read it, I can never forcibly remove that data and key from your hard drive.

Yes, I phrased that badly because what you are saying is obviously true and I meant preventing future messages being read.

One simple thought experiment that makes key rotation really tricky is this: I add a new writer and tell everyone that this is my new identity. How can others be sure that this is true, and not simply a hacker who is trying to steal my db / trying to ruin my reputation?

Wouldn't this be solved simply by requiring a valid signature on all of these sorts of update messages?
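A signature does let others check that a rotation notice came from the holder of the old key. Below is a minimal sketch, assuming a stdlib-only setting: it uses Lamport one-time signatures over SHA-256 as a stand-in for the Ed25519 signatures that hypercore uses for its feeds, and `rotation_message` is a hypothetical message format. A Lamport key must sign at most one message, which happens to suit a one-shot rotation notice.

```python
import hashlib
import secrets
from typing import List, Tuple

# Lamport one-time signature keys: 256 pairs of secrets / their SHA-256 hashes.
PrivateKey = List[Tuple[bytes, bytes]]
PublicKey = List[Tuple[bytes, bytes]]


def keygen() -> Tuple[PrivateKey, PublicKey]:
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    pk = [(hashlib.sha256(a).digest(), hashlib.sha256(b).digest()) for a, b in sk]
    return sk, pk


def _bits(message: bytes) -> List[int]:
    """The 256 bits of SHA-256(message), most significant bit first."""
    digest = hashlib.sha256(message).digest()
    return [(byte >> (7 - i)) & 1 for byte in digest for i in range(8)]


def sign(sk: PrivateKey, message: bytes) -> List[bytes]:
    # Reveal one secret from each pair, chosen by the corresponding digest bit.
    return [sk[i][bit] for i, bit in enumerate(_bits(message))]


def verify(pk: PublicKey, message: bytes, sig: List[bytes]) -> bool:
    if len(sig) != 256:
        return False
    return all(hashlib.sha256(s).digest() == pk[i][bit]
               for i, (s, bit) in enumerate(zip(sig, _bits(message))))


def _encode(pk: PublicKey) -> bytes:
    return b"".join(a + b for a, b in pk)


def rotation_message(old_pk: PublicKey, new_pk: PublicKey) -> bytes:
    """Hypothetical notice: 'the holder of old_pk says the new key is new_pk'."""
    return (b"rotate:" + hashlib.sha256(_encode(old_pk)).digest()
            + hashlib.sha256(_encode(new_pk)).digest())


if __name__ == "__main__":
    old_sk, old_pk = keygen()
    _, new_pk = keygen()
    notice = rotation_message(old_pk, new_pk)
    sig = sign(old_sk, notice)
    print(verify(old_pk, notice, sig))  # True
```

Note what the check does and does not establish: it proves control of the old key, not identity. If the old key is the one that was compromised, an attacker can produce an equally valid rotation notice, which is exactly the scenario in the thought experiment quoted above.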

Therefore the key management system can only add and remove writers within the history of the DB, and within that I question whether there's much opportunity for experimentation, because you'll need to maintain the convergence of the CRDT. I wouldn't say it's impossible, but I'd like to see a description of some of those systems before we make key management pluggable. Plugins create a situation where hyperdbs can only be consumed if the plugin used can be identified and loaded by the reader, and I see that as a non-zero cost.

To me, this feels like functionality that should be baked into hyperdb's core. Maybe as a sort of "mode" that hyperdb runs in, to keep compatibility detection relatively binary?

There are two levels to this. The first is expanding key management to include revocation by simply adding a sort of ignore function. One concern I have about hyperdb without key revocation is somebody getting an invite and spamming the db. With ignoring, you could decrease the reward for doing that, because the spam would go to /dev/null. The predicate function seems like the easiest solution here, but what I'm not sure about is whether the CRDT goes haywire if some participants ignore other participants, so that they essentially have different references available.
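A small worked example of the divergence question, assuming a deliberately simplified read model in which every peer has replicated the same writes and a read returns all values written for a key by feeds that peer does not ignore. `read` and `shared_log` are hypothetical, and this is not hyperdb's actual conflict resolution.

```python
from typing import List, Set, Tuple

# (feed_key, db_key, value): a write as it appears in the replicated logs.
Write = Tuple[str, str, str]


def read(writes: List[Write], db_key: str, ignored: Set[str]) -> List[str]:
    """Every value written for db_key by a feed this peer does not ignore."""
    return sorted(v for feed, k, v in writes if k == db_key and feed not in ignored)


# Both peers hold exactly the same replicated writes.
shared_log: List[Write] = [
    ("alice", "/topic", "hello"),
    ("spammer", "/topic", "BUY NOW"),
]

if __name__ == "__main__":
    print(read(shared_log, "/topic", ignored={"spammer"}))  # ['hello']
    print(read(shared_log, "/topic", ignored=set()))        # ['BUY NOW', 'hello']
```

In this model the logs themselves still converge, since ignoring happens at read time; what differs is each peer's view. It gets harder when a non-ignored entry causally builds on an ignored one, which this sketch does not model.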

The second level is the issue that anyone who is authorized can authorize other people. What I'm worried about is spying. Imagine you used hyperdb as a p2p Slack inside your company. What happens if somebody's machine gets hacked and their key is used to authorize a spy? Unless you are filtering everything through a firewall, it's likely that you won't even know that somebody is listening. I'm not sure if this can be resolved.
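Two partial mitigations can be sketched with hypothetical helpers, assuming authorizations are log entries of the form (authorizer, new writer) in log order: honor authorizations only from an admin set, and, after a compromise, list every writer whose authorization chain passes through the compromised key. `accepted_writers` and `authorized_through` are illustrative names, not hyperdb API. Both only see writers; someone who merely replicates and reads never shows up in these records.

```python
from typing import Dict, List, Set, Tuple

Authorization = Tuple[str, str]  # (authorizer, new_writer), in log order


def accepted_writers(root: str, admins: Set[str], auths: List[Authorization]) -> Dict[str, str]:
    """Map each accepted writer to whoever authorized it (the root maps to itself).

    Only authorizations issued by an already-accepted admin are honored.
    """
    granted_by: Dict[str, str] = {root: root}
    for authorizer, new_writer in auths:
        if authorizer in admins and authorizer in granted_by and new_writer not in granted_by:
            granted_by[new_writer] = authorizer
    return granted_by


def authorized_through(granted_by: Dict[str, str], compromised: str) -> Set[str]:
    """Writers whose authorization chain passes through the compromised key."""
    tainted: Set[str] = set()
    for writer in granted_by:
        node = writer
        while granted_by[node] != node:  # stop at the root, which maps to itself
            node = granted_by[node]
            if node == compromised:
                tainted.add(writer)
                break
    return tainted


if __name__ == "__main__":
    auths = [("ceo", "it"), ("ceo", "dev"), ("dev", "spy"), ("it", "contractor")]
    granted = accepted_writers("ceo", {"ceo", "it"}, auths)
    print(sorted(granted))                    # ['ceo', 'contractor', 'dev', 'it']
    print(authorized_through(granted, "it"))  # {'contractor'}
```

Here "dev" is a writer but not an admin, so its authorization of "spy" is never honored; if an admin key such as "it" is compromised, the audit at least tells you which writers to re-examine.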

xloem commented 6 years ago

I think this is important because append-only logs last forever, and given infinite time all keys can be compromised.

Here's what my frazzled brain has come up with so far. Any comments? Do you think this would work?

  1. Let the user provide a single predicate which decides whether or not messages fit within its policy. This is called for all messages, and allows for arbitrary security models.
  2. Allow a writer to deauthorize other writers, stating that information in the given feed beyond the provided sequence number is to be ignored. This provides for key revocation.
  3. Allow a writer to mutiny and take ownership of the database, becoming the new source for it. This provides a mechanism to handle compromise of the source key (and forking!).
  4. When contentFeed is lengthened, make the writer doing so broadcast the new state, and only consider content valid at lengths included within the latest broadcast. This ensures a deauthorized key cannot continue to update their content feed.
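Steps 2 and 4 can be sketched together, assuming each writer's feed records both its broadcasts (the content length announced at a given sequence number) and that deauthorization is a cutoff on that writer's sequence numbers. `WriterState` and `Policy` are hypothetical. Because a broadcast only counts if it sits at or before the cutoff, a deauthorized writer cannot extend its valid content, regardless of the order in which the deauthorization and the broadcasts arrive.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class WriterState:
    cutoff: Optional[int] = None  # step 2: last valid seq in this writer's feed
    # step 4: (seq of the broadcast entry, content length it announces)
    broadcasts: List[Tuple[int, int]] = field(default_factory=list)


class Policy:
    def __init__(self) -> None:
        self.writers: Dict[str, WriterState] = {}

    def _state(self, feed: str) -> WriterState:
        return self.writers.setdefault(feed, WriterState())

    def deauthorize(self, feed: str, last_valid_seq: int) -> None:
        st = self._state(feed)
        # Assumption: if a feed is deauthorized more than once, the earliest cutoff wins.
        st.cutoff = last_valid_seq if st.cutoff is None else min(st.cutoff, last_valid_seq)

    def record_broadcast(self, feed: str, at_seq: int, content_length: int) -> None:
        self._state(feed).broadcasts.append((at_seq, content_length))

    def entry_valid(self, feed: str, seq: int) -> bool:
        cutoff = self._state(feed).cutoff
        return cutoff is None or seq <= cutoff

    def content_valid(self, feed: str, content_seq: int) -> bool:
        # Only broadcasts that are themselves valid entries count.
        lengths = [n for at, n in self._state(feed).broadcasts if self.entry_valid(feed, at)]
        return content_seq < max(lengths, default=0)


if __name__ == "__main__":
    p = Policy()
    p.record_broadcast("w", at_seq=3, content_length=10)
    p.record_broadcast("w", at_seq=7, content_length=20)
    p.deauthorize("w", last_valid_seq=5)
    print(p.content_valid("w", 9), p.content_valid("w", 15))  # True False
```

Step 1's predicate would still decide which feeds are writers at all; this sketch only handles what happens after a writer is cut off.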

There's the edge case where, due to communication delay, a few people build on top of updates from a feed that turn out to have been made after somebody else had already deauthorized that feed.

I considered a few approaches to that edge case, and I think the best way to handle it would be to treat deauthorization updates as simply large changes that need to be merged; they undo any changes they deauthorize.
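One way to find the work affected by that edge case, sketched under assumptions: entries carry a vector clock (feed key to number of entries seen from that feed, with 0-based sequence numbers), and a deauthorization names a feed and its last valid sequence number. `split_after_deauth` is hypothetical. It separates the entries the deauthorization undoes from entries on other feeds that were built on top of them, which is the set a merge would then have to resolve.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Entry:
    feed: str
    seq: int                           # 0-based position in its own feed
    seen: Tuple[Tuple[str, int], ...]  # vector clock: (feed key, entries seen)


def split_after_deauth(entries: List[Entry], feed: str, last_valid_seq: int):
    """Return (undone, built_on_undone) for a deauthorization of `feed`."""
    # Entries the deauthorization directly invalidates.
    undone = [e for e in entries if e.feed == feed and e.seq > last_valid_seq]
    # Entries on other feeds whose author had already seen one of those.
    built_on = [
        e for e in entries
        if e.feed != feed and dict(e.seen).get(feed, 0) > last_valid_seq + 1
    ]
    return undone, built_on


if __name__ == "__main__":
    log = [
        Entry("m", 0, ()),
        Entry("m", 1, (("m", 1),)),
        Entry("m", 2, (("m", 2),)),
        Entry("a", 0, (("m", 3),)),  # a had seen m's entries 0..2
        Entry("b", 0, (("m", 2),)),  # b had only seen m's entries 0..1
    ]
    undone, built_on = split_after_deauth(log, "m", last_valid_seq=1)
    print([(e.feed, e.seq) for e in undone], [(e.feed, e.seq) for e in built_on])
```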

A sticking point is how to handle the accumulation of many deauthorized keys (to allow for frequent key rotation) as time goes on, without the feeds array growing without bound. If these feeds could somehow be intertwined with the key lookup trie such that they are only attached to the keys they have data for, rather than all being listed together in one place, I think that would resolve it, but that seems like a later improvement.
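A rough sketch of that later improvement, assuming the index can learn when a write is superseded: feeds are tracked per key rather than in one global list, so a deauthorized feed that no key still references can be dropped. `FeedIndex` and its methods are hypothetical and ignore how hyperdb's trie actually stores feed references.

```python
from collections import defaultdict
from typing import Dict, Set


class FeedIndex:
    """Hypothetical per-key feed references instead of one global feeds array."""

    def __init__(self) -> None:
        self.feeds_for_key: Dict[str, Set[str]] = defaultdict(set)
        self.deauthorized: Set[str] = set()

    def record_write(self, db_key: str, feed: str) -> None:
        self.feeds_for_key[db_key].add(feed)

    def supersede(self, db_key: str, feed: str) -> None:
        """A later write replaced this feed's value for db_key."""
        self.feeds_for_key[db_key].discard(feed)

    def prunable(self) -> Set[str]:
        """Deauthorized feeds that no key still references can be dropped."""
        referenced = set().union(*self.feeds_for_key.values())
        return self.deauthorized - referenced


if __name__ == "__main__":
    idx = FeedIndex()
    idx.record_write("/a", "old-key")
    idx.record_write("/a", "new-key")
    idx.deauthorized.add("old-key")
    print(idx.prunable())  # set(): "/a" still references old-key
    idx.supersede("/a", "old-key")
    print(idx.prunable())  # {'old-key'}
```

The cost of lookups then depends on how many feeds touched a given key, not on how many keys were ever rotated out.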