pgte commented 6 years ago

Work in Progress: This is a work in progress. For comments and suggestions contact us at research@protocol.ai.

We as Protocol Labs actively support these areas of research with grants, bounties and direct collaborations. We plan to fund research related to these open problems. Reach out if you want to work on or are working on these problems.

Data laced with permissions: Decentralised Access Control in CRDTs

Context

Eventual consistency

Unlike Strongly Consistent systems, Eventually Consistent (EC) systems don't require synchronization between peers in order to modify data. Changes can be done locally or to a small number of nodes and then asynchronously replicated to others, eventually reaching them. These systems are more resistant to network partitions, and are thus better suited to being used in decentralized environments where connectivity can be low. The drawback is that, in a given point in time, data is not guaranteed to be synchronized between peers.

Strong Eventual Consistency

Strong Eventual Consistency (SEC) is a stronger type of EC where some properties are guaranteed: when all the replicas have received all the messages independently of their order, they are guaranteed to reach the same state.

Under these constraints, data is still not guaranteed to be equal in all replicas in a given point in time, but data is guaranteed to eventually converge and to be monotonic (a replica never undoes a change). Conflict-free Replicated Data Types are a data type that guarantees Strong Eventual Consistency.

Conflict-Free Replicated Data Types

Conflict-Free Replicated Data Types (CRDTs) are a class of data types that provides strong eventual consistency guarantees, and which has several interesting properties:

Changes are local: All changes are performed locally, without the need to contact other replicas to guarantee a write. The synchronization is done asynchronously, when and if there is network connectivity.
Determinism: No matter what the order of delivery of messages, replicas are guaranteed to converge. This is particularly suitable in poorly connected networks, while also supporting offline editing.

Problem definition

Intuition

Using a CRDT, users can collaborate in real-time editing of a shared document over a loosely connected P2P network. Each user will have a replica of the shared document and they will send each other messages signifying updates to the state (either in the form of operations, whole states or deltas).

While CRDTs already make this possible, certain useful capabilities do not exist yet. For example, the owner of a document might want to collaborate with other uses, but still be able to maintain control of who can read or write to this document.

Not only does the user want to protect their local replica but, in order to maintain consistency across peers, they also want to inform the other replicas of which users are allowed to do what.

Besides having the power to adding permissions to users in all of the replicas, they might want to be able to remove these permissions and be able to also propagate this to the other replicas.

Besides read and write access, this user may also want to delegate this power to other users, so that they also can become administrators of permissions for this given replica. Any of these permissions should also be revocable in the future.

The initial owner will create a message describing this permission and propagate it to all the other users. When knowing this information, replica of user B will start making local changes, and those changes will be propagated to other replicas. If, while user B is propagating the replica, user A revokes this permission, how can this be accomplished in a way that is secure and that still guarantees SEC?

Definitions

Access Control List

An Access Control List (ACL) is a type of data containing a list of permissions attached to an object. An ACL specifies which users are granted access to an object (in this case, a CRDT instance) and what operations are allowed in this object. For simplicity, we define the following operations:

read (can follow the CRDT but cannot modify it)
write (can modify a CRDT)
admin (can assign and revoke permissions to other users)

A user that has admin permission at a given time can, at that time, change the entire ACL by granting or revoking permissions.

Further details of the ACL model are not specified here to increase the flexibility of the system constraints should it be needed.

Identity: authenticating peers and users

We can assume that each peer has an immutable identifier and a public key. From a peer identifier, any other peer can retrieve the respective public key. A peer has a secret private key that can be used to create message signatures, which, together with the peer id, can be used to authenticate message authors.

A user has an identity that is somehow attached to the peer identifier in a cryptographically secure way, which means that, if we authenticate a peer identifier, we can assume that we also authenticate the linked user identifier.

Replica

If a peer / user participates in a CRDT instance at a given time, this peer has a local replica of that CRDT instance.

Hard constraints

Keep strong eventual consistency

The system should have strong eventual consistency properties under any circumstances. For instance, when an ACL instance is applied to a CRDT instance, it should not compromise convergence on any replica. If, for example, operations authored by a peer whose permission gets revoked while other peers are in different stages of integrating these operations, every replica that has read permissions should still eventually converge to the same state.

Have typical ACL properties

This ACL type has the following obvious security properties, including unforgeability, integrity, non-repudiation, be impervious to man-in-the-middle attacks, repetition attacks and others.

Privacy

By looking at the wire data transmitted, the knowledge of the ACL state should not be known to other peers outside the ones participating in the CRDT instance.

Safety

When applied to a CRDT, the ACL type should provably provide Strong Eventual Consistency as defined in Conflict-free replicated data types.

Soft Constraints

Meta-data privacy

By looking at the wire data being transmitted, an attacker should not be able to infer relationships between peers and CRDT instances.

Open problems

Is there an ACL type - which describes the capabilities of replicas to access and manipulate a CRDT — that, while having the desired security properties when applied at replicas, also guarantees strong eventual consistency of this CRDT?

More specifically:

Is there such a replicated strongly-consistent ACL type?
Is there such a replicated eventually-consistent ACL type?

Definitions

Security

An ACL type should provide:

Privacy: A network node not belonging to the replica set should not know which other replicas are participating in the CRDTs in which roles.
Authentication: A replica should be able to cryptographically prove the authorship of a permission and revocation.
Impervious to MITM, repetition, spoofing, DoS, Sybil and Eclipse attacks.

Safety

When applied to a CRDT, the ACL type should provably provide Strong Eventual Consistency as defined in Conflict-free replicated data types.

bieniusa commented 6 years ago

Hi,

our research team at TU Kaiserslautern has been working on Access Control CRDTs for some time in the context of AntidoteDB, a causally consistent transactional CRDT data store. Here some publications on how to model, formalise and implement Access Control for such weakly consistent data stores : https://arxiv.org/abs/1801.07005 https://link.springer.com/chapter/10.1007%2F978-3-319-46598-2_6

We would be interested in collaboration - just ping me!

amark commented 6 years ago

@pgte we have a lot of these things already implemented (and many pieces already figured out, but not implemented yet) with GUN. I was at the decentralized web summit meetup today, where Kyle and others mentioned I should post here.

We also have some apps (like decentralized Reddit) that have pushed a half terabyte of traffic in 1 day using the system. We're very close to a v1.0!

pgte commented 6 years ago

@bieniusa thank you for getting in touch! We're very aware of your work :) We've opened an RFP for this (and other topics). If you're interested maybe check it out: https://github.com/protocol/research-RFPs/blob/master/RFPs/rfp-4-CRDT-ACL.md

pgte commented 6 years ago

@amark thanks cool! Could you provide a pointer for how / if you do eventually consistency and access control?

bieniusa commented 6 years ago

@pgte This RFP looks like a fantastic opportunity! I have a few questions related to the administration of the grant scheme. Whom should I contact regarding this? Also, what is the process for refining the problem description?

amark commented 6 years ago

@pgte conceptual overview (explaining the cryptographic primitives in general, for laypeople, not the model, but an important starting point) here:

https://gun.eco/explainers/data/security.html

Here is an example of a P2P/decentralized LinkedIn demo to show case cryptographic ACL-read examples:

https://youtu.be/ZiELAFqNSLQ

And there are some other examples and dApps being built in our community with it already. I usually code first rather than writing long/boring articles - I'm happy to do a Skype/whatever with the team (maybe also Kyle and David, since I've been chatting with them already? I know Juan from back in the day, but I know he's too busy) to explain more.

pgte commented 6 years ago

@amark the videos don't tell me much.. Could you then point me to the code that handles this? I'm very interested in knowing how the community and particular projects are solving these problems, so I can list them in the solutions discussion.

amark commented 6 years ago

@pgte they show that it is possible and working, which is an important distinction from theory or papers (relevant because there are a lot of blockchain "whitepapers" floating around making a lot of claims, but have very little proof or code of it working).

The code is not going to be helpful, it is thousand+ lines of cryptographic wrappers over WebCrypto's nasty API, which is exposed nicely in the SEA docs a primitive library. This same API will let us shim other cryptography libraries or proxy it out of the browser and to a browser-extension/installed-app (so that way user's keys are never exposed in JS), but we haven't built that yet.

Then you save the results of the SEA commands to GUN, which handles the sync and conflict resolution. For instance end-to-end encrypted private messages that d.tube (with a polished version, including username search, notifications, etc.) will be releasing coming up. Or account management including 0-server (fully local) forgot-password/password-reset.

Many of these use case-specific techniques for P2P ACL (for instance, private messaging is pretty easy/straight-forward code that anybody could architect, but LinkedIn is harder/more-complicated/nuanced), that we will are abstracting into a generalizable interface.

The gist of one of the techniques (the most powerful, but most work/brute-force involved, although far far far less than append-only/log based methods) is to use our state-based CRDT algorithm that merges data on nodes in a graph at one more level, merging 2+ user's graphs together to get the final object. This object is fully collaborative and secure, yet at any moment you can revoke write-authority to a user, and their edits vanish and thus reveal underneath whoever else's most recent edit (non-destructive graph intersections!).

I could go on for hours, but need to code rather than talk. Happy to do a call with the team, as it would save a lot of time, and get some official partnership going between you guys and us (one of the other people on our team is actually Martti 'Sirius' Malmi, Satoshi's 1st contributor to Bitcoin, he is a genius of far more compelling and harder subjects, like handling decentralized identity verification, etc. for this stuff).

bieniusa commented 6 years ago

@amark Thank you for sharing these insights; the implementation of the cryptographic parts seems quite impressive. Let me just point out that our paper - while being neither long nor boring, but peer-reviewed and machine-proof checked ;) - does an important step that all other works in the area that we are aware of seem to skip: We define actual semantics of access control in weakly consistent settings. Despite CRDTs and convergence, it is far from obvious how to interpret ACLs given concurrent updates. For example, it is hard to interpret notions like "at any moment you can revoke write-authority" in a replicated setting subject to network partitions, dropped messages, etc. I would be happy to learn what semantics you are envisioning!

pgte commented 6 years ago

@bieniusa Fantastic, thank you for your interest! You can contact us in research@protocol.ai. To refine the problem description, you can comment on that, or even create a pull request or, if you prefer, privately to the same email address.

amark commented 6 years ago

@bieniusa machine-proof checked!? That is awesome!! TLA+ or something? I'd love to learn more. Thank you for doing that, that is top notch.

It is strongly Eventually Consistent and why we built panic, a distributed testing framework with correctness verification and load simulation. You can run our split-brain test via mkdir foo && cd foo && mkdir node_modules && npm install gun && cd node_modules/gun && npm install . && mocha test/panic/holy-grail.js.

The technique I mentioned above is always subject to the P2P/decentralized entity's view, meaning that the security itself is kinda like Multi-Worlds Theorem. Alice can have (Bob, Carl) as collaborators, while Bob (on the exact same data set) can have (Alice, Dave), and Dave might have the (Alice, Bob, Carl). Therefore, revocation is instant even while offline. Updates are not network or order dependent (since they use a CRDT), but obviously require connectivity to sync.

Another model, is where a data-owner says who the collaborators are, and peers use that instead. In this case, revocation is not instant, the data-owner's update needs to sync through a network before other peers receive and apply it, however updates are based on the same CRDT and therefore offline-first, order-independent, and strongly Eventually Consistent.

There are several other models as well.

miyazono commented 6 years ago

RFP link: https://github.com/protocol/research-RFPs/blob/master/RFPs/rfp-4-CRDT-ACL.md

drew-512 commented 5 years ago

Hi!

I'm proud to finally be able to share that my team and I have submitted for this RFP -- I emailed both research@ and rfp@.

We had a partially-conceived project for some time, but when we saw RFP-4 and an alignment with Protocol Labs' ethos and vision, it felt right to trust in taking the project in this direction and make it architecture complete so we could present it fully.

Please let me know if you're not seeing the email.

Best, Drew O'Meara PLAN Systems (a Texas non-profit, 501c3 pending)

jsoares commented 5 years ago

@drew-512 Email received, reply sent. We'll review and follow-up. Thanks for submitting!

silvianetobessa commented 2 years ago

Hi all, thank you for your comments 🚀 We are now closing the issue, feel free to reopen it in the future if you want to restart the conversation on this topic.

protocol / research

Decentralised Access Control in CRDTs #8

Data laced with permissions: Decentralised Access Control in CRDTs

Context

Eventual consistency

Strong Eventual Consistency

Conflict-Free Replicated Data Types

Problem definition

Intuition

Definitions

Access Control List

Identity: authenticating peers and users

Replica

Hard constraints

Keep strong eventual consistency

Have typical ACL properties

Privacy

Safety

Soft Constraints

Meta-data privacy

Open problems

Related reads

Definitions

Security

Safety