ipfs / go-ds-crdt

A distributed go-datastore implementation using Merkle-CRDTs.
Other
391 stars 42 forks source link

Feature - Authorization #226

Closed asynxc closed 5 months ago

asynxc commented 6 months ago

Hello @hsanjuan, great work on this library and it's simplistic, minimalistic API and also works great!

I'm exploring it for a potential integration into a p2p network where peers share a kv storage. In a sub network, peers must be authorized at the Key level in order to PUT, GET or DELETE from the datastore. So some things are missing and i may need you input 🙏

What's the proper way to add autorisation layer on top of go-ds-crdt ? Is signing and wrapping pb.Delta to add sender identity, and authz token, also wrapping the DAGService to do the proto wrapping and somehow allow/disallow Add/GET/GetMany bitswap requests based on the identity, token and signature would work ? I'm trying to find a way without involving modiying go-ds-crdt ..

welcome[bot] commented 6 months ago

Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review. In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment. Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:

Finally, remember to use https://discuss.ipfs.io if you just need general support.

asynxc commented 6 months ago

Update: i have tried implementing the DAGSyncer wrapper to add authz layer, and it worked great. the wrapper inspects pb.Delta elements/tombstones and simply remove the ones that're not allowed. to make sure that the DAGs heads are equal among all peers, the modification to pb.Delta/dag.Node do not change it's CID. this approach allow each peer to apply its own authz policies and maintain same heads..

The inspection and modification of pb.Delta requires marshal/unmarshal of the dag.Node.Data() twice, at the DAGSyncer wrapper level and go-ds-crdt. would you accept a new PR to add a marshaler option ? @hsanjuan This may also be beneficial for other use case, such using more performant data marshalers (ex: vtprotobuf) or add more data into the exchanged payload...

hsanjuan commented 6 months ago

Hello,

in ipfs-cluster, authorization is done with the Broadcaster (pubsub). Pubsub messages are signed by the sender so any crdt-peer can directly drop messages coming from untrusted peers, before any unmarshalling of the deltas:

    // Validate pubsub messages for our topic (only accept
    // from trusted sources)
    err = css.pubsub.RegisterTopicValidator(
        topicName,
        func(ctx context.Context, _ peer.ID, msg *pubsub.Message) bool {
            signer := msg.GetFrom()
            trusted := css.IsTrustedPeer(ctx, signer)
            if !trusted {
                logger.Debug("discarded pubsub message from non trusted source %s ", signer)
            }
            return trusted
        },
    )
    if err != nil {
        logger.Errorf("error registering topic validator: %s", err)
    }

    broadcaster, err := crdt.NewPubSubBroadcaster(
        css.ctx,
        css.pubsub,
        topicName, // subscription name
    )

IIRC that would perhaps fit your bill. Note this does not prevent untrusted peers from seeing/downloading what goes on the database, only prevents them from updating it.

Regarding how to prevent untrusted peers from even getting deltas, I would possibly do it with peer filters at the libp2p Host layer. The ConnManager has connection-gating options to filter out peer ids for example: https://pkg.go.dev/github.com/libp2p/go-libp2p@v0.32.2/core/connmgr#ConnectionGater

the wrapper inspects pb.Delta elements/tombstones and simply remove the ones that're not allowed. to make sure that the DAGs heads are equal among all peers, the modification to pb.Delta/dag.Node do not change it's CID

Can you give more details? I don't fully understand how attaching more information to the Delta doesn't change the CID. By mutating the crdt-node the CID must change.

asynxc commented 6 months ago

Thanks for the feedback, appreciate it @hsanjuan

The pubsub and peers filter guard (or even a pnet) are good solutions to protect non trusted peers. What i'm trying to implement is a step further into access control layer where peers, against some rules are given access to mutate (PUT, DELETE) only a subset of sub/keys of the datastore... those ACL rules can be shared via distributed tokens (jwt, biscuit or other...). on top of that, any peers can also augment those token rules to apply its own by deriving the token, but this's a very specific use case... the idea here, is that each peer control its own view of the datastore by allowing only subset of Delta.Elements/Tombstones to be applied.

For that to work, some conditions must be met:

Modifying the dag.Node (to filter-out Delta.Elements/Tombstones) will automatically change its CID, which will risk to diverge the current peer DAG from other peers. to prevent this, i swap the cid builder of the dag.Node by a dummy custom one that provide same CID of the original dag.Node. this's only to ensure condition (2) is met.

I have tested the implementation with some load and it works great. Interested in your feedback though in case i have missed some corner cases.

For now, i think one thing i need to fix for this to be usable performance-wise, actually i double marshal/unmarshal the dag.Node. hence my proposal to expose an option that give the user a control of how dag.Node are marshaled ^^

One other proposal, the Broadcaster interface methods accepts []byte (proto marshaled CRDTBroadcast), this's ok until a user needs to add more infos to published/received messages (ex: a token). those infos can be used to accept/deny requests by the receiver. to avoid unnecessary parsing would it be ok to delegate the marshaling/unmarshaling task to the broadcaster implementation, this's ofcourse api breaking changes.

The proposed Broadcaster interface would be

// A Broadcaster provides a way to send (notify) an opaque payload to
// all replicas and to retrieve payloads broadcasted.
type Broadcaster interface {
    // Send payload to other replicas.
    Broadcast([]**cid.Cid**) error
    // Obtain the next payload received from the network.
    Next() ([]**cid.Cid**, error)
}

Sorry for giving you too much to read ! to summarise:

If it's ok with you, i can make PRs

Thanks again. 🙏

hsanjuan commented 6 months ago

to mutate (PUT, DELETE) only a subset of sub/keys of the datastore...

ah ok, I had misunderstood.

i swap the cid builder of the dag.Node by a dummy custom one that provide same CID of the original dag.Node. this's only to ensure condition (2) is met.

I have tested the implementation with some load and it works great. Interested in your feedback though in case i have missed some corner cases.

You mean that you are using a custom CID prefix associated to this builder?

github-actions[bot] commented 5 months ago

Oops, seems like we needed more information for this issue, please comment with more details or this issue will be closed in 7 days.

github-actions[bot] commented 5 months ago

This issue was closed because it is missing author input.