libp2p / go-libp2p-kad-dht

A Kademlia DHT implementation on go-libp2p
https://github.com/libp2p/specs/tree/master/kad-dht
MIT License
516 stars 221 forks source link

Implement DHT.PutValue to support ValueStore interface #914

Closed dennis-tra closed 9 months ago

dennis-tra commented 9 months ago

@iand

I just thought about how a PutValue method could look like I think there are a few options. This is what I have so far:


// PutValue satisfies the [routing.Routing] interface and will add the given
// value to the k-closest nodes to keyStr. The parameter keyStr should have the
// format `/$namespace/$binary_id`. Namespace examples are `pk` or `ipns`. To
// identify the closest peers to keyStr, that complete string will be SHA256
// hashed.
func (d *DHT) PutValue(ctx context.Context, keyStr string, value []byte, opts ...routing.Option) error {
    ctx, span := d.tele.Tracer.Start(ctx, "DHT.PutValue")
    defer span.End()

    // first parse the routing options
    rOpt := routing.Options{} // routing config
    if err := rOpt.Apply(opts...); err != nil {
        return fmt.Errorf("apply routing options: %w", err)
    }

    // then always store the given value locally
    if err := d.putValueLocal(ctx, keyStr, value); err != nil {
        return fmt.Errorf("put value locally: %w", err)
    }

    // if the routing system should operate in offline mode, stop here
    if rOpt.Offline {
        return nil
    }

    // construct Kademlia-key. Yes, we hash the complete key string which
    // includes the namespace prefix.
    h := sha256.Sum256([]byte(keyStr))
    kadKey := key.NewKey256(h[:])

    // define the query function that will be called after each request to a
    // remote peer.
    fn := func(ctx context.Context, id kadt.PeerID, resp *pb.Message, stats coord.QueryStats) error {
        return nil
    }

    // finally, find the closest peers to the target key.
    closest, _, err := d.kad.QueryClosest(ctx, kadKey, fn, 20)
    if err != nil {
        return fmt.Errorf("query error: %w", err)
    }

    panic("implement me")
}

To mimic the original behaviour the next step would be to replace panic("implement me") with a loop over the closest peers and store the value with them. However, this requires the DHT to directly use the router and call SendMessage, which is not accessible from there yet. I mean, that's not a problem to add, just do we want that?

The second thought I had was about supporting Optimistic provide in the future. For optimistic provide we would need support starting PUT RPCs while the query is still running. This would be possible with the queryFn that we pass into QueryClosest - but again, is this the preferred way of doing that?

Then, thirdly, what about reprovide sweep. Reprovide sweep uses a different query logic where we're not asking for ever closer peers but use carefully crafted keys to "enumerate" the keyspace. Arguably, this is out of scope of PutValue which only deals with single records. Still worth thinking about a common abstraction.

What about introducing a new Put/Announce/Broadcast/Provide state machine which the user can configure with different put strategies. Initially, the strategies could be classic and optimistic. This state machine would handle starting the query, react to query results, and emit events that instruct the router to put the record with some peer. The classic strategy would wait until the query has identified the 20 closest peers and the optimistic strategy would interleave put with query requests and would generally control the lifetime of the query.

@iand You have a better overview if and how this could fit the current structure. If we wanted to go forward with a new state machine, would this be part of a query pool? A different behaviour? Can two behaviours be notified of the same event (query + put behaviour)?

Edit: Also, SendMessage of the Router is not ready for PUT_VALUE / ADD_PROVIDER RPCs because peers don't respond with anything. SendMessage blocks until it has received a response.

iand commented 9 months ago

To mimic the original behaviour the next step would be to replace panic("implement me") with a loop over the closest peers and store the value with them. However, this requires the DHT to directly use the router and call SendMessage, which is not accessible from there yet. I mean, that's not a problem to add, just do we want that?

That would be simplest, but not necessarily the best. The DHT is closer to libp2p and has more control than the Router type, so it does make sense to do the libp2p interactions at this level. If we push it into the coord package then we have to use the Router (which could be potentially modified to give more control).

The second thought I had was about supporting Optimistic provide in the future. For optimistic provide we would need support starting PUT RPCs while the query is still running. This would be possible with the queryFn that we pass into QueryClosest - but again, is this the preferred way of doing that?

This is how I envisaged doing it initially. The QueryFunc approach would support it.

What about introducing a new Put/Announce/Broadcast/Provide state machine which the user can configure with different put strategies. Initially, the strategies could be classic and optimistic. This state machine would handle starting the query, react to query results, and emit events that instruct the router to put the record with some peer. The classic strategy would wait until the query has identified the 20 closest peers and the optimistic strategy would interleave put with query requests and would generally control the lifetime of the query.

This would be my preferred approach. I think I would structure these as separate state machines with a new "RecordBehaviour" that mediates access to them. The coordinator dispatches messages between different Behaviours and from the outside world.

I think the state machines would use a Query state machine like Bootstrap does and probably only need to use the FindCloser type of query. They would be responsible for the actual messaging to the remote peers based on how the query is progressing.

Edit: Also, SendMessage of the Router is not ready for PUT_VALUE / ADD_PROVIDER RPCs because peers don't respond with anything. SendMessage blocks until it has received a response.

I'm not sure I follow this. For those kinds of interactions the Router implementation can simply return after sending the message. It doesn't have to return a response if there isn't one.

But I am very open to alternatives. The Router interface is a work in progress and its current shape has been driven by basic querying use cases. Maybe rename SendMessage to SendWithResponse and add a separate Send method that doesn't wait for or need a response?