libp2p / go-libp2p-kad-dht

A Kademlia DHT implementation on go-libp2p
https://github.com/libp2p/specs/tree/master/kad-dht
MIT License
517 stars 221 forks source link

Add provider record addresses to peerstore #870

Closed dennis-tra closed 11 months ago

dennis-tra commented 11 months ago

fixes #868

Jorropo commented 11 months ago

What is the point of this ? Identify already takes care of adding addresses of peer we are connected to.


I mean the fact we are even passing the addresses in the add endpoint seems wrong and I doubt we do. If we do this is a bug, we shouldn't send all of our addresses multiplied by all of our CIDs, wastefull use of bandwidth.

dennis-tra commented 11 months ago

The idea is to avoid the second lookup for network addresses after you've found the provider record.

The routed host will still do the lookup for new addresses if it cannot connect to the peer.

Up until now we relied on identify to keep the addresses around. IIRC @cortze's RFM showed improvement potential if we kept them around longer.

I don't think at all that this is a bug or anything. This is just caching.

Jorropo commented 11 months ago

@dennis-tra what does the add provider endpoint has to do with that ?

When I add a provider the remote peer already know my addresses since I'm connected to it. You could use host.Peerstore() to fetch them instead, and be sure the client does not send them over the wire. But also generally you don't need to store addresses on each provider record, you should have two table, one CID → [peerid] oen and one PeerID → [Address] one, in order to avoid storing lots of duplicated data.

Jorropo commented 11 months ago

Ok so, the provider store does indeed not store the addreses https://github.com/libp2p/go-libp2p-kad-dht/blob/2cbe38a252187855061c5fce32c22d8c3af88237/providers/providers_manager.go#L255

I did checked that some addresses are sent over the wire and yes some are:

pi.Addrs: 7
pi.Addrs: 7
pi.Addrs: 7
pi.Addrs: 7
pi.Addrs: 7
pi.Addrs: 7
pi.Addrs: 7
pi.Addrs: 8
pi.Addrs: 7
pi.Addrs: 16
pi.Addrs: 16
pi.Addrs: 16
pi.Addrs: 14
pi.Addrs: 7
pi.Addrs: 11
pi.Addrs: 5
pi.Addrs: 15
pi.Addrs: 12
pi.Addrs: 8
pi.Addrs: 1
pi.Addrs: 7
pi.Addrs: 7
pi.Addrs: 7
pi.Addrs: 7
pi.Addrs: 12
pi.Addrs: 12
pi.Addrs: 14
pi.Addrs: 21
pi.Addrs: 8
pi.Addrs: 26
pi.Addrs: 9
pi.Addrs: 17
pi.Addrs: 12
pi.Addrs: 1
pi.Addrs: 11
pi.Addrs: 4
pi.Addrs: 18
pi.Addrs: 8

However this code is at least buggy because it fails to run the addresses through the address filter (using maybeAddAddrs). That means it saves all the private LAN addresses of the nodes doing PUTs in our peerstore which we want to stop be doing due to the smart dialler.

dennis-tra commented 11 months ago

what does the add provider endpoint has to do with that ?

The add provider endpoint explicitly allows the client to associate addresses with the record.

When I add a provider the remote peer already know my addresses since I'm connected to it.

I agree that, as things currently are, it probably doesn’t matter if we take the addresses from the identify exchange or from the wire message. They probably contain the same set of addresses.

That’s not the point though. The thing that we want to address here is the TTL associated with the multiaddresses of that peer. However, I still think it’s better to take the explicitly passed multiaddresses from the request.

You could use host.Peerstore() to fetch them instead, and be sure the client does not send them over the wire. But also generally you don't need to store addresses on each provider record, you should have two table, one CID → [peerid] oen and one PeerID → [Address] one, in order to avoid storing lots of duplicated data.

Not sure if I’m missing something but that is what we’re doing isn’t it? We’re storing the multihash -> PeerID mapping in a datastore and another mapping from PeerID to multiaddresses in the peerstore. We’re not storing any duplicate multiaddresses.

When a peer requests a record we look up the peer IDs in the datastore and then the multiaddresses in the peerstore. We serve both information to the client. However, we stop serving multiaddresses after 30 min because the TTL is not properly updated. We want to continue serving the multiaddresses for much longer. This PR attempted to fix that.

Jorropo commented 11 months ago

Not sure if I’m missing something but that is what we’re doing isn’t it? We’re storing the multihash -> PeerID mapping in a datastore and another mapping from PeerID to multiaddresses in the peerstore. We’re not storing any duplicate multiaddresses.

Yeah I wrote that before tracing the code. Still over the wire we send theses duplicates which is an ineficient use of resources.

I agree that, as things currently are, it probably doesn’t matter if we take the addresses from the identify exchange or from the wire message. They probably contain the same set of addresses.

That’s not the point though. The thing that we want to address here is the TTL associated with the multiaddresses of that peer. However, I still think it’s better to take the explicitly passed multiaddresses from the request.

We don't need to send the addresses over kad-dht to update the TTL.

IMO This fix just makes thing worst because it incentivize clients to do the wrong thing and waste bandwidth, before it was a bug and we did nothing with theses addresses now you turned it in a feature :sparkles:. You could call https://pkg.go.dev/github.com/libp2p/go-libp2p@v0.30.0/core/peerstore#AddrBook.UpdateAddrs, sadly it doesn't have a wildcard update, I think you could submit a PR to libp2p which add some magic wellknown TTL like 0 match all addresses already stored. If you don't want to submit code to go-libp2p you could do:

// PseudoCode-Ish
addrs := dht.pstore.Addrs(p)
if f := dht.addrsFilter; f != nil {
 addrs = dht.addrsFilter(addrs)
}
pstore.AddAddrs(p, addrs, dht.providerTTL)

They both achieve the same thing except this allows us to stop sending maddrs along side add provider.


Edit: the pseudo code that fetch run through the filter and add the addresses back-in is better IMO, because it only increase the cache on the addresses we want to cache.

Jorropo commented 11 months ago
if len(pi.Addrs) < 1 {
    logger.Debugw("no valid addresses for provider", "from", p)
    continue
}

I actually this is on purpose :facepalm:, forget what I said. We should eventually make a protocol upgrade with versioned stream handler so we can skip doing this. This is a multiple X background traffic saving.

Edit: new issue here https://github.com/libp2p/go-libp2p-kad-dht/issues/871