libp2p / go-libp2p-kad-dht

A Kademlia DHT implementation on go-libp2p
https://github.com/libp2p/specs/tree/master/kad-dht
MIT License

Persist routing table #254

Open anacrolix opened 5 years ago

anacrolix commented 5 years ago

Standard DHT implementations don't require bootstrapping on every instance start-up. It should be possible to import the previous instance's routing table and avoid having to bootstrap from a fixed set of bootstrap nodes. Note that regularly refreshing the routing table is still common (and may also be called bootstrapping).

raulk commented 5 years ago

Thoughts.

Persistence medium

Seeding strategy

We need to decide how we're going to seed the routing table on start, because it'll affect what metadata we persist about peers in the routing table, and hence the data model.

Brainstorming options:

Stebalien commented 5 years ago

I think we should (a) keep the address information in the peer store and (b) persist a single blob containing the routing table (peer IDs only). Really, we can probably just persist the entire routing table and reconnect to the entire routing table on start, no need to score. Thoughts?
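A minimal sketch of (a) + (b), assuming the snapshot is written to a file of our choosing; the helper names and the path are illustrative, not existing go-libp2p-kad-dht API:

package main

import (
	"encoding/json"
	"os"

	"github.com/libp2p/go-libp2p-core/peer"
	kb "github.com/libp2p/go-libp2p-kbucket"
)

// snapshotPath is an illustrative location for the routing-table blob.
const snapshotPath = "dht-routing-table.json"

// saveRoutingTable persists only the peer IDs; their addresses stay in the peerstore.
func saveRoutingTable(rt *kb.RoutingTable) error {
	blob, err := json.Marshal(rt.ListPeers())
	if err != nil {
		return err
	}
	return os.WriteFile(snapshotPath, blob, 0o600)
}

// loadRoutingTable reads the snapshot back; the caller then reconnects to each peer.
func loadRoutingTable() ([]peer.ID, error) {
	blob, err := os.ReadFile(snapshotPath)
	if err != nil {
		return nil, err
	}
	var ids []peer.ID
	err = json.Unmarshal(blob, &ids)
	return ids, err
}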

raulk commented 5 years ago

go-ethereum seeds with a randomised sample instead of rematerializing the table verbatim, as a way to recover from borked tables. I think it’s a sensible strategy, it’s more secure, and it works.

Stebalien commented 5 years ago

Randomized sample from the routing table? Connecting to a subset shouldn't be any more secure than connecting to all, unless I'm missing something.

raulk commented 5 years ago

In a scenario where your routing table is borked (e.g. all nodes keep their TCP connections alive but they ignore queries) and you restart, if you materialise the same table you will be stuck again. That’s why it’s more secure to seed a subset, and perform a random walk from those, to fill empty slots with fresh nodes.

Stebalien commented 5 years ago

Won't that subset just ignore queries?

raulk commented 5 years ago

@Stebalien you’re right. I don’t know, it feels wrong not to introduce a randomisation component over restarts, maybe mixing in well-known/reputable nodes as well during seeding (for now, our bootstrappers; later, nodes we’ve calculated a high reputation for over time). Randomisation is a good starting point to defend against some attacks in a Byzantine scenario.

raulk commented 5 years ago

Basically with a mix of randomised past nodes + a few bootstrappers (to keep it simple) we should be able to recover from a borked table after a restart, albeit slowly, but without manual intervention.
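For illustration, a sketch of that mix (the helper and its sampleSize knob are hypothetical): take a random sample of the persisted table and always append the static bootstrappers.

package main

import (
	"math/rand"

	"github.com/libp2p/go-libp2p-core/peer"
)

// seedCandidates mixes a random subset of the previously persisted routing
// table with the static bootstrap peers.
func seedCandidates(previous, bootstrappers []peer.ID, sampleSize int) []peer.ID {
	// Shuffle a copy so a restart never revives exactly the same set.
	sample := append([]peer.ID(nil), previous...)
	rand.Shuffle(len(sample), func(i, j int) { sample[i], sample[j] = sample[j], sample[i] })
	if len(sample) > sampleSize {
		sample = sample[:sampleSize]
	}
	// Always include some well-known peers so a poisoned snapshot cannot
	// trap us across restarts.
	return append(sample, bootstrappers...)
}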

Stebalien commented 5 years ago

Mixing in some bootstrappers sounds like a good idea.

da2x commented 5 years ago

From looking at BitTorrent client implementations, the only criterion for storing a node for future use is this: you’ve successfully established an outgoing connection to it (it’s reachable).

On the next startup, a number of nodes are selected at random (e.g. 10). If any of them are unreachable, replace it with another randomly selected node. If you run out of nodes, fall back to the bootstrapping nodes.

If you want to be smart about it then you can try selecting some nodes that have a short prefix length from your own public IP and some nodes from a diverse set of subnets. However, I do believe blindly choosing nodes at random is a perfectly valid strategy. Nodes are expected to leave and join the network all the time so it’s probably just a waste of time to try to carefully select the perfect set of nodes.
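A sketch of that startup procedure, assuming the remembered peers were saved as peer.AddrInfo records; the function, its parameters, and the 5-second dial timeout are all illustrative:

package main

import (
	"context"
	"math/rand"
	"time"

	"github.com/libp2p/go-libp2p-core/host"
	"github.com/libp2p/go-libp2p-core/peer"
)

// dialSeeds walks a shuffled list of remembered peers, keeps the first want
// peers that are actually reachable, and falls back to the bootstrap set once
// the remembered peers are exhausted.
func dialSeeds(ctx context.Context, h host.Host, remembered, bootstrap []peer.AddrInfo, want int) []peer.ID {
	rand.Shuffle(len(remembered), func(i, j int) { remembered[i], remembered[j] = remembered[j], remembered[i] })
	var alive []peer.ID
	for _, pi := range append(remembered, bootstrap...) {
		if len(alive) >= want {
			break
		}
		dctx, cancel := context.WithTimeout(ctx, 5*time.Second)
		err := h.Connect(dctx, pi)
		cancel()
		if err == nil {
			alive = append(alive, pi.ID)
		}
	}
	return alive
}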

Mikaela commented 5 years ago

As a Yggdrasil and Cjdns user, it would be nice if I automatically got peers from 0200::/7 and fc00::/8, so that not all my traffic would go through clearnet.

If I was always given ten random clearnet peers instead of ever using bootstrap peers, would I get peers from those networks at all?

yangm97 commented 5 years ago

@Mikaela afaik, atm, one way to do that is for the application to talk to the cjdns/yggdrasil API directly to fetch nodes.

But assuming you're not peering over clearnet and the nodes around you are cjdns/ygg native, you could discover them via IPv6 link-local multicast, and then switch to cjdns/ygg for data exchange once you discover the IPs these nodes are advertising (and save them, because ygg/cjdns IPs don't change when nodes move around).

anacrolix commented 5 years ago

> From looking at BitTorrent client implementations, the only criterion for storing a node for future use is this: you’ve successfully established an outgoing connection to it (it’s reachable).

Right on. You just store your routing table wholesale.

> On the next startup, a number of nodes are selected at random (e.g. 10). If any of them are unreachable, replace it with another randomly selected node. If you run out of nodes, fall back to the bootstrapping nodes.

Again, essentially standard routing table treatment. Your point about falling back to bootstrap nodes is the missing link in this implementation. Anytime the routing table becomes unviable (or a query stalls due to that), a callback that returns bootstrap nodes can be invoked to get things rolling again, like StartingNodes here.

> If you want to be smart about it then you can try selecting some nodes that have a short prefix length from your own public IP and some nodes from a diverse set of subnets. However, I do believe blindly choosing nodes at random is a perfectly valid strategy. Nodes are expected to leave and join the network all the time so it’s probably just a waste of time to try to carefully select the perfect set of nodes.

I agree. It's sufficient to just apply your routing table management policy at all times, both before and after persisting. By definition the routing table holds the best available peers, appropriately distributed throughout the keyspace, at all times anyway.
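A sketch of the StartingNodes-style callback described above (the names are hypothetical): whenever the routing table has no usable entries, ask the callback for fresh bootstrap peers.

package main

import "github.com/libp2p/go-libp2p-core/peer"

// BootstrapFallback supplies bootstrap peers on demand, e.g. when the routing
// table has become unviable or a query has stalled.
type BootstrapFallback func() ([]peer.AddrInfo, error)

// querySeeds returns the current table members if any are usable, otherwise it
// invokes the fallback to get things rolling again.
func querySeeds(table []peer.AddrInfo, fallback BootstrapFallback) ([]peer.AddrInfo, error) {
	if len(table) > 0 {
		return table, nil
	}
	return fallback()
}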

raulk commented 5 years ago

Let's try not to hardcode any behaviour, and let's make these strategies modular. That's the ethos of libp2p.

Unlike a specific BitTorrent client, etc., libp2p is a library for a multitude of use cases, and no one size fits all.

I'm thinking of two interfaces you can inject via Options:

package dht

import (
	"github.com/libp2p/go-libp2p-core/host"
	"github.com/libp2p/go-libp2p-core/peer"
	pstore "github.com/libp2p/go-libp2p-core/peerstore"

	kb "github.com/libp2p/go-libp2p-kbucket"
)

type Loader interface {
	// Load receives a snapshot of the members of the previous routing table
	// (possibly 0-length), and the peerstore (so it can verify whether we
	// still hold addresses for past members).
	//
	// It selects the candidates to add to the routing table, and it may use
	// any heuristic, such as past observed latencies, failure rate,
	// helpfulness score, etc. (recorded as metadata in the peerstore).
	Load(previous []peer.ID, ps pstore.Peerstore) (candidates []peer.ID)
}

type Seeder interface {
	// Seed receives the routing table to populate, the candidates the Loader
	// picked, and the fallback peers (static bootstrap peers).
	//
	// Seed must ensure connectivity to any peers it adds, and in that process
	// it can use metrics such as current, real latency to filter peers.
	//
	// In the event that the candidates are unviable, it can fall back to the
	// static bootstrap peers to salvage the situation.
	Seed(h host.Host, table *kb.RoutingTable, candidates, fallback []peer.ID)
}

We should have default strategies for when the user doesn't pass these options in the constructor, like the ones that have been cited.

We can also define common strategies in exported vars, so that the user can combine them as if they were recipes.
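As one concrete example of such a default, a Loader (hypothetical name) that keeps only the previous members the peerstore still has addresses for:

package dht

import (
	"github.com/libp2p/go-libp2p-core/peer"
	pstore "github.com/libp2p/go-libp2p-core/peerstore"
)

// PeerstoreLoader keeps only those previous routing-table members for which
// the peerstore still holds addresses.
type PeerstoreLoader struct{}

func (PeerstoreLoader) Load(previous []peer.ID, ps pstore.Peerstore) (candidates []peer.ID) {
	for _, p := range previous {
		if len(ps.Addrs(p)) > 0 {
			candidates = append(candidates, p)
		}
	}
	return candidates
}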

anacrolix commented 5 years ago

After creating a node, it should be possible to seed the routing table with entries. It's already possible to access the routing table of a node; we only need to determine the appropriate values from which a peer's entry is reconstituted, and expose that. That would include the peer ID, and its known addresses if they're not handled externally. Adding nodes to the routing table directly via a method on the DHT should only be required if the routing table does not have sufficient context to receive the above type.

Correspondingly, the DHT user should be able to retrieve a snapshot of the routing table before it is closed, or on closing, to persist as they see fit. Again, the only question here is what data is required from the DHT to persist a routing table entry and restore it later. Deferring address management and persistence to the peerstore, or whichever component is responsible for addresses, makes things a little less convenient, but possibly more flexible.

anacrolix commented 5 years ago

I think #283 should be done first, as it may affect what information needs to be persisted.

anacrolix commented 5 years ago

#283 also makes reconstituting a routing table more robust. Currently, if we did this, we'd have to connect to each peer we add in order to maintain the routing table invariant that all entries are connected, which also highlights how flawed that assumption is: if we can't connect to any peers while adding to the routing table, due to intermittent connectivity issues, we can't restore the routing table.

anacrolix commented 5 years ago

I'm suggesting, more or less:

IpfsDHT.SetRoutingTable([]PeerInfo)
IpfsDHT.GetRoutingTable() []PeerInfo

All other behaviours at the whim of the consumer.
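For illustration, a consumer-side sketch against the proposed accessors. Neither method exists on IpfsDHT today, the small interface is only there so the helpers stand alone, and PeerInfo is spelled as today's peer.AddrInfo:

package main

import (
	"encoding/json"
	"os"

	"github.com/libp2p/go-libp2p-core/peer"
)

// routingTableAccessor captures just the two proposed methods.
type routingTableAccessor interface {
	GetRoutingTable() []peer.AddrInfo
	SetRoutingTable([]peer.AddrInfo)
}

// persistOnClose snapshots the table to disk before shutdown.
func persistOnClose(d routingTableAccessor, path string) error {
	blob, err := json.Marshal(d.GetRoutingTable())
	if err != nil {
		return err
	}
	return os.WriteFile(path, blob, 0o600)
}

// restoreOnStart reads the snapshot back and hands it to the DHT.
func restoreOnStart(d routingTableAccessor, path string) error {
	blob, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	var snapshot []peer.AddrInfo
	if err := json.Unmarshal(blob, &snapshot); err != nil {
		return err
	}
	d.SetRoutingTable(snapshot)
	return nil
}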

hsanjuan commented 5 years ago

Note that the DHT may not be the only service that would benefit from remembering peers and reconnecting to them. I kind of see that as a separate, non-dht issue that should be handled by the libp2p host directly.

1) The peerstore should be persisted (host option).
2) Hosts should have an option specifying what to do with known peers in the peerstore on boot (ranging from connecting to every peer, to connecting to N peers, to none), probably as part of a host.Bootstrap() method that can be called once all services have been added, since they depend on connect notifications. This allows the DHT to bootstrap itself from scratch if necessary, but also other services.
3) Then the DHT may choose to persist its own routing table and connect to those peers as well, regardless of what the general host bootstrap process is.

tl;dr: Don't make peerstore persistence a dht-only feature.
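A minimal sketch of point 1, backing the peerstore with an on-disk datastore so known peers survive restarts. This assumes the go-ds-leveldb and pstoreds packages; error handling and tuning are kept to the bare minimum:

package main

import (
	"context"

	leveldb "github.com/ipfs/go-ds-leveldb"
	"github.com/libp2p/go-libp2p-core/peerstore"
	pstoreds "github.com/libp2p/go-libp2p-peerstore/pstoreds"
)

// persistentPeerstore opens (or creates) an on-disk peerstore. The result can
// be handed to the host constructor (e.g. via the libp2p.Peerstore option) so
// that every service, not just the DHT, sees the remembered peers.
func persistentPeerstore(ctx context.Context, path string) (peerstore.Peerstore, error) {
	store, err := leveldb.NewDatastore(path, nil)
	if err != nil {
		return nil, err
	}
	return pstoreds.NewPeerstore(ctx, store, pstoreds.DefaultOpts())
}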

Stebalien commented 4 years ago

Juan brought up that simply persisting the routing table and reviving from that could be dangerous. If an attacker can ever "take over" a peer's routing table and control their view of the network, recovering from this situation would be impossible without using the centralized bootstrappers.

The proposed solution is to look at historical data. That is, we should bootstrap off of peers that we've seen repeatedly over a long period of time.

A possible solution (probably not optimal) is to keep bucketed counts of how frequently we've seen and used each peer over time:

We'd prefer peers that:

This solution is probably quite a bit more complicated than it needs to be, but I hope it illustrates what I think we need.
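One crude way to express the "seen repeatedly over a long period" signal (a hypothetical helper, not a concrete proposal): record the distinct days on which each peer was successfully used, and score by that count.

package main

import (
	"time"

	"github.com/libp2p/go-libp2p-core/peer"
)

// seenTracker records, per peer, the set of distinct days (as Unix day
// numbers) on which we successfully used it.
type seenTracker struct {
	seenDays map[peer.ID]map[int64]struct{}
}

func newSeenTracker() *seenTracker {
	return &seenTracker{seenDays: make(map[peer.ID]map[int64]struct{})}
}

// Observe marks the peer as seen today.
func (t *seenTracker) Observe(p peer.ID, now time.Time) {
	day := now.Unix() / 86400
	if t.seenDays[p] == nil {
		t.seenDays[p] = make(map[int64]struct{})
	}
	t.seenDays[p][day] = struct{}{}
}

// Score favours peers seen on many distinct days: a long, consistent history
// is harder for a short-lived attacker to fake than a raw connection count.
func (t *seenTracker) Score(p peer.ID) int {
	return len(t.seenDays[p])
}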

master255 commented 11 months ago

Temporary solution:

// Collect up to 20 peers (other than ourselves) for which we have addresses,
// and serialise them for storage.
var peersAddrs []peer.AddrInfo
for _, peerId := range m.host.Network().Peerstore().PeersWithAddrs() {
	if m.host.ID() == peerId {
		continue
	}
	peersAddrs = append(peersAddrs, m.host.Network().Peerstore().PeerInfo(peerId))
	if len(peersAddrs) >= 20 {
		break
	}
}
bytes, _ := json.Marshal(peersAddrs)
....

// On the next start, deserialise the stored peers and reconnect to them.
var peersAddrs []peer.AddrInfo
err := json.Unmarshal(bytes, &peersAddrs)
....

I also found that Network().Peerstore().PeersWithAddrs() carries more information about each peer than kademlia.RoutingTable().ListPeers(), so it is better to use it here.