Load-balanced bootstrapping

petar commented 4 years ago

The current bootstrap protocol:

overloads the bootstrappers, and
skews the topology of the network

To remedy this, a new bootstrap protocol should go as follows:

Bootstrapping becomes a dedicated network call, supported by DHT nodes
In response to a bootstrap request, the peer returns K random peers from the network
The node that is joining inserts these random contacts in their RT and proceeds to fill all of its tables using the random peer seeds

To load-balance (and increase speed), bootstrappers can pre-lookup random peers in the network in their free time (on a separate thread), so that they can respond immediately to step 2 (above).

Furthermore, if bootstrappers accumulate a set of N >> K random peers in the network, they can respond to bootstrap queries (step 2, above) by picking a random subset of size K from the accumulated N.

Stebalien commented 4 years ago

Bootstrapping becomes a dedicated network call, supported by DHT nodes

I'd use a different protocol string for this protocol instead of adding this to the DHT protocol. Otherwise, we'd have to have some form of "don't add the bootstrapper to your routing table" exception. If we do that and different users pick different bootstrappers, we're going to have some pretty weird behavior.

Other concerns:

We need to make sure to return good peers. Ideally, long-lived peers. The nice thing about the current approach is that the bootstrapper will return peers in the routing table, which should eventually stabilize.
I'd prefer to just use a general-purpose peer exchange protocol (give me peers that speak protocol X, give me peers that have advertised Y, etc.). But this may be tricky given that we should try to return "good" peers.

aarshkshah1992 commented 4 years ago

@petar

In response to a bootstrap request, the peer returns K random peers from the network.

K random peers that support the DHT protocol, right ? So nodes can then run queries against them to fill their Routing Tables.

aarshkshah1992 commented 4 years ago

@Stebalien

I'd use a different protocol string for this protocol instead of adding this to the DHT protocol. Otherwise, we'd have to have some form of "don't add the bootstrapper to your routing table" exception.

Please can you explain this ? Shouldn't even the DHT peers remove the bootstrappers from their RT after they get a reply from the bootstrapping node as mentioned in step 2 ?

Stebalien commented 4 years ago

You're probably right. If the DHT knows which peers are bootstrappers, it's easy. If it doesn't, we'll have to have some heuristic where we take the first several nodes we see and use the bootstrap protocol.

aarshkshah1992 commented 4 years ago

Hey @Stebalien @petar

Assigning this too myself to get the discussions rolling on this one and come up with a writeup/proposal. Let me know if there are any concerns.

RubenKelevra commented 4 years ago

Shouldn't bootstrapping be considered a one-time-operation after installation?

I mean, since we remove the "behind nat" peers from the DHT we should end up with much more stable peer tables.

So maybe we should try to recover from a 'downtime' with say, the best 20% of the known peers, and if this fails return to the known bootstrap nodes.

This, of course, makes only sense, if the node has run a minimum amount of time, like 1 or 2 hours (after installation).

To make sure that we don't run into net-splits, we might want to query from time to time one of the bootstrapping nodes, to get some "fresh" & known good peers.

I'd prefer to just use a general-purpose peer exchange protocol (give me peers that speak protocol X, give me peers that have advertised Y, etc.). But this may be tricky given that we should try to return "good" peers.

This sounds like a good solution.

For the bootstrapping requests, we should consider saving the peer which provided those peers as a set and rate those sets depending on how they work for us for queries in the long run. (like proposed for the routing responses in #589 )

This way we can greylist peers that supply bad peers to us - so we don't use them as bootstrapping nodes.

aschmahmann commented 4 years ago

Shouldn't bootstrapping be considered a one-time-operation after installation?

Yes, and that relates to #460. This issue is more about how we do that one-time operation (although it'll be a once per reboot operation until #460, or some reworked version of it lands).

RubenKelevra commented 4 years ago

To make sure that we don't run into net-splits, we might want to query from time to time one of the bootstrapping nodes, to get some "fresh" & known good peers.

I just notice, that this idea is inherently flawed: It allows to identify peers very easily in normal operating conditions, because of reoccurring traffic patterns. If this is done, we should use the relay function for these queries, this hides what we're doing for everybody unable to read the encryption.

Shouldn't bootstrapping be considered a one-time-operation after installation?

Yes, and that relates to #460. This issue is more about how we do that one-time operation (although it'll be a once per reboot operation until #460, or some reworked version of it lands).

I think I understand the initial ticket differently, since it makes no sense to request random DHT nodes if we have never bootstrapped - we don't know anybody except the hardcoded bootstrap nodes.

Edit:

On the other hand, a simple Web of Trust could help to avoid the bootstrap situation entirely - a new user might have a friend who he trusts. So he can add him to a list of trusted nodes, and bootstrap over him.

The more trusted nodes a user has, the more independent he is from the bootstrapping nodes.

aschmahmann commented 4 years ago

I think I understand the initial ticket differently, since it makes no sense to request random DHT nodes if we have never bootstrapped - we don't know anybody except the hardcoded bootstrap nodes.

I think you are misunderstanding the issue. Currently when we ask bootstrappers for peers we do so by issuing standard queries towards areas of our routing table (e.g. help me find node A (that happens to have 0 bits in common with me in Kademlia space), help me find B (which has 1 bit in common), etc.). The bootstrapper is returning us peers from its routing table. If many users are hitting the bootstrap nodes that inherently skews the load onto not just the bootstrappers, but the peers of the bootstrappers. Additionally, this can cause routing tables to not have some of the randomness properties that the DHT's health relies on.

As a result, the proposal is that instead of issuing bootstrappers 10+ queries (do you know A, B, C, D,... who have 0...9 bits in common with me) we issue the bootstrapper one query "please give me a random subset of peers in the network, they do not even need to be in your routing table". Note: this has nothing to do with how much you trust the bootstrapper, it's just about the type of question you are asking the bootstrapper which is amplified by the fact that the network tends to work with a small number of bootstrappers.

I just notice, that this idea is inherently flawed: It allows to identify peers very easily in normal operating conditions, because of reoccurring traffic patterns. If this is done, we should use the relay function for these queries, this hides what we're doing for everybody unable to read the encryption.

I don't understand the argument here. Is this about whether an adversary can use traffic analysis to detect if we're a libp2p node? I don't think a relay is going to protect you here since the adversary could also notice traffic spikes from you to the relay. While you could try and bundle more and more of your requests through the relay for traffic obfuscation that just adds cost (bandwidth + machine resources for the relay, plus latency for you), but still ends up being susceptible to traffic analysis.

bertrandfalguiere commented 4 years ago

@RubenKelevra The idea of Web of Trust is already essentially implemented. You can already add a multiaddrs of a peer you trust and bootstrap from it. :)

RubenKelevra commented 4 years ago

I just notice, that this idea is inherently flawed: It allows to identify peers very easily in normal operating conditions, because of reoccurring traffic patterns. If this is done, we should use the relay function for these queries, this hides what we're doing for everybody unable to read the encryption.

I don't understand the argument here. Is this about whether an adversary can use traffic analysis to detect if we're a libp2p node? I don't think a relay is going to protect you here since the adversary could also notice traffic spikes from you to the relay. While you could try and bundle more and more of your requests through the relay for traffic obfuscation that just adds cost (bandwidth + machine resources for the relay, plus latency for you), but still ends up being susceptible to traffic analysis.

Well yeah, we currently rely on a list of relays for this function. But when we return to a "every node in the network can offer relay services" we could just use a random peer we are connected to, to relay our requests to the bootstrap nodes.

Is my assumption correct, that if we bootstrap just once and use the best performing peers from our last session we end up risking netsplits, since there's no common element we connect to?

I think I understand the initial ticket differently, since it makes no sense to request random DHT nodes if we have never bootstrapped - we don't know anybody except the hardcoded bootstrap nodes.

I think you are misunderstanding the issue. Currently when we ask bootstrappers for peers we do so by issuing standard queries towards areas of our routing table (e.g. help me find node A (that happens to have 0 bits in common with me in Kademlia space), help me find B (which has 1 bit in common), etc.). The bootstrapper is returning us peers from its routing table. If many users are hitting the bootstrap nodes that inherently skews the load onto not just the bootstrappers, but the peers of the bootstrappers. Additionally, this can cause routing tables to not have some of the randomness properties that the DHT's health relies on.

As a result, the proposal is that instead of issuing bootstrappers 10+ queries (do you know A, B, C, D,... who have 0...9 bits in common with me) we issue the bootstrapper one query "please give me a random subset of peers in the network, they do not even need to be in your routing table". Note: this has nothing to do with how much you trust the bootstrapper, it's just about the type of question you are asking the bootstrapper which is amplified by the fact that the network tends to work with a small number of bootstrappers.

Ah! thanks, this makes sense!

@RubenKelevra The idea of Web of Trust is already essentially implemented. You can already add a multiaddrs of a peer you trust and bootstrap from it. :)

Well, the idea is to differentiate between the common bootstrap nodes and nodes you trust. The bootstrap nodes would be used, as a last resort, while the nodes you trust would be used on every startup to fetch some "fresh" good performing peers from them.

If we have trusted nodes, we can avoid that we have to analyze the returned peers and how they perform for us and always assume they are good.

It would also be an excellent way to bootstrap fast for small nodes that have a short runtime duration, like phones/javascript nodes.

Trusted nodes could allow each other to fulfill early dht requests, while the connections to other peers are established, reducing the time to first byte.

aarshkshah1992 commented 4 years ago

@Stebalien What's the priority of this in the larger scheme of things ?

Stebalien commented 4 years ago

Well, the priority is "we'll throw it in when we make the other DHT protocol changes".

Stebalien commented 4 years ago

Bootstrap proposal from @aschmahmann:

The bootstrappers will stop running go-ipfs. Instead, they'll run a custom libp2p daemon so we can iterate on it quickly and add dependencies like sqlite.
Use the rendezvous protocol to discover DHT peers. The bootstrappers will run this protocol and will eventually stop running the DHT service.
Instead of allowing peers to register as DHT nodes with the bootstrap peers (as one would usually with the rendezvous protocol), the bootstrappers will continuously crawl the network, looking for stable, well-behaved, long-lived peers.

RubenKelevra commented 4 years ago

I wrote a proposal, which doesn't really conflict with @aschmahmann's proposal. But would allow to update bootstrap peers and distribute the bootstrapping (and optionally much more in the future).

https://github.com/libp2p/go-libp2p-kad-dht/issues/657

aschmahmann commented 4 years ago

Additional thought. We may be able to save some work here by not immediately switching to the rendezvous protocol and having the bootstrappers:

1) Run custom code (e.g. for IPFS do not just a standard go-ipfs node) 2) Crawl the network and track peers to look for good long-lived peers 3) Return good peers when peers send us "GetClosestPeers" messages. The peers returned will be sampled from the whole network and not biased by distance to the bootstrapper in Kad space. 1) We could even consider being non-compliant DHT server nodes and returning randomly distributed good peers instead of close peers to "GetClosestPeers" queries. This would allow us to make the bootstrapping load balanced across the whole ID space instead of having peers bootstrap off of peers that are close to them in ID space.

Clients will: 1) Make sure not to add bootstrappers to their routing tables. Note: This is true even if we implement rendezvous now since we want to augment the "usefulness" function without worrying about bootstrapper special cases and because the bootstrapper nodes will still need to run the DHT server protocol to support older nodes. 2) Query bootstrap nodes explicitly for peers since they won't be in the routing tables.

Overall Upside: Likely an easier implementation path short-term Overall Downside: hacky, limits the number of peers the bootstrapper returns to 20 at a time instead of more than that

libp2p / go-libp2p-kad-dht

Load-balanced bootstrapping #574