Open petar opened 4 years ago
Bootstrapping becomes a dedicated network call, supported by DHT nodes
I'd use a different protocol string for this protocol instead of adding this to the DHT protocol. Otherwise, we'd have to have some form of "don't add the bootstrapper to your routing table" exception. If we do that and different users pick different bootstrappers, we're going to have some pretty weird behavior.
Other concerns:
@petar
In response to a bootstrap request, the peer returns K random peers from the network.
K random peers that support the DHT protocol, right ? So nodes can then run queries against them to fill their Routing Tables.
@Stebalien
I'd use a different protocol string for this protocol instead of adding this to the DHT protocol. Otherwise, we'd have to have some form of "don't add the bootstrapper to your routing table" exception.
Please can you explain this ? Shouldn't even the DHT peers remove the bootstrappers from their RT after they get a reply from the bootstrapping node as mentioned in step 2 ?
You're probably right. If the DHT knows which peers are bootstrappers, it's easy. If it doesn't, we'll have to have some heuristic where we take the first several nodes we see and use the bootstrap protocol.
Hey @Stebalien @petar
Assigning this too myself to get the discussions rolling on this one and come up with a writeup/proposal. Let me know if there are any concerns.
Shouldn't bootstrapping be considered a one-time-operation after installation?
I mean, since we remove the "behind nat" peers from the DHT we should end up with much more stable peer tables.
So maybe we should try to recover from a 'downtime' with say, the best 20% of the known peers, and if this fails return to the known bootstrap nodes.
This, of course, makes only sense, if the node has run a minimum amount of time, like 1 or 2 hours (after installation).
To make sure that we don't run into net-splits, we might want to query from time to time one of the bootstrapping nodes, to get some "fresh" & known good peers.
- I'd prefer to just use a general-purpose peer exchange protocol (give me peers that speak protocol X, give me peers that have advertised Y, etc.). But this may be tricky given that we should try to return "good" peers.
This sounds like a good solution.
For the bootstrapping requests, we should consider saving the peer which provided those peers as a set and rate those sets depending on how they work for us for queries in the long run. (like proposed for the routing responses in #589 )
This way we can greylist peers that supply bad peers to us - so we don't use them as bootstrapping nodes.
Shouldn't bootstrapping be considered a one-time-operation after installation?
Yes, and that relates to #460. This issue is more about how we do that one-time operation (although it'll be a once per reboot operation until #460, or some reworked version of it lands).
To make sure that we don't run into net-splits, we might want to query from time to time one of the bootstrapping nodes, to get some "fresh" & known good peers.
I just notice, that this idea is inherently flawed: It allows to identify peers very easily in normal operating conditions, because of reoccurring traffic patterns. If this is done, we should use the relay function for these queries, this hides what we're doing for everybody unable to read the encryption.
Shouldn't bootstrapping be considered a one-time-operation after installation?
Yes, and that relates to #460. This issue is more about how we do that one-time operation (although it'll be a once per reboot operation until #460, or some reworked version of it lands).
I think I understand the initial ticket differently, since it makes no sense to request random DHT nodes if we have never bootstrapped - we don't know anybody except the hardcoded bootstrap nodes.
Edit:
On the other hand, a simple Web of Trust could help to avoid the bootstrap situation entirely - a new user might have a friend who he trusts. So he can add him to a list of trusted nodes, and bootstrap over him.
The more trusted nodes a user has, the more independent he is from the bootstrapping nodes.
I think I understand the initial ticket differently, since it makes no sense to request random DHT nodes if we have never bootstrapped - we don't know anybody except the hardcoded bootstrap nodes.
I think you are misunderstanding the issue. Currently when we ask bootstrappers for peers we do so by issuing standard queries towards areas of our routing table (e.g. help me find node A (that happens to have 0 bits in common with me in Kademlia space), help me find B (which has 1 bit in common), etc.). The bootstrapper is returning us peers from its routing table. If many users are hitting the bootstrap nodes that inherently skews the load onto not just the bootstrappers, but the peers of the bootstrappers. Additionally, this can cause routing tables to not have some of the randomness properties that the DHT's health relies on.
As a result, the proposal is that instead of issuing bootstrappers 10+ queries (do you know A, B, C, D,... who have 0...9 bits in common with me) we issue the bootstrapper one query "please give me a random subset of peers in the network, they do not even need to be in your routing table". Note: this has nothing to do with how much you trust the bootstrapper, it's just about the type of question you are asking the bootstrapper which is amplified by the fact that the network tends to work with a small number of bootstrappers.
I just notice, that this idea is inherently flawed: It allows to identify peers very easily in normal operating conditions, because of reoccurring traffic patterns. If this is done, we should use the relay function for these queries, this hides what we're doing for everybody unable to read the encryption.
I don't understand the argument here. Is this about whether an adversary can use traffic analysis to detect if we're a libp2p node? I don't think a relay is going to protect you here since the adversary could also notice traffic spikes from you to the relay. While you could try and bundle more and more of your requests through the relay for traffic obfuscation that just adds cost (bandwidth + machine resources for the relay, plus latency for you), but still ends up being susceptible to traffic analysis.
@RubenKelevra The idea of Web of Trust is already essentially implemented. You can already add a multiaddrs of a peer you trust and bootstrap from it. :)
I just notice, that this idea is inherently flawed: It allows to identify peers very easily in normal operating conditions, because of reoccurring traffic patterns. If this is done, we should use the relay function for these queries, this hides what we're doing for everybody unable to read the encryption.
I don't understand the argument here. Is this about whether an adversary can use traffic analysis to detect if we're a libp2p node? I don't think a relay is going to protect you here since the adversary could also notice traffic spikes from you to the relay. While you could try and bundle more and more of your requests through the relay for traffic obfuscation that just adds cost (bandwidth + machine resources for the relay, plus latency for you), but still ends up being susceptible to traffic analysis.
Well yeah, we currently rely on a list of relays for this function. But when we return to a "every node in the network can offer relay services" we could just use a random peer we are connected to, to relay our requests to the bootstrap nodes.
Is my assumption correct, that if we bootstrap just once and use the best performing peers from our last session we end up risking netsplits, since there's no common element we connect to?
I think I understand the initial ticket differently, since it makes no sense to request random DHT nodes if we have never bootstrapped - we don't know anybody except the hardcoded bootstrap nodes.
I think you are misunderstanding the issue. Currently when we ask bootstrappers for peers we do so by issuing standard queries towards areas of our routing table (e.g. help me find node A (that happens to have 0 bits in common with me in Kademlia space), help me find B (which has 1 bit in common), etc.). The bootstrapper is returning us peers from its routing table. If many users are hitting the bootstrap nodes that inherently skews the load onto not just the bootstrappers, but the peers of the bootstrappers. Additionally, this can cause routing tables to not have some of the randomness properties that the DHT's health relies on.
As a result, the proposal is that instead of issuing bootstrappers 10+ queries (do you know A, B, C, D,... who have 0...9 bits in common with me) we issue the bootstrapper one query "please give me a random subset of peers in the network, they do not even need to be in your routing table". Note: this has nothing to do with how much you trust the bootstrapper, it's just about the type of question you are asking the bootstrapper which is amplified by the fact that the network tends to work with a small number of bootstrappers.
Ah! thanks, this makes sense!
@RubenKelevra The idea of Web of Trust is already essentially implemented. You can already add a multiaddrs of a peer you trust and bootstrap from it. :)
Well, the idea is to differentiate between the common bootstrap nodes and nodes you trust. The bootstrap nodes would be used, as a last resort, while the nodes you trust would be used on every startup to fetch some "fresh" good performing peers from them.
If we have trusted nodes, we can avoid that we have to analyze the returned peers and how they perform for us and always assume they are good.
It would also be an excellent way to bootstrap fast for small nodes that have a short runtime duration, like phones/javascript nodes.
Trusted nodes could allow each other to fulfill early dht requests, while the connections to other peers are established, reducing the time to first byte.
@Stebalien What's the priority of this in the larger scheme of things ?
Well, the priority is "we'll throw it in when we make the other DHT protocol changes".
Bootstrap proposal from @aschmahmann:
I wrote a proposal, which doesn't really conflict with @aschmahmann's proposal. But would allow to update bootstrap peers and distribute the bootstrapping (and optionally much more in the future).
Additional thought. We may be able to save some work here by not immediately switching to the rendezvous protocol and having the bootstrappers:
1) Run custom code (e.g. for IPFS do not just a standard go-ipfs node) 2) Crawl the network and track peers to look for good long-lived peers 3) Return good peers when peers send us "GetClosestPeers" messages. The peers returned will be sampled from the whole network and not biased by distance to the bootstrapper in Kad space. 1) We could even consider being non-compliant DHT server nodes and returning randomly distributed good peers instead of close peers to "GetClosestPeers" queries. This would allow us to make the bootstrapping load balanced across the whole ID space instead of having peers bootstrap off of peers that are close to them in ID space.
Clients will: 1) Make sure not to add bootstrappers to their routing tables. Note: This is true even if we implement rendezvous now since we want to augment the "usefulness" function without worrying about bootstrapper special cases and because the bootstrapper nodes will still need to run the DHT server protocol to support older nodes. 2) Query bootstrap nodes explicitly for peers since they won't be in the routing tables.
Overall Upside: Likely an easier implementation path short-term Overall Downside: hacky, limits the number of peers the bootstrapper returns to 20 at a time instead of more than that
The current bootstrap protocol:
To remedy this, a new bootstrap protocol should go as follows:
To load-balance (and increase speed), bootstrappers can pre-lookup random peers in the network in their free time (on a separate thread), so that they can respond immediately to step 2 (above).
Furthermore, if bootstrappers accumulate a set of N >> K random peers in the network, they can respond to bootstrap queries (step 2, above) by picking a random subset of size K from the accumulated N.