ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

Reduce the impact of the DHT #6283

Open Stebalien opened 5 years ago

Stebalien commented 5 years ago

Currently, all nodes participate as full DHT servers by default. Unfortunately, this means:

  1. We have a lot of crappy unreachable DHT servers.
  2. I have to repeatedly tell users to run their client with `ipfs daemon --routing=dhtclient` because the DHT is causing the network to DoS their system.
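For context, this is roughly what client mode means at the library level. A minimal sketch using go-libp2p-kad-dht's `Mode` option (this option and the ctx-less `libp2p.New` postdate this comment, so treat the exact API as illustrative):

```go
package main

import (
	"context"

	"github.com/libp2p/go-libp2p"
	dht "github.com/libp2p/go-libp2p-kad-dht"
)

func main() {
	ctx := context.Background()

	// A bare default host for illustration; go-ipfs configures its
	// host with many more options.
	h, err := libp2p.New()
	if err != nil {
		panic(err)
	}
	defer h.Close()

	// ModeClient: issue DHT queries but never answer them, so an
	// unreachable or underpowered node stays out of other peers'
	// routing tables. This is what --routing=dhtclient selects.
	d, err := dht.New(ctx, h, dht.Mode(dht.ModeClient))
	if err != nil {
		panic(err)
	}
	defer d.Close()
}
```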

Related work:

However, I'm wondering if we should consider an interim solution: run in DHT-client mode by default, at least for now.

  1. Create a "laptop" config profile and make it the default. The laptop profile will use a "client" routing option.
  2. Create a "desktop" config profile and use an "auto" routing option. At the moment, this will default to client until we have the ability to switch between client/server mode dynamically.
  3. Modify the "server" config profile to default the routing option to "dht".
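A hedged sketch of how the proposed "laptop" profile could slot into go-ipfs's existing config-profile machinery (assuming the `config.Profile`/`Transformer` shape; the profile name and its defaults are the proposal here, not shipped code):

```go
package config

// Sketch only: "laptop" is the profile proposed above, not a shipped
// one; it reuses the existing Profile/Transformer machinery in the
// go-ipfs config package.
var laptopProfile = Profile{
	Description: "Defaults for intermittently connected, resource-constrained machines.",
	Transform: func(c *Config) error {
		c.Routing.Type = "dhtclient" // query the DHT, never serve it
		return nil
	},
}
```

The "desktop" and "server" profiles would differ only in the `Routing.Type` value they set ("auto" and "dht" respectively).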

The main drawback to this solution is that it'll make the IPFS network significantly less "p2p". That is, in a pure p2p network, all nodes are equal. On the other hand, all nodes are clearly not equal in terms of hardware, so I'm not that concerned about this.

Thoughts and concerns?

cc @whyrusleeping & @daviddias?

vyzo commented 5 years ago

If we get a laptop profile, we might want to enable autorelay by default for it as well.

vyzo commented 5 years ago

Create a "desktop" config profile and use an "auto" routing option. At the moment, this will default to client until we have the ability to switch between client/server mode dynamically.

This might be unreasonable; we might want to have this be dht by default until we have the magic option to switch dynamically.

obo20 commented 5 years ago

I'm in favor of defaulting people to `dhtclient` for now. This point specifically resonates with me:

> We have a lot of crappy unreachable DHT servers.

My thoughts are that most people who would opt in to be a DHT server would have some idea of what they're doing, and the nodes opting in would likely be more stable, as they're intentionally configured to redistribute content.

Stebalien commented 5 years ago

> If we get a laptop profile, we might want to enable autorelay by default for it as well.

SGTM.

Create a "desktop" config profile and use an "auto" routing option. At the moment, this will default to client until we have the ability to switch between client/server mode dynamically.

This might be unreasonable, we might want to have this be dht by default until we have the magic option to switch dynamically.

My thinking is that, at the moment, the DHT is too much overhead even for the average desktop. Fixing the issues I noted in the issue description will help with that but, IMO, not even desktops should be DHT nodes till then.

> My thoughts are that most people who would opt in to be a DHT server would have some idea of what they're doing, and the nodes opting in would likely be more stable, as they're intentionally configured to redistribute content.

Exactly.

Stebalien commented 5 years ago

Requires https://github.com/ipfs/go-ipfs/issues/6287.

vyzo commented 5 years ago

My concern is that we might end up with a DHT that is vastly undersized for the scale of the network.

Stebalien commented 5 years ago

> My concern is that we might end up with a DHT that is vastly undersized for the scale of the network.

I agree, although I think we'll get that simply by defaulting to the "laptop" profile. However, I'd be fine with defaulting the desktop profile to "dht" at first (for a slower transition).

BillDStrong commented 5 years ago

Keeping this as a stop-gap measure sounds fine. As an experimenting user, I don't want to have to know about all of this.

To prevent an undersized DHT, I would suggest a simple test of the user's hardware resources at first run. Declare some minimum threshold, and if the user's machine exceeds that threshold, ask the user whether they would like to enable some services to keep the network healthy.

You would want to overestimate the minimum hardware: you don't want the user to ever have to think about IPFS running in the background, taking precious cycles from their games/work.
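The suggested check might look something like this; the threshold and the `askUser` prompt are made up for illustration:

```go
package main

import "runtime"

// askUser is a hypothetical first-run prompt; go-ipfs has no such API.
func askUser(question string) bool {
	// ...present the question to the user and return their answer...
	return false
}

// shouldOfferDHTServer sketches the proposed first-run hardware test.
// The threshold is arbitrary; the point is to deliberately overestimate
// so that only machines with real headroom are ever asked to serve.
func shouldOfferDHTServer() bool {
	const minCPUs = 4 // illustrative threshold only
	if runtime.NumCPU() < minCPUs {
		return false
	}
	return askUser("Enable DHT server mode to help keep the network healthy?")
}
```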

bonekill commented 5 years ago

+1 to the interim plan. You should not have to worry too much about destabilizing the network, because...

1. Most nodes do not update with haste. Notably, a few large projects have their clients and server clusters hang back a few versions, so if this starts leading to an absurd client-to-server ratio, a patch can be released to start reverting the swarm to the old behavior before problems arise. Edit: the new release process mostly outdates this.

2. The change is not applied to upgraded instances. For this change to take effect, the user would need to run `ipfs init` or make an explicit config alteration, as I don't believe we can distinguish between an existing config being explicitly set to "dht" and one merely defaulting to it. This reinforces point 1, as the adoption rate is further reduced.

3. Ability to lower the α (alpha concurrency) parameter. IIRC the α parameter for searching through the DHT is cranked up to deal with all the useless nodes. Once a large number of useless nodes are removed, you can pull α back to a more sane number (e.g. α = 3); see the sketch after this list. While you cannot make this particular change in the same patch (because of points 1 and 2), it should eventually lower the swarm "cost" of each query, due to fewer RPCs canceled while in flight. Hopefully, while the raw capacity drops, the query efficiency rises, netting a greater effective capacity than before. Edit: α is already 3, and has been for a very long time... whoops.

4. Reduces "scattershot" behavior. IPFS seems to increase the number of in-flight requests the longer it takes to find valid results. Lots of useless nodes waste a lot of time on timeouts, and IPFS seems to spawn many RPCs to make up for each failed one. Fewer wasted RPCs result in faster queries and fewer panicked "scattershots" through routing tables. This behavior should decrease proportionally as the usable-node ratio improves. Not sure if this behavior is intentional/still exists, but it is something I have observed in the past. Edit: I cannot replicate this behavior anymore.
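On point 3, a hedged sketch of where α lives: go-libp2p-kad-dht exposes it as the `Concurrency` option (the import paths assume the current repo layout, which postdates this comment):

```go
package main

import (
	"context"

	dht "github.com/libp2p/go-libp2p-kad-dht"
	"github.com/libp2p/go-libp2p/core/host"
)

// newTunedDHT sketches pulling α back down once routing tables hold
// mostly reachable servers. Concurrency sets the number of concurrent
// in-flight requests per query path; 3 is the classic Kademlia default.
func newTunedDHT(ctx context.Context, h host.Host) (*dht.IpfsDHT, error) {
	return dht.New(ctx, h, dht.Concurrency(3))
}
```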

A warning, however: you should observe post-patch whether your own DHT nodes are getting hit too hard and whether a reversal is required. While unlikely given the above, if the DHT client-to-server ratio hits a critical point, the entire DHT swarm may cascade-fail and be difficult to bootstrap again. You would either need to wait for a large number of requesting clients to give up and/or bulk-online a large number of healthy DHT-serving nodes to fix it.

yiannisbot commented 1 year ago

Isn't this issue obsolete since IPFS v0.5, where new nodes use AutoNAT to determine their node status?
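For reference, a minimal sketch of that mechanism, assuming go-libp2p-kad-dht's `ModeAuto` option: the node runs as a DHT client and promotes itself to a server only while AutoNAT reports it publicly reachable.

```go
package main

import (
	"context"

	dht "github.com/libp2p/go-libp2p-kad-dht"
	"github.com/libp2p/go-libp2p/core/host"
)

// newAutoDHT sketches the post-v0.5 default: start as a DHT client,
// promote to a DHT server only while AutoNAT reports the host publicly
// reachable, and demote again if reachability is lost.
func newAutoDHT(ctx context.Context, h host.Host) (*dht.IpfsDHT, error) {
	return dht.New(ctx, h, dht.Mode(dht.ModeAuto))
}
```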