libp2p / go-libp2p-circuit

Circuit Switching for libp2p
MIT License
48 stars 16 forks source link

Hard Limit the number of hop stream goroutines #74

Closed vyzo closed 5 years ago

vyzo commented 5 years ago

This adds a hard limit to the number of hop goroutines, so that relays don't get overloaded.

Note that the live hop tracking has been removed for two reasons:

Note 2: DO NOT MERGE AS IS; I will rebase/squash before merging to clean up history.

vyzo commented 5 years ago

An alternative to hard resetting is to add a new error code for overloaded relays. But if we do this, we must stop lingering awaiting EOF in handleError; perhaps we should just reset after sending the error.

vyzo commented 5 years ago

Resetting the stream won't work, the error will not be propagated further. But we can reduce the EOF timeout to something much more reasonable than the current 1 minute (to say 5s). Fortunately this is a variable in go-libp2p-net, so the consumer can set it.

vyzo commented 5 years ago

Implemented the RELAY_OVERLOADED error code, so that we don't do hard resets without notifying the other side. The relay daemon can set the EOFTimeout to something shorter in order to reduce the goroutine linger time.

vyzo commented 5 years ago

I reverted to resetting the stream when the hop limit is exceeded -- being nice has deleterious effects in the number of lingering goroutines.

vyzo commented 5 years ago

Sounds like something we need although I'm a bit worried we should be using per-peer limits instead. At the moment, ~500 peers could fully mesh-connect through the relay to kill it.

Per-peer limits are a little complex to implement, and need a lock (which I would like to avoid). We have not observed fully connected meshes in the relays whatsoever; what has been observed is a long tail distribution where most peers have a small number of streams (they are connecting to the peers behind the relay) and then a bunch have a large number of streams (they are accepting connections).

Stebalien commented 5 years ago

Per-peer limits are a little complex to implement, and need a lock (which I would like to avoid). We have not observed fully connected meshes in the relays whatsoever; what has been observed is a long tail distribution where most peers have a small number of streams (they are connecting to the peers behind the relay) and then a bunch have a large number of streams (they are accepting connections).

I'm primarily worried about someone attacking a relay this way but this certainly isn't the only way.

vyzo commented 5 years ago

rebased/squashed to just 2 commits and dropped the RELAY_OVERLOADED change in pb.