Closed vyzo closed 5 years ago
An alternative to hard resetting is to add a new error code for overloaded relays.
But if we do this, we must stop lingering awaiting EOF in handleError
; perhaps we should just reset after sending the error.
Resetting the stream won't work, the error will not be propagated further.
But we can reduce the EOF timeout to something much more reasonable than the current 1 minute (to say 5s).
Fortunately this is a variable in go-libp2p-net
, so the consumer can set it.
Implemented the RELAY_OVERLOADED
error code, so that we don't do hard resets without notifying the other side.
The relay daemon can set the EOFTimeout
to something shorter in order to reduce the goroutine linger time.
I reverted to resetting the stream when the hop limit is exceeded -- being nice has deleterious effects in the number of lingering goroutines.
Sounds like something we need although I'm a bit worried we should be using per-peer limits instead. At the moment, ~500 peers could fully mesh-connect through the relay to kill it.
Per-peer limits are a little complex to implement, and need a lock (which I would like to avoid). We have not observed fully connected meshes in the relays whatsoever; what has been observed is a long tail distribution where most peers have a small number of streams (they are connecting to the peers behind the relay) and then a bunch have a large number of streams (they are accepting connections).
Per-peer limits are a little complex to implement, and need a lock (which I would like to avoid). We have not observed fully connected meshes in the relays whatsoever; what has been observed is a long tail distribution where most peers have a small number of streams (they are connecting to the peers behind the relay) and then a bunch have a large number of streams (they are accepting connections).
I'm primarily worried about someone attacking a relay this way but this certainly isn't the only way.
rebased/squashed to just 2 commits and dropped the RELAY_OVERLOADED
change in pb.
This adds a hard limit to the number of hop goroutines, so that relays don't get overloaded.
Note that the live hop tracking has been removed for two reasons:
Note 2: DO NOT MERGE AS IS; I will rebase/squash before merging to clean up history.