Open PaulOlteanu opened 3 days ago
Several actor channels are filling up, and we are dropping packets as a result. The drops make for a pretty bad experience because they include DISCO packets, which coordinate direct connection establishment. Since all communication starts via the relay server, clients end up unable to connect because they never learn the server's direct addresses.
We probably need to improve the actor channels and how they apply backpressure. This should be relatively straightforward to reproduce by getting many endpoints to connect to a single one.
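To make the failure mode concrete, here is a minimal sketch of the difference between dropping on a full bounded channel and applying backpressure. It uses only the standard library's `std::sync::mpsc::sync_channel` as a stand-in; this is an assumption for illustration and not iroh's actual actor channel code. The function names `send_dropping` and `send_with_backpressure` are hypothetical.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};
use std::thread;

/// Pushes `n` messages into a bounded channel of size `capacity` using
/// `try_send` while nothing drains it. Returns `(delivered, dropped)`.
/// Hypothetical helper: it mimics a sender outpacing a busy actor.
fn send_dropping(capacity: usize, n: usize) -> (usize, usize) {
    let (tx, rx) = sync_channel::<usize>(capacity);
    let mut dropped = 0;
    for i in 0..n {
        match tx.try_send(i) {
            Ok(()) => {}
            // Channel is full: the message is lost, as in the reported log.
            Err(TrySendError::Full(_)) => dropped += 1,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    drop(tx); // close the channel so the iterator below terminates
    let delivered = rx.iter().count();
    (delivered, dropped)
}

/// Same workload, but the sender blocks when the channel is full, so the
/// consumer's pace limits the producer (backpressure) and nothing is lost.
fn send_with_backpressure(capacity: usize, n: usize) -> usize {
    let (tx, rx) = sync_channel::<usize>(capacity);
    let consumer = thread::spawn(move || rx.iter().count());
    for i in 0..n {
        tx.send(i).expect("receiver is alive");
    }
    drop(tx);
    consumer.join().expect("consumer thread panicked")
}

fn main() {
    let (delivered, dropped) = send_dropping(4, 10);
    println!("try_send: delivered={delivered} dropped={dropped}");
    let delivered = send_with_backpressure(4, 10);
    println!("blocking send: delivered={delivered}");
}
```

Blocking the sender is not automatically the right fix for a network path, since it can stall whatever task is sending. The sketch only shows why a full channel combined with `try_send` loses messages.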
Not 100% sure what to title this; I don't know if the issue was on the `Endpoint` side or the relay server side.
I had about 10,000 clients trying to connect to a server at a rate of ~300/min, each then sending <1 kB every 2 sec. On my server `Endpoint` side I was seeing a lot of `try_send: iroh_net::magicsock: send relay: message dropped, channel to actor is full`. When I restarted the server, all the clients tried to connect at once, which resulted in a wall of that log.
On the relay server side, there were a lot of logs with the `PublicKey` being that of my server `Endpoint` at the time.
There were some clients able to connect and get requests through successfully, but most failed while trying to connect.