n0-computer / iroh

peer-2-peer that just works
https://iroh.computer
Apache License 2.0

relay-related issues connecting many client `Endpoint`s to one server `Endpoint` #2971

Open PaulOlteanu opened 3 days ago

PaulOlteanu commented 3 days ago

I'm not 100% sure what to title this, since I don't know whether the issue was on the Endpoint side or the relay server side.

I had about 10,000 clients trying to connect to a server at a rate of ~300/min, each then sending <1 KB every 2 seconds. On my server Endpoint side I was seeing a lot of `try_send: iroh_net::magicsock: send relay: message dropped, channel to actor is full`. When I restarted the server, all the clients tried to connect at once, which resulted in a wall of that log:

2024-11-25T22:05:57.709301Z  WARN ep:magicsock:actor:handle_ping_actions: iroh_net::magicsock: send relay: message dropped, channel to actor is full node=z4wj5ubvyixfn6jw relay_url=xxx me=vr6nx5o36gts5kqw
2024-11-25T22:05:57.709688Z  WARN ep:magicsock:actor:handle_ping_actions: iroh_net::magicsock: send relay: message dropped, channel to actor is full node=z4wj5ubvyixfn6jw relay_url=xxx me=vr6nx5o36gts5kqw
2024-11-25T22:05:57.709717Z  WARN ep:magicsock:actor:handle_ping_actions: iroh_net::magicsock: relay channel full, dropping call-me-maybe dstkey=z4wj5ubvyixfn6jw relayurl=xxx me=vr6nx5o36gts5kqw
2024-11-25T22:05:57.709760Z  WARN ep:magicsock:actor:handle_ping_actions: iroh_net::magicsock: send relay: message dropped, channel to actor is full node=unzs4fhhvzsxvaq5 relay_url=xxx me=vr6nx5o36gts5kqw
2024-11-25T22:05:57.710196Z  WARN ep:magicsock:actor:handle_ping_actions: iroh_net::magicsock: send relay: message dropped, channel to actor is full node=unzs4fhhvzsxvaq5 relay_url=xxx me=vr6nx5o36gts5kqw
2024-11-25T22:05:57.710228Z  WARN ep:magicsock:actor:handle_ping_actions: iroh_net::magicsock: relay channel full, dropping call-me-maybe dstkey=unzs4fhhvzsxvaq5 relayurl=xxx me=vr6nx5o36gts5kqw
2024-11-25T22:05:57.710266Z  WARN ep:magicsock:actor:handle_ping_actions: iroh_net::magicsock: send relay: message dropped, channel to actor is full node=p6yde6zwtrpwerqw relay_url=xxx me=vr6nx5o36gts5kqw
2024-11-25T22:05:57.710839Z  WARN ep:magicsock:actor:handle_ping_actions: iroh_net::magicsock: send relay: message dropped, channel to actor is full node=p6yde6zwtrpwerqw relay_url=xxx me=vr6nx5o36gts5kqw
2024-11-25T22:05:57.710867Z  WARN ep:magicsock:actor:handle_ping_actions: iroh_net::magicsock: relay channel full, dropping call-me-maybe dstkey=p6yde6zwtrpwerqw relayurl=xxx me=vr6nx5o36gts5kqw
2024-11-25T22:05:57.710903Z  WARN ep:magicsock:actor:handle_ping_actions: iroh_net::magicsock: send relay: message dropped, channel to actor is full node=fstlzagjalkbnymo relay_url=xxx me=vr6nx5o36gts5kqw
2024-11-25T22:05:57.711368Z  WARN ep:magicsock:actor:handle_ping_actions: iroh_net::magicsock: send relay: message dropped, channel to actor is full node=fstlzagjalkbnymo relay_url=xxx me=vr6nx5o36gts5kqw
2024-11-25T22:05:57.711420Z  WARN ep:magicsock:actor:handle_ping_actions: iroh_net::magicsock: relay channel full, dropping call-me-maybe dstkey=fstlzagjalkbnymo relayurl=xxx me=vr6nx5o36gts5kqw
2024-11-25T22:05:57.711457Z  WARN ep:magicsock:actor:handle_ping_actions: iroh_net::magicsock: send relay: message dropped, channel to actor is full node=4poqvd4kppki6dao relay_url=xxx me=vr6nx5o36gts5kqw
2024-11-25T22:05:57.711959Z  WARN ep:magicsock:actor:handle_ping_actions: iroh_net::magicsock: send relay: message dropped, channel to actor is full node=4poqvd4kppki6dao relay_url=xxx me=vr6nx5o36gts5kqw
2024-11-25T22:05:57.711990Z  WARN ep:magicsock:actor:handle_ping_actions: iroh_net::magicsock: relay channel full, dropping call-me-maybe dstkey=4poqvd4kppki6dao relayurl=xxx me=vr6nx5o36gts5kqw
2024-11-25T22:05:57.712028Z  WARN ep:magicsock:actor:handle_ping_actions: iroh_net::magicsock: send relay: message dropped, channel to actor is full node=yzewz7orsbjmwlpx relay_url=xxx me=vr6nx5o36gts5kqw
2024-11-25T22:05:57.712434Z  WARN ep:magicsock:actor:handle_ping_actions: iroh_net::magicsock: send relay: message dropped, channel to actor is full node=yzewz7orsbjmwlpx relay_url=xxx me=vr6nx5o36gts5kqw
2024-11-25T22:05:57.712463Z  WARN ep:magicsock:actor:handle_ping_actions: iroh_net::magicsock: relay channel full, dropping call-me-maybe dstkey=yzewz7orsbjmwlpx relayurl=xxx me=vr6nx5o36gts5kqw
2024-11-25T22:05:57.712499Z  WARN ep:magicsock:actor:handle_ping_actions: iroh_net::magicsock: send relay: message dropped, channel to actor is full node=5t44ivrezsrhw4es relay_url=xxx me=vr6nx5o36gts5kqw
2024-11-25T22:05:57.713075Z  WARN ep:magicsock:actor:handle_ping_actions: iroh_net::magicsock: send relay: message dropped, channel to actor is full node=5t44ivrezsrhw4es relay_url=xxx me=vr6nx5o36gts5kqw
2024-11-25T22:05:57.713104Z  WARN ep:magicsock:actor:handle_ping_actions: iroh_net::magicsock: relay channel full, dropping call-me-maybe dstkey=5t44ivrezsrhw4es relayurl=xxx me=vr6nx5o36gts5kqw
2024-11-25T22:05:57.713247Z  WARN ep:magicsock:actor:handle_ping_actions: iroh_net::magicsock: send relay: message dropped, channel to actor is full node=ikm6sqdfrrxlrntl relay_url=xxx me=vr6nx5o36gts5kqw
2024-11-25T22:05:57.713541Z  WARN ep:magicsock:actor:handle_ping_actions: iroh_net::magicsock: send relay: message dropped, channel to actor is full node=ikm6sqdfrrxlrntl relay_url=xxx me=vr6nx5o36gts5kqw
2024-11-25T22:05:57.713596Z  WARN ep:magicsock:actor:handle_ping_actions: iroh_net::magicsock: relay channel full, dropping call-me-maybe dstkey=ikm6sqdfrrxlrntl relayurl=xxx me=vr6nx5o36gts5kqw
2024-11-25T22:05:57.713633Z  WARN ep:magicsock:actor:handle_ping_actions: iroh_net::magicsock: send relay: message dropped, channel to actor is full node=2g6jdbviav6t3vxv relay_url=xxx me=vr6nx5o36gts5kqw
2024-11-25T22:05:57.713976Z  WARN ep:magicsock:actor:handle_ping_actions: iroh_net::magicsock: send relay: message dropped, channel to actor is full node=2g6jdbviav6t3vxv relay_url=xxx me=vr6nx5o36gts5kqw
2024-11-25T22:05:57.714006Z  WARN ep:magicsock:actor:handle_ping_actions: iroh_net::magicsock: relay channel full, dropping call-me-maybe dstkey=2g6jdbviav6t3vxv relayurl=xxx me=vr6nx5o36gts5kqw
2024-11-25T22:05:57.714041Z  WARN ep:magicsock:actor:handle_ping_actions: iroh_net::magicsock: send relay: message dropped, channel to actor is full node=sygycyckomhlhyvo relay_url=xxx me=vr6nx5o36gts5kqw
2024-11-25T22:05:57.714280Z  WARN ep:magicsock:actor:handle_ping_actions: iroh_net::magicsock: send relay: message dropped, channel to actor is full node=sygycyckomhlhyvo relay_url=xxx me=vr6nx5o36gts5kqw
2024-11-25T22:05:57.714307Z  WARN ep:magicsock:actor:handle_ping_actions: iroh_net::magicsock: relay channel full, dropping call-me-maybe dstkey=sygycyckomhlhyvo relayurl=xxx me=vr6nx5o36gts5kqw
2024-11-25T22:05:57.714344Z  WARN ep:magicsock:actor:handle_ping_actions: iroh_net::magicsock: send relay: message dropped, channel to actor is full node=6lb527ubwgqk6ykn relay_url=xxx me=vr6nx5o36gts5kqw
2024-11-25T22:05:57.714676Z  WARN ep:magicsock:actor:handle_ping_actions: iroh_net::magicsock: send relay: message dropped, channel to actor is full node=6lb527ubwgqk6ykn relay_url=xxx me=vr6nx5o36gts5kqw
2024-11-25T22:05:57.714704Z  WARN ep:magicsock:actor:handle_ping_actions: iroh_net::magicsock: relay channel full, dropping call-me-maybe dstkey=6lb527ubwgqk6ykn relayurl=xxx me=vr6nx5o36gts5kqw
2024-11-25T22:05:57.714730Z  WARN ep:magicsock:actor:handle_ping_actions: iroh_net::magicsock: send relay: message dropped, channel to actor is full node=bv7ti723wdm7bgck relay_url=xxx me=vr6nx5o36gts5kqw
etc...

On the relay server side, there were a lot of logs like this:

2024-11-25T21:40:49.528804Z  WARN iroh_relay::server::actor: no way to reach client, dropped packet dst=PublicKey(ysbnyw5olxxtxuzt)
2024-11-25T21:40:49.529038Z  WARN iroh_relay::server::actor: no way to reach client, dropped packet dst=PublicKey(ysbnyw5olxxtxuzt)
2024-11-25T21:40:49.529549Z  WARN iroh_relay::server::actor: no way to reach client, dropped packet dst=PublicKey(ysbnyw5olxxtxuzt)
2024-11-25T21:40:49.529561Z  WARN iroh_relay::server::actor: no way to reach client, dropped packet dst=PublicKey(ysbnyw5olxxtxuzt)
2024-11-25T21:40:49.529565Z  WARN iroh_relay::server::actor: no way to reach client, dropped packet dst=PublicKey(ysbnyw5olxxtxuzt)
2024-11-25T21:40:49.529570Z  WARN iroh_relay::server::actor: no way to reach client, dropped packet dst=PublicKey(ysbnyw5olxxtxuzt)
2024-11-25T21:40:49.529573Z  WARN iroh_relay::server::actor: no way to reach client, dropped packet dst=PublicKey(ysbnyw5olxxtxuzt)

The PublicKey in these logs was that of my server Endpoint at the time.

Some clients were able to connect and get requests through successfully, but most failed while trying to connect.

flub commented 3 days ago

Several actor channels are filling up, and we are dropping packets because of this. The dropped packets make for a pretty bad experience, since they include DISCO packets, which coordinate direct connection establishment. Since all communication starts via the relay server, clients end up unable to connect because they never learn the direct addresses of the server.

We probably need to improve the actor channels and backpressure. This should be relatively straightforward to reproduce by getting many endpoints to connect to a single one.
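
To illustrate the general failure mode (not iroh's actual code), here is a minimal sketch using only the standard library's bounded `sync_channel`. The function names `burst_with_try_send` and `burst_with_backpressure` and the capacity and burst sizes are made up for the example. It contrasts a non-blocking `try_send`, which drops messages once the channel is full, with a blocking `send`, which makes the producer wait for the consumer instead.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};
use std::thread;

/// Sends `burst` messages into a bounded channel of size `capacity` using
/// `try_send` while nothing drains the channel.
/// Returns (queued, dropped).
fn burst_with_try_send(capacity: usize, burst: usize) -> (usize, usize) {
    // `_rx` is kept alive so the channel stays connected but is never read,
    // standing in for an actor that cannot keep up.
    let (tx, _rx) = sync_channel::<usize>(capacity);
    let mut queued = 0;
    let mut dropped = 0;
    for i in 0..burst {
        match tx.try_send(i) {
            Ok(()) => queued += 1,
            // Channel is full: the message is lost, like the warnings above.
            Err(TrySendError::Full(_)) => dropped += 1,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    (queued, dropped)
}

/// Sends `burst` messages with a blocking `send`, so the producer slows down
/// to the consumer's pace instead of dropping anything.
/// Returns the number of messages the consumer received.
fn burst_with_backpressure(capacity: usize, burst: usize) -> usize {
    let (tx, rx) = sync_channel::<usize>(capacity);
    // The consumer counts messages until every sender has been dropped.
    let consumer = thread::spawn(move || rx.iter().count());
    for i in 0..burst {
        // Blocks while the channel is full.
        tx.send(i).expect("receiver is alive");
    }
    drop(tx); // close the channel so the consumer loop ends
    consumer.join().expect("consumer thread panicked")
}

fn main() {
    let (queued, dropped) = burst_with_try_send(4, 10);
    println!("try_send: queued={queued} dropped={dropped}");
    let delivered = burst_with_backpressure(4, 10);
    println!("blocking send: delivered={delivered}");
}
```

The trade-off is that a blocking send inside an actor can stall that actor, so a real fix would likely need more care than simply swapping one call for the other.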