inhindsight / hindsight


Receive UDP Performance #191

Open ManApart opened 4 years ago

ManApart commented 4 years ago

Receive can handle around 1,000 small UDP messages per second before a large percentage of messages is dropped; above that rate, the drop percentage increases drastically.

AC

Tech Notes

Data

| Messages/second | Messages Sent | Messages Received | Messages Dropped | Percent Dropped | Drift of last message (seconds) | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| 100 | 100,000 | 92,740 | 7,260 | 7.26% | 1 | No tail dropped |
| 1,000 | 1,000,000 | 743,765 | 256,235 | 25.62% | 1 | 13 of the tail dropped. Is receive waiting for a full load? |
| 1,000 | 1,000,000 | 974,190 | 25,810 | 2.58% | | Netcat instead of receive |
| 1,000 | 1,000,000 | 996,937 | 3,063 | 0.31% | | Netcat instead of receive |
| 1,000 | 1,000,000 | 1,000,000 | 0 | 0.00% | | Local Console Out |
| 1,000 | 1,000,000 | 759,007 | 240,993 | 24.10% | 1 | 89 of the tail dropped |
| 10,000 | 10,000,000 | 6,119,004 | 3,880,996 | 38.81% | 1 | 180 of the tail dropped |
| 100,000 | 100,000,000 | 38,407,682 | 61,592,318 | 61.59% | | Netcat instead of receive. This took 35 minutes instead of 16 minutes due to inability to send messages fast enough from the load app. This means we really only produced 45,714 messages a second |
| 100,000 | 100,000,000 | 3,589,004 | 96,410,996 | 96.41% | | 18,611 of the tail dropped. This took 34 minutes instead of 16 minutes, see above |

More notes here

jessie-morris commented 4 years ago

On many Linux distros there are sysctl changes we can make to increase the size of this buffer. This will likely also need to be done on the nodes themselves. Unfortunately, our containers run on Alpine, which does not expose these settings, so if merely adding them as sysctl keys does not work, we may want to look at running on a container OS that is more obviously tunable.
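For reference, the usual knobs are the kernel-wide socket receive buffer limits; the values below are illustrative only, not recommendations:

```sh
# Raise the maximum and default socket receive buffers (node-level; sizes are examples)
sysctl -w net.core.rmem_max=26214400
sysctl -w net.core.rmem_default=26214400

# Persist across reboots
cat <<'EOF' > /etc/sysctl.d/90-udp-buffers.conf
net.core.rmem_max=26214400
net.core.rmem_default=26214400
EOF
```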

jeffgrunewald commented 4 years ago

Buffer size is already a configurable setting on a gen_udp socket natively in Erlang. I didn't expose it on the first pass because there was a lot of judgement around the idea of "buffering into the socket".
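A minimal sketch of what exposing it could look like (the port number and buffer size are placeholders; note the kernel silently caps `recbuf` at `net.core.rmem_max`, which is why the sysctl changes above still matter):

```elixir
# Illustrative only: request a larger kernel receive buffer when opening the socket.
{:ok, socket} =
  :gen_udp.open(5555, [
    :binary,
    active: false,
    # ask the OS for a ~4 MB receive buffer (the default is usually a few hundred KB)
    recbuf: 4_194_304
  ])

# Read back what the OS actually granted
{:ok, [recbuf: granted]} = :inet.getopts(socket, [:recbuf])
```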

jessie-morris commented 4 years ago

Yeah, I think we'll likely want to tune properties at both the network layer and the application layer of the OSI model to reach maximum performance.

jeffgrunewald commented 4 years ago

We should focus on writer speed/parallelism, as relying entirely on increasing buffer size feels like a brittle band-aid. However, when we do OS-level tuning, we can take into account that the UDP receive socket should be running in an operator that can provision the necessary network port, so it can also use a more optimized container image.
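One possible shape for that direction (purely a sketch, not this repo's code; `WriterPool` is a hypothetical registered process): keep one dedicated process draining the socket and push decoding/writing to other processes so the reader can return to recv immediately.

```elixir
defmodule Receive.Drain do
  # Sketch: one process owns the socket and does nothing but recv;
  # anything slow (decode, buffer, write) happens in other processes.
  def start(port) do
    spawn_link(fn ->
      {:ok, socket} =
        :gen_udp.open(port, [:binary, active: false, recbuf: 4_194_304])

      loop(socket)
    end)
  end

  defp loop(socket) do
    case :gen_udp.recv(socket, 0) do
      {:ok, {_ip, _port, packet}} ->
        # Hand off to a separate writer process (hypothetical WriterPool name)
        # so this loop can get back to recv/2 immediately.
        send(WriterPool, {:packet, packet})
        loop(socket)

      {:error, reason} ->
        exit({:udp_recv_failed, reason})
    end
  end
end
```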