This pull-request has two major parts. The first one is adding benchmarks to check performance of simply receiving new messages without any further processing by parsers and formatters.
The second part is a very simple change that improve UDP processing throughput by 6 times.
These are the results comparing benchmark results before and after the proposed change:
benchmark old ns/op new ns/op delta
BenchmarkDatagramNoFormatting-4 16172 2668 -83.50%
BenchmarkTCPNoFormatting-4 7682 7644 -0.49%
benchmark old MB/s new MB/s speedup
BenchmarkDatagramNoFormatting-4 2.91 17.62 6.05x
BenchmarkTCPNoFormatting-4 6.12 6.15 1.00x
benchmark old allocs new allocs delta
BenchmarkDatagramNoFormatting-4 5 4 -20.00%
BenchmarkTCPNoFormatting-4 6 6 +0.00%
benchmark old bytes new bytes delta
BenchmarkDatagramNoFormatting-4 65905 368 -99.44%
BenchmarkTCPNoFormatting-4 464 464 +0.00%
Receiving messages from a UDP connection used to be very slow, slower even than using TCP which was a little strange. By simply reusing the buffer used to receive datagrams we can see a major improvement in processing time and memory allocations.
I came across this change after profiling https://github.com/tsuru/bs in a production environment and coming across this output:
(pprof) top
55.43GB of 59.56GB total (93.06%)
Dropped 350 nodes (cum <= 0.30GB)
Showing top 10 nodes out of 53 (cum >= 0.37GB)
flat flat% sum% cum cum%
50.77GB 85.24% 85.24% 50.87GB 85.42% github.com/tsuru/bs/vendor/gopkg.in/mcuadros/go-syslog%2ev2.(*Server).goReceiveDatagrams.func1
0.87GB 1.47% 86.71% 1.02GB 1.71% github.com/tsuru/bs/log.(*LenientParser).Parse
0.61GB 1.03% 87.74% 0.61GB 1.03% github.com/tsuru/bs/vendor/golang.org/x/net/websocket.(*hybiFrameWriter).Write
0.55GB 0.93% 88.67% 0.61GB 1.03% encoding/xml.(*Decoder).rawToken
0.51GB 0.86% 89.53% 0.54GB 0.91% net.(*dnsMsg).Pack
0.49GB 0.83% 90.36% 0.80GB 1.35% io.copyBuffer
0.49GB 0.82% 91.18% 0.52GB 0.87% github.com/tsuru/bs/log.(*LenientParser).Dump
0.38GB 0.64% 91.82% 0.77GB 1.29% net.unpackStruct
0.38GB 0.63% 92.45% 0.38GB 0.63% net.unpackDomainName
0.37GB 0.61% 93.06% 0.37GB 0.61% bytes.makeSlice
This pull-request has two major parts. The first one is adding benchmarks to check performance of simply receiving new messages without any further processing by parsers and formatters. The second part is a very simple change that improve UDP processing throughput by 6 times.
These are the results comparing benchmark results before and after the proposed change:
Receiving messages from a UDP connection used to be very slow, slower even than using TCP which was a little strange. By simply reusing the buffer used to receive datagrams we can see a major improvement in processing time and memory allocations.
I came across this change after profiling https://github.com/tsuru/bs in a production environment and coming across this output: