shotover / shotover-proxy

L7 data-layer proxy
https://docs.shotover.io
Apache License 2.0
83 stars 16 forks source link

set tcp nodelay for outgoing connections? #985

Open rukai opened 1 year ago

rukai commented 1 year ago

I was reading a blog post during my holiday and realized that we set tcp nodelay for client<->shotover connections but we dont set it for shotover <-> DB connections. Should we?

benbromhead commented 1 year ago

Yes we should!!

On Thu, Jan 5, 2023 at 11:37 PM Lucas Kent @.***> wrote:

I was reading a blog post during my holiday and realized that we set tcp nodelay for client<->shotover connections but we dont set it for shotover <-> DB connections. Should we?

— Reply to this email directly, view it on GitHub https://github.com/shotover/shotover-proxy/issues/985, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACG2BNZR5RPMTZJUJNC6MP3WQ2P5PANCNFSM6AAAAAATRZOARI . You are receiving this because you are subscribed to this thread.Message ID: @.***>

rukai commented 1 year ago

I wrote a benchmark that puts shotover as the bottleneck between the client and database. I observed that:

So seems we should just leave it as is. I am quite puzzled by these results though.

benbromhead commented 1 year ago

What does it do to latency though?

On Wed, Jan 25, 2023 at 8:06 PM Lucas Kent @.***> wrote:

I wrote a benchmark that puts shotover as the bottleneck between the client and database. I observed that: Removing the client<->shotover nodelay halves our throughput. Adding a shotover<->DB nodelay drops our throughput by about ~10%

So seems we should just leave it as is. I am quite puzzled by these results though.

— Reply to this email directly, view it on GitHub https://github.com/shotover/shotover-proxy/issues/985#issuecomment-1403181251, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACG2BN5JP5RHUO5VYYRESADWUDGH3ANCNFSM6AAAAAATRZOARI . You are receiving this because you commented.Message ID: @.***>

rukai commented 11 months ago

Tried enabling outgoing tcpnodelay again today, it seems to improve cassandra latency and regress redis latency. I set a cap to operations per second to avoid the noise I have not yet eliminated from windsock.

redis (left nodelay=false, right nodelay=true) image

cassandra (left nodelay=false, right nodelay=true) image

rukai commented 4 months ago

We should retry this since the recent performance improvements

rukai commented 4 months ago

I got nagle'd on a hobby project just now and it gave me the insight I needed to know how to proceed here.

We wont see performance improvement on our current integration benchmarks because they are all stress tests. We are stuffing as much data through 1 or 100 connections as possible. This wont hit issues with tcp_nodelay enabled since constantly pushing more data in will force the packet out into the network and no delay will occur.

However if we set up a benchmark where we send 1 request every second on a single connection, the latency results of that benchmark should demonstrate tcp_nodelay causing us issues!

rukai commented 4 months ago

This morning I spent some time looking into this. I set up a benchmark as I described above with 1 small request per second. However this did not reproduce the issues of nagle's algorithm at all. Whether i had tcp_nodelay enabled did not affect this benchmark.

And on closer analysis:

Nagle's algorithm states that small packets should be delayed until we get an ACK back from the server. So if nagle's was causing issues here we should be seeing 2 extra network hops of latency, giving us a latency of 0.9 ms

So all in all I'm really quite confused why we arent seeing any issues from nagle's algorithm being enabled. :confused: