Open amitsaxena opened 3 months ago
Perhaps related: https://github.com/puma/puma/issues/2998#issuecomment-1863828162
Can we trust ab? https://httpd.apache.org/docs/2.4/programs/ab.html states under "Bugs":
It does not implement HTTP/1.x fully; only accepts some 'expected' forms of responses.
Puma has been benchmarked extensively using the wrk fork by @ioquatix (https://github.com/ioquatix/wrk); see https://github.com/puma/puma/blob/master/docs/testing_benchmarks_local_files.md for some docs about that.
I tried this myself just now:
$ WEB_CONCURRENCY=2 PUMA_MAX_THREADS=1 b e puma --log-requests hello.ru
[43957] Puma starting in cluster mode...
[43957] * Puma version: 6.4.2 (ruby 3.2.4-p170) ("The Eagle of Durango")
[43957] * Min threads: 0
[43957] * Max threads: 1
[43957] * Environment: development
[43957] * Master PID: 43957
[43957] * Workers: 2
[43957] * Restarts: (✔) hot (✖) phased
[43957] * Preloading application
[43957] * Listening on http://0.0.0.0:9292
[43957] Use Ctrl-C to stop
[43957] - Worker 0 (PID: 43960) booted in 0.0s, phase: 0
[43957] - Worker 1 (PID: 43961) booted in 0.0s, phase: 0
$ ./wrk --duration 10 --connections 2 --threads 1 --latency http://127.0.0.1:9292
Running 10s test @ http://127.0.0.1:9292
1 threads and 2 connections
connection 0: 241766 requests completed
connection 1: 241620 requests completed
Thread Stats Avg Stdev Max +/- Stdev
Latency 42.15us 14.34us 584.00us 96.02%
Req/Sec 48.09k 4.77k 57.63k 96.04%
Latency Distribution
50% 40.00us
75% 46.00us
90% 48.00us
99% 108.00us
483386 requests in 10.10s, 35.04MB read
Requests/sec: 47854.12
Transfer/sec: 3.47MB
I think you're just experiencing the nature of keepalive? Try more Puma threads and/or workers?
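For what it's worth, a sketch of the kind of config that suggestion implies is below; it's only illustrative (the config/puma.rb layout and the env var defaults are assumptions, not the reporter's actual setup):

```ruby
# Illustrative config/puma.rb sketch for tuning workers/threads via env vars.
# WEB_CONCURRENCY and PUMA_MAX_THREADS mirror the variables used in the
# commands in this thread; the defaults are made up for the example.
workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))

max_threads = Integer(ENV.fetch("PUMA_MAX_THREADS", 5))
threads max_threads, max_threads

preload_app!
port ENV.fetch("PORT", 9292)
```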
Using https://github.com/rakyll/hey I see the same thing as you:
$ hey -n 5000 -c 100 http://127.0.0.1:9292
Summary:
Total: 3.5634 secs
Slowest: 1.7245 secs
Fastest: 0.0000 secs
Average: 0.0288 secs
Requests/sec: 1403.1650
Total data: 55000 bytes
Size/request: 11 bytes
Response time histogram:
0.000 [1] |
0.172 [4791] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.345 [114] |■
0.517 [7] |
0.690 [12] |
0.862 [10] |
1.035 [8] |
1.207 [11] |
1.380 [21] |
1.552 [11] |
1.725 [14] |
Latency distribution:
10% in 0.0000 secs
25% in 0.0001 secs
50% in 0.0001 secs
75% in 0.0002 secs
90% in 0.0028 secs
95% in 0.0411 secs
99% in 1.0752 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0000 secs, 0.0000 secs, 1.7245 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0000 secs
req write: 0.0000 secs, 0.0000 secs, 0.0012 secs
resp wait: 0.0288 secs, 0.0000 secs, 1.7245 secs
resp read: 0.0000 secs, 0.0000 secs, 0.0019 secs
Status code distribution:
[200] 5000 responses
$ hey -n 5000 -c 100 -disable-keepalive http://127.0.0.1:9292
Summary:
Total: 0.1860 secs
Slowest: 0.0084 secs
Fastest: 0.0017 secs
Average: 0.0037 secs
Requests/sec: 26887.0389
Total data: 55000 bytes
Size/request: 11 bytes
Response time histogram:
0.002 [1] |
0.002 [17] |
0.003 [816] |■■■■■■■■■■■
0.004 [2902] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.004 [792] |■■■■■■■■■■■
0.005 [80] |■
0.006 [67] |■
0.006 [151] |■■
0.007 [64] |■
0.008 [105] |■
0.008 [5] |
Latency distribution:
10% in 0.0030 secs
25% in 0.0032 secs
50% in 0.0034 secs
75% in 0.0038 secs
90% in 0.0043 secs
95% in 0.0060 secs
99% in 0.0073 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0003 secs, 0.0017 secs, 0.0084 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0000 secs
req write: 0.0000 secs, 0.0000 secs, 0.0015 secs
resp wait: 0.0033 secs, 0.0012 secs, 0.0050 secs
resp read: 0.0000 secs, 0.0000 secs, 0.0020 secs
Status code distribution:
[200] 5000 responses
If I run Puma with more threads (WEB_CONCURRENCY=2 b e puma --threads 16:16 hello.ru), Requests/sec matches the numbers we get with -disable-keepalive and 1 thread per Puma worker:
$ hey -n 5000 -c 100 http://127.0.0.1:9292
Summary:
Total: 0.3199 secs
Slowest: 0.3117 secs
Fastest: 0.0000 secs
Average: 0.0031 secs
Requests/sec: 15631.2688
Total data: 55000 bytes
Size/request: 11 bytes
Response time histogram:
0.000 [1] |
0.031 [4911] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.062 [10] |
0.094 [10] |
0.125 [11] |
0.156 [8] |
0.187 [10] |
0.218 [9] |
0.249 [15] |
0.281 [9] |
0.312 [6] |
Latency distribution:
10% in 0.0001 secs
25% in 0.0001 secs
50% in 0.0001 secs
75% in 0.0001 secs
90% in 0.0001 secs
95% in 0.0001 secs
99% in 0.1502 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0000 secs, 0.0000 secs, 0.3117 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0000 secs
req write: 0.0000 secs, 0.0000 secs, 0.0009 secs
resp wait: 0.0030 secs, 0.0000 secs, 0.3103 secs
resp read: 0.0000 secs, 0.0000 secs, 0.0002 secs
Status code distribution:
[200] 5000 responses
I think you're just experiencing the nature of keepalive?
@dentarg I am not sure if I understand. Can you please elaborate? 🙏
Can we trust ab? https://httpd.apache.org/docs/2.4/programs/ab.html states under "Bugs"
The apache bench tests were only used to reproduce the problem locally. Our AWS containers were benchmarked using https://loader.io/ . So it's not necessarily dependent on apache bench, I think.
Try more Puma threads and/or workers?
Not sure if that's a solution; to me it sounds like throwing money at the problem. What we are doing here is benchmarking similar hardware on Heroku against AWS, and we are seeing drastically worse latencies on AWS with identical stacks and the same Puma configuration. So tuning Puma threads and workers isn't the objective here; the stark difference in latencies between identical hardware and config is the matter of concern. Effectively we are getting a very poor price-to-performance ratio on AWS due to this issue.
But you have completely different software in front of Puma with Heroku vs AWS?
You mean the app which is being benchmarked? It's the same app, with exactly the same Puma config and environment variables.
If you meant the load testing tool, then both were benchmarked using loader.io and also using apache bench, and the problem is easily reproducible on each run.
Each set of tests was conducted using identical tooling and on exactly the same app. The only variable is Heroku vs AWS here.
And the keep-alive issue was identified after recording all the traffic on AWS using tcpdump and looking at request headers for different kinds of traffic. To zero in on the issue we started hitting the Fargate container directly from within the AWS VPC, via the load balancer, using the public IP, etc., and eventually found this difference. When coming via the load balancer we were seeing high latencies (due to keep-alive) compared to when hitting the container IP directly using apache bench. After this we were easily able to reproduce the issue by directly hitting the public IP of the Fargate container with and without the keep-alive option using apache bench.
Another data point: when looking at the number of connections in cat /proc/net/tcp during load tests, I found 312 entries in the high-latency case and >13K entries for a low-latency test. Not sure if it helps, but throwing it in here in case it gives some indication.
Also, at no point during the tests were we hitting 100% CPU or memory, so resources aren't the problem here.
If you aren't maxing out resources, you have room for more Puma threads? :)
I can understand that you expect similar performance or better with keep-alive. It makes sense. Someone needs to check why that isn't happening with Puma and one thread. I may look if/when I have some time.
The reason why you don't have the problem with Heroku looks to be that keep-alive isn't used between the Heroku router and your app (Puma): https://devcenter.heroku.com/articles/http-routing#http-versions-supported
If you aren't maxing out resources, you have room for more Puma threads? :)
Haha! I hear you :) The benchmarks were conducted on a simple health check which just returns the current timestamp, so as not to introduce variables like the database, cache, application code, etc. So not using 100% CPU and memory in the benchmarks doesn't necessarily mean that we can increase the number of threads; actual production traffic memory consumption looks different. We can go with bigger containers to solve it temporarily, but IMHO the base problem stays even when increasing the number of threads.
Someone needs to check why that isn't happening with Puma and one thread
I don't think it's a one-thread-only problem. I can reproduce it with 2 and 4 threads on the sample as well; please look at the apache bench results below. I think what we are doing here is generating a much larger amount of parallelization, and that reduces the average latency. If we go too high on the resource pool (Puma threads), the massive parallelization will reduce the average latency to comparable levels, but that doesn't necessarily mean the problem disappears at a high thread count. If I go higher on concurrency with the increased number of threads, I can easily reproduce the problem again.
2 threads without keep-alive:
$ ab -n 5000 -c 100 http://0.0.0.0:9292/
This is ApacheBench, Version 2.3 <$Revision: 1913912 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 0.0.0.0 (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
Completed 3000 requests
Completed 3500 requests
Completed 4000 requests
Completed 4500 requests
Completed 5000 requests
Finished 5000 requests
Server Software:
Server Hostname: 0.0.0.0
Server Port: 9292
Document Path: /
Document Length: 11 bytes
Concurrency Level: 100
Time taken for tests: 0.199 seconds
Complete requests: 5000
Failed requests: 0
Total transferred: 380000 bytes
HTML transferred: 55000 bytes
Requests per second: 25151.28 [#/sec] (mean)
Time per request: 3.976 [ms] (mean)
Time per request: 0.040 [ms] (mean, across all concurrent requests)
Transfer rate: 1866.70 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.8 0 4
Processing: 1 4 0.5 4 9
Waiting: 1 4 0.5 4 9
Total: 2 4 1.0 4 12
Percentage of the requests served within a certain time (ms)
50% 4
66% 4
75% 4
80% 4
90% 4
95% 7
98% 7
99% 8
100% 12 (longest request)
2 threads with keep-alive:
$ ab -k -n 5000 -c 100 http://0.0.0.0:9292/
This is ApacheBench, Version 2.3 <$Revision: 1913912 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 0.0.0.0 (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
Completed 3000 requests
Completed 3500 requests
Completed 4000 requests
Completed 4500 requests
Completed 5000 requests
Finished 5000 requests
Server Software:
Server Hostname: 0.0.0.0
Server Port: 9292
Document Path: /
Document Length: 11 bytes
Concurrency Level: 100
Time taken for tests: 5.176 seconds
Complete requests: 5000
Failed requests: 0
Keep-Alive requests: 4910
Total transferred: 497840 bytes
HTML transferred: 55000 bytes
Requests per second: 966.06 [#/sec] (mean)
Time per request: 103.514 [ms] (mean)
Time per request: 1.035 [ms] (mean, across all concurrent requests)
Transfer rate: 93.93 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.5 0 7
Processing: 0 53 421.6 0 5066
Waiting: 0 53 421.6 0 5066
Total: 0 53 421.6 0 5066
Percentage of the requests served within a certain time (ms)
50% 0
66% 0
75% 0
80% 0
90% 0
95% 0
98% 102
99% 2612
100% 5066 (longest request)
More than 25x the latency with keep-alive.
4 threads without keep-alive:
$ ab -n 5000 -c 100 http://0.0.0.0:9292/
This is ApacheBench, Version 2.3 <$Revision: 1913912 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 0.0.0.0 (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
Completed 3000 requests
Completed 3500 requests
Completed 4000 requests
Completed 4500 requests
Completed 5000 requests
Finished 5000 requests
Server Software:
Server Hostname: 0.0.0.0
Server Port: 9292
Document Path: /
Document Length: 11 bytes
Concurrency Level: 100
Time taken for tests: 0.203 seconds
Complete requests: 5000
Failed requests: 0
Total transferred: 380000 bytes
HTML transferred: 55000 bytes
Requests per second: 24622.78 [#/sec] (mean)
Time per request: 4.061 [ms] (mean)
Time per request: 0.041 [ms] (mean, across all concurrent requests)
Transfer rate: 1827.47 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.8 0 5
Processing: 2 4 0.5 4 7
Waiting: 1 4 0.5 4 7
Total: 3 4 1.0 4 11
Percentage of the requests served within a certain time (ms)
50% 4
66% 4
75% 4
80% 4
90% 4
95% 6
98% 7
99% 9
100% 11 (longest request)
4 threads with keep-alive:
$ ab -k -n 5000 -c 100 http://0.0.0.0:9292/
This is ApacheBench, Version 2.3 <$Revision: 1913912 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 0.0.0.0 (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
Completed 3000 requests
Completed 3500 requests
Completed 4000 requests
Completed 4500 requests
Completed 5000 requests
Finished 5000 requests
Server Software:
Server Hostname: 0.0.0.0
Server Port: 9292
Document Path: /
Document Length: 11 bytes
Concurrency Level: 100
Time taken for tests: 2.617 seconds
Complete requests: 5000
Failed requests: 0
Keep-Alive requests: 4909
Total transferred: 497816 bytes
HTML transferred: 55000 bytes
Requests per second: 1910.25 [#/sec] (mean)
Time per request: 52.349 [ms] (mean)
Time per request: 0.523 [ms] (mean, across all concurrent requests)
Transfer rate: 185.73 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.5 0 4
Processing: 0 27 207.4 0 2525
Waiting: 0 27 207.4 0 2525
Total: 0 27 207.4 0 2525
Percentage of the requests served within a certain time (ms)
50% 0
66% 0
75% 0
80% 0
90% 0
95% 0
98% 81
99% 1329
100% 2525 (longest request)
More than 13x the latency with keep-alive.
Results with 16 Puma threads and 2 workers. The difference in latencies is still clearly visible (4ms vs 14ms) IMHO. In the tests below you can also clearly see the latency increase as I keep going up on concurrency. I couldn't go beyond 140 because apache bench crashes then.
$ ab -n 5000 -c 100 http://0.0.0.0:9292/
This is ApacheBench, Version 2.3 <$Revision: 1913912 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 0.0.0.0 (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
Completed 3000 requests
Completed 3500 requests
Completed 4000 requests
Completed 4500 requests
Completed 5000 requests
Finished 5000 requests
Server Software:
Server Hostname: 0.0.0.0
Server Port: 9292
Document Path: /
Document Length: 11 bytes
Concurrency Level: 100
Time taken for tests: 0.202 seconds
Complete requests: 5000
Failed requests: 0
Total transferred: 380000 bytes
HTML transferred: 55000 bytes
Requests per second: 24722.98 [#/sec] (mean)
Time per request: 4.045 [ms] (mean)
Time per request: 0.040 [ms] (mean, across all concurrent requests)
Transfer rate: 1834.91 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.8 0 5
Processing: 3 4 0.5 4 7
Waiting: 1 4 0.5 4 7
Total: 3 4 1.1 4 10
Percentage of the requests served within a certain time (ms)
50% 4
66% 4
75% 4
80% 4
90% 5
95% 7
98% 7
99% 9
100% 10 (longest request)
$ ab -k -n 5000 -c 100 http://0.0.0.0:9292/
This is ApacheBench, Version 2.3 <$Revision: 1913912 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 0.0.0.0 (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
Completed 3000 requests
Completed 3500 requests
Completed 4000 requests
Completed 4500 requests
Completed 5000 requests
Finished 5000 requests
Server Software:
Server Hostname: 0.0.0.0
Server Port: 9292
Document Path: /
Document Length: 11 bytes
Concurrency Level: 100
Time taken for tests: 0.720 seconds
Complete requests: 5000
Failed requests: 0
Keep-Alive requests: 4903
Total transferred: 497672 bytes
HTML transferred: 55000 bytes
Requests per second: 6944.43 [#/sec] (mean)
Time per request: 14.400 [ms] (mean)
Time per request: 0.144 [ms] (mean, across all concurrent requests)
Transfer rate: 675.01 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.3 0 4
Processing: 0 7 47.8 0 629
Waiting: 0 7 47.8 0 629
Total: 0 7 47.8 0 629
Percentage of the requests served within a certain time (ms)
50% 0
66% 0
75% 0
80% 1
90% 1
95% 1
98% 68
99% 298
100% 629 (longest request)
Higher concurrency tests:
$ ab -k -n 5000 -c 110 http://0.0.0.0:9292/
This is ApacheBench, Version 2.3 <$Revision: 1913912 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 0.0.0.0 (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
Completed 3000 requests
Completed 3500 requests
Completed 4000 requests
Completed 4500 requests
Completed 5000 requests
Finished 5000 requests
Server Software:
Server Hostname: 0.0.0.0
Server Port: 9292
Document Path: /
Document Length: 11 bytes
Concurrency Level: 110
Time taken for tests: 0.754 seconds
Complete requests: 5000
Failed requests: 0
Keep-Alive requests: 4896
Total transferred: 497504 bytes
HTML transferred: 55000 bytes
Requests per second: 6630.44 [#/sec] (mean)
Time per request: 16.590 [ms] (mean)
Time per request: 0.151 [ms] (mean, across all concurrent requests)
Transfer rate: 644.27 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.3 0 3
Processing: 0 8 56.6 0 657
Waiting: 0 8 56.6 0 657
Total: 0 9 56.6 0 657
Percentage of the requests served within a certain time (ms)
50% 0
66% 0
75% 1
80% 1
90% 1
95% 1
98% 86
99% 327
100% 657 (longest request)
$ ab -k -n 5000 -c 120 http://0.0.0.0:9292/
This is ApacheBench, Version 2.3 <$Revision: 1913912 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 0.0.0.0 (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
Completed 3000 requests
Completed 3500 requests
Completed 4000 requests
Completed 4500 requests
Completed 5000 requests
Finished 5000 requests
Server Software:
Server Hostname: 0.0.0.0
Server Port: 9292
Document Path: /
Document Length: 11 bytes
Concurrency Level: 120
Time taken for tests: 0.787 seconds
Complete requests: 5000
Failed requests: 0
Keep-Alive requests: 4860
Total transferred: 496640 bytes
HTML transferred: 55000 bytes
Requests per second: 6357.27 [#/sec] (mean)
Time per request: 18.876 [ms] (mean)
Time per request: 0.157 [ms] (mean, across all concurrent requests)
Transfer rate: 616.66 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.5 0 5
Processing: 0 10 64.5 0 691
Waiting: 0 10 64.5 0 691
Total: 0 10 64.5 0 691
Percentage of the requests served within a certain time (ms)
50% 0
66% 0
75% 1
80% 1
90% 1
95% 7
98% 95
99% 460
100% 691 (longest request)
$ ab -k -n 5000 -c 130 http://0.0.0.0:9292/
This is ApacheBench, Version 2.3 <$Revision: 1913912 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 0.0.0.0 (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
Completed 3000 requests
Completed 3500 requests
Completed 4000 requests
Completed 4500 requests
Completed 5000 requests
Finished 5000 requests
Server Software:
Server Hostname: 0.0.0.0
Server Port: 9292
Document Path: /
Document Length: 11 bytes
Concurrency Level: 130
Time taken for tests: 0.939 seconds
Complete requests: 5000
Failed requests: 0
Keep-Alive requests: 4909
Total transferred: 497816 bytes
HTML transferred: 55000 bytes
Requests per second: 5325.01 [#/sec] (mean)
Time per request: 24.413 [ms] (mean)
Time per request: 0.188 [ms] (mean, across all concurrent requests)
Transfer rate: 517.75 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.5 0 4
Processing: 0 12 75.0 0 843
Waiting: 0 12 75.0 0 843
Total: 0 12 75.0 0 843
Percentage of the requests served within a certain time (ms)
50% 0
66% 0
75% 1
80% 1
90% 1
95% 1
98% 152
99% 508
100% 843 (longest request)
$ ab -k -n 5000 -c 140 http://0.0.0.0:9292/
This is ApacheBench, Version 2.3 <$Revision: 1913912 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 0.0.0.0 (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
apr_socket_recv: Connection reset by peer (54)
Total of 2660 requests completed
If I add max_fast_inline 0 to the Puma config, hey -n 5000 -c 100 http://127.0.0.1:9292 gives me Requests/sec: 26195.1310 for 2 puma workers with 1 thread each.
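For anyone who wants to try that locally, a sketch of what such a config could look like (the 2 workers / 1 thread layout matches the test above; everything else is assumed):

```ruby
# Sketch of a config/puma.rb for the test above (layout assumed).
workers 2
threads 1, 1
preload_app!

# Don't serve follow-up keep-alive requests "inline"; always hand the client
# back to the reactor so requests are subject to normal ordering.
max_fast_inline 0
```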
$ hey -n 5000 -c 100 http://127.0.0.1:9292
Summary:
Total: 0.1909 secs
Slowest: 0.0107 secs
Fastest: 0.0017 secs
Average: 0.0038 secs
Requests/sec: 26195.1310
Total data: 55000 bytes
Size/request: 11 bytes
Response time histogram:
0.002 [1] |
0.003 [25] |
0.004 [2389] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.004 [2044] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.005 [189] |■■■
0.006 [127] |■■
0.007 [99] |■■
0.008 [59] |■
0.009 [50] |■
0.010 [11] |
0.011 [6] |
Latency distribution:
10% in 0.0031 secs
25% in 0.0033 secs
50% in 0.0035 secs
75% in 0.0037 secs
90% in 0.0045 secs
95% in 0.0060 secs
99% in 0.0083 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0001 secs, 0.0017 secs, 0.0107 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0000 secs
req write: 0.0000 secs, 0.0000 secs, 0.0011 secs
resp wait: 0.0035 secs, 0.0007 secs, 0.0068 secs
resp read: 0.0000 secs, 0.0000 secs, 0.0020 secs
Status code distribution:
[200] 5000 responses
The reason why you don't have the problem with Heroku looks to be that keep-alive isn't used between the Heroku router and your app (Puma): https://devcenter.heroku.com/articles/http-routing#http-versions-supported
Also thanks for this! 🙏 ⭐
This clearly validates my hypothesis as that's the only difference between the 2 environments - AWS and Heroku. ALB uses keep-alive whereas Heroku router doesn't.
The router will assume that the client wants to close the connection on each request (no keep-alive).
So theoretically speaking, if Heroku started using keep-alive in their router as an enhancement, all apps using Puma would perform dramatically worse than they do today :)
And thanks a lot for the engagement @dentarg and for debugging this with me! Really appreciate it! ❤️
When I checked this last (several years ago), Puma's implementation of keep-alive wasn't particularly fair. Basically, if you had 2 workers, and 3 connections (all keep-alive), one of the connections wouldn't get serviced.
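(As a rough way to observe per-connection counts without a modified wrk, a small Ruby sketch like the one below could be used against a local Puma assumed to be listening on 127.0.0.1:9292; the wrk runs that follow are the actual measurements.)

```ruby
# Rough sketch: open a few keep-alive client sessions against a local Puma
# and count how many requests each completes in a fixed window, to get a
# feel for per-connection fairness. Net::HTTP reuses the TCP connection
# within a session (and transparently reconnects if the server closes it).
require "net/http"

CONNECTIONS = 3
DURATION = 5 # seconds

counts = Array.new(CONNECTIONS, 0)
threads = CONNECTIONS.times.map do |i|
  Thread.new do
    Net::HTTP.start("127.0.0.1", 9292) do |http|
      deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + DURATION
      while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
        http.get("/")        # same session, so keep-alive where possible
        counts[i] += 1
      end
    end
  end
end
threads.each(&:join)
counts.each_with_index { |c, i| puts "connection #{i}: #{c} requests completed" }
```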
You can see the behaviour today using a modified version of wrk, running with puma -t1 for demonstration purposes:
> ./wrk -c 1 -t 1 -d 1 http://localhost:9292/hello
Running 1s test @ http://localhost:9292/hello
1 threads and 1 connections
closing connection after 40442 requests
connection 0: 40442 requests completed
Thread Stats Avg Stdev Max +/- Stdev
Latency 25.60us 12.42us 404.00us 97.63%
Req/Sec 37.02k 12.26k 41.54k 90.91%
40442 requests in 1.10s, 1.97MB read
Requests/sec: 36753.96
Transfer/sec: 1.79MB
As soon as you do this with 2 connections, the keep-alive recycling logic kicks in:
> ./wrk -c 2 -t 1 -d 1 http://localhost:9292/hello
... snip ...
closing connection after 16720 requests
closing connection after 16730 requests
closing connection after 16730 requests
closing connection after 16740 requests
closing connection after 16740 requests
closing connection after 16750 requests
closing connection after 16750 requests
closing connection after 16760 requests
closing connection after 16760 requests
closing connection after 16770 requests
closing connection after 16770 requests
closing connection after 16780 requests
closing connection after 16780 requests
closing connection after 16790 requests
closing connection after 16790 requests
closing connection after 16800 requests
closing connection after 16800 requests
closing connection after 16810 requests
closing connection after 16810 requests
closing connection after 16820 requests
closing connection after 16820 requests
closing connection after 16830 requests
closing connection after 16830 requests
closing connection after 16840 requests
closing connection after 16840 requests
closing connection after 16850 requests
closing connection after 16850 requests
closing connection after 16851 requests
connection 0: 16851 requests completed
connection 1: 16850 requests completed
Thread Stats Avg Stdev Max +/- Stdev
Latency 98.50us 116.35us 0.89ms 81.10%
Req/Sec 30.85k 10.19k 35.31k 90.91%
33701 requests in 1.10s, 1.70MB read
Requests/sec: 30619.65
Transfer/sec: 1.54MB
That means if you have more connections than worker threads, you will start running into issues. As you can see above, the latency wasn't just 2x worse (for 2x the contention); it was 10x worse, likely due to all the reconnecting going on.
ALB uses keep-alive whereas Heroku router doesn't.
If you are running Puma behind a load balancer, the load balancer should ideally be configured with one keep-alive connection per worker thread. This is the absolute best arrangement; additional connections will not help and will in fact make things worse.
If I add max_fast_inline 0 to the Puma config, hey -n 5000 -c 100 http://127.0.0.1:9292 gives me Requests/sec: 26195.1310 for 2 puma workers with 1 thread each.
queue_requests false in the config achieves the same (the docs say it disables HTTP keep-alive)
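As a sketch, that alternative would look like this in the config (with the documented caveat that it disables HTTP keep-alive entirely):

```ruby
# Sketch: disable Puma's request buffering/queueing in the reactor.
# Per the Puma docs referenced above, this also disables HTTP keep-alive.
queue_requests false
```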
I think https://github.com/puma/puma/issues/2311 is the same issue as this
Not sure if this is helpful, but there are really only two scenarios you should care about, considering the design of Puma:
1/ Browser-facing server: every request should be queued and served in order.
2/ Load-balancer-facing server: every worker should have one connection to the load balancer.
Connections should not be closed in either case. In (1) you should be internally queueing things, so closing the connection should not be necessary for "fast connection handling", and (2) should never overload the number of workers, so it should never need queueing (the LB should be taking care of that). Every other configuration will, IMHO, be worse for performance/latency/throughput.
Using keep-alive we end up here: in Client#reset there is this line:
https://github.com/puma/puma/blob/6db7c28b069710419cc0a78dbc93c351b222a8e4/lib/puma/client.rb#L185
If I lower this value to 0.002, I get similar throughput as with -disable-keepalive (benchmarked with hey -n 5000 -c 100 http://127.0.0.1:9292):
https://github.com/puma/puma/blob/6db7c28b069710419cc0a78dbc93c351b222a8e4/lib/puma/const.rb#L108
FAST_TRACK_KA_TIMEOUT was introduced in b9388a6074a816660be0d6dbb49ee802f13507d5
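For reproducing that experiment without patching the gem, one option is to override the internal constant from the top of the Puma config; this is purely an experimental sketch (not a supported Puma API), and overriding an internal constant can break between Puma releases:

```ruby
# Experimental sketch only: shrink the fast-track keep-alive wait used in
# Client#reset to the 0.002 value tried above. FAST_TRACK_KA_TIMEOUT is an
# internal Puma constant, so any Puma upgrade may invalidate this.
require "puma/const"

Puma::Const.send(:remove_const, :FAST_TRACK_KA_TIMEOUT)
Puma::Const.const_set(:FAST_TRACK_KA_TIMEOUT, 0.002)
```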
If you are running Puma behind a load balancer, the load balancer should ideally be configured with one keep-alive connection per worker thread. This is the absolute best arrangement; additional connections will not help and will in fact make things worse.
Is that even possible to do in an AWS environment with an application load balancer? I couldn't figure out any setting for the number of keep-alive connections.
should never overload the number of workers so should never need queueing (the LB should be taking care of that)
Also not sure about this in the AWS context. The ALB will just forward requests downstream based on the routing algorithm. If queuing is disabled on Puma, then the requests reaching the containers will most likely error out?
I think https://github.com/puma/puma/issues/2311 is the same issue as this
Not sure. That one specifically talks about single-threaded mode, but we can see in the tests that this doesn't only affect single-thread configurations.
@dentarg I'll investigate your comments around tweaking settings later in the day and get back to you after conducting some tests.
You can see the behaviour today using a modified version of wrk
@ioquatix Is this https://github.com/ioquatix/wrk, or do you have more modifications that make it print closing connection after ...? I built your fork and it doesn't print that for me.
Ah, sorry, I have a habit of just modifying the wrk code to add extra output. Let me push a branch with that particular log.
Okay, I added the log in this branch: wrk (connection-close-log). LMK if you have trouble getting the same results.
If this is a useful feature, maybe we can capture it and report the number of connection closes in the final statistics.
Is that even possible to do in an AWS environment with an application load balancer? I couldn't figure out any setting for the number of keep-alive connections.
@amitsaxena As far as I know, ALB doesn't have an interface to configure the number of keep-alive connections.
I want to share a few ideas to try to tune the AWS infra further:
I'm not sure if Puma allows tuning the max keep-alive timeout. ALB has a client_keep_alive setting (https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lb#client_keep_alive), and it might make sense to use a similar value (or two close values) for both Puma and ALB.
Try using least_outstanding_requests
See if you could enable ALB access logs. I'm interested in the various request|target|response_processing_time values. They might show us what exactly is causing the additional latency.
I tried different configurations and below are my observations:
max_fast_inline 0 results in slightly higher latencies than without keep-alive for me, but mostly works fine. The code says this about the setting:
The number of requests to attempt inline before sending a client back to the reactor to be subject to normal ordering.
I don't understand well enough what the downsides can be if we set it to 0.
queue_requests false works fine for a few short tests, but if I keep running tests for a longer duration it either crashes the test or gives me much higher latencies than before. Please see some examples below.
My understanding of this setting was that no queuing will happen in Puma if it is disabled. Does anyone know what happens if all Puma threads are occupied and a request reaches Puma? Will it error out because it cannot be queued? My guess is that it's a possible scenario, as the load balancer will keep sending requests to Puma based on the routing algorithm; it doesn't really know that Puma cannot queue, or when to stop sending requests and buffer them at the load balancer layer. Or are there other nuances involved here?
If I lower this value to 0.002, I get similar throughput as with -disable-keepalive
Interesting observation! I don't know what the potential downsides of setting this value too low might be.
Try using least_outstanding_requests
@MehdiZonjy That's what we used during the load tests.
See if you could enable ALB access logs. I'm interested in the various request|target|response_processing_time values. They might show us what exactly is causing the additional latency
We did look at them and the bulk of the time was being spent on the Fargate instance:
request_processing_time is close to 0 seconds. It is the time for the load balancer to receive the request and forward it to the target.
target_processing_time was greater than 4 seconds. It is the time taken by the task to fully process the request.
response_processing_time is close to 0 seconds. It is the time for the load balancer to get the response from the target and relay it back to the client.
So we are clearly able to establish that the bulk of the time is being spent on the Fargate container. We also used DataDog to record application performance metrics, and both the logs and the APM data clearly show that the average time spent processing the request is in the 2ms range, which matches what we get without keep-alive. All of the above makes us believe that the bulk of the time here is spent queuing in Puma before the request gets serviced, and the scenario is easily reproducible with keep-alive when hitting the Fargate container directly and skipping the ALB.
So IMHO, all Ruby apps on AWS using the application load balancer are performing much worse than they are capable of because of this bug. If you are running a Ruby app on AWS with a similar setup, it can be worth benchmarking your containers with and without keep-alive to see the difference, and sharing the data with us to validate this finding even further. 🙏
tune alb and webserver max timeouts
I am not certain this is going to help, since the problem here is that Puma with keep-alive isn't performant enough at high concurrency, so timeouts don't matter much I think.
@schneems please post your write-up from Slack here
@schneems please post your write-up from Slack here
It was posted at https://github.com/puma/puma/issues/3487#issuecomment-2356401488
@amitsaxena are you able to try out https://github.com/puma/puma/pull/3506?
Sorry for the delay @dentarg. It has been quite busy at work and on the personal front recently.
I can confirm that when re-running the old benchmarks on this puma branch, I am seeing significantly better (~30%) latencies with keep-alive enabled locally. Below are the results.
I can possibly do more comprehensive load tests later on AWS once a newer version of puma with this fix is out as it will be easier to get it past CI/CD pipelines and into our ECS+Fargate setup.
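(For anyone else wanting to test the unreleased fix, a Gemfile sketch along these lines should work; the branch name below is a placeholder, not the actual branch behind that PR.)

```ruby
# Gemfile sketch for trying an unreleased Puma branch; "some-branch" is a
# hypothetical placeholder for whatever branch backs the PR linked above.
source "https://rubygems.org"

gem "puma", git: "https://github.com/puma/puma.git", branch: "some-branch"
gem "rack" # the benchmarked app is a plain Rack app
```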
Describe the bug While benchmarking our app against a similar Heroku and AWS stack during an AWS migration, we encountered > 20X latencies on AWS compared to what we were getting on Heroku. Our AWS setup consists of an Application Load Balancer + ECS + Fargate. We have ensured there's no CPU or memory contention on the Fargate instance, and still Puma responds to requests in a painfully slow manner when hit with higher concurrency. After several days of debugging we have come to the conclusion that the load balancer keep-alive setting is causing the significant differences in latency. We were eventually able to reproduce the issue by hitting the public IP of the container directly with (HTTP/1.1) and without (HTTP/1.0) keep-alive. Below are a couple of load tests showing the difference in latencies:
Puma config:
Below are the values of the environment variables on the Fargate container:
To Reproduce I was able to reproduce the issue on a simple rack app specified in the bug template.
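For context, a minimal hello.ru along the lines of the one in the bug template might look like this (a sketch; the 11-byte "Hello World" body matches the Document Length reported by ab, but the exact file may differ):

```ruby
# hello.ru — minimal Rack app used for the benchmarks (sketch).
# Run with e.g.: WEB_CONCURRENCY=2 bundle exec puma hello.ru
run ->(env) { [200, { "content-type" => "text/plain" }, ["Hello World"]] }
```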
Below are the apache bench results.
Without keep-alive:
With keep-alive:
You can clearly see that the average latency is drastically higher with HTTP/1.1, i.e. with the keep-alive option.
Expected behavior Comparable latencies with or without keep-alive, or in fact better latencies with the keep-alive option.
Desktop (please complete the following information):