Closed: bragovo closed this issue 1 year ago
I ran into an issue that I suspected might be related to this, where I was parsing a lot of times and adjusting them using Active Support's `beginning_of_day` etc. I just ran a simple test and wasn't able to reproduce the Time/DateTime difference here. I wonder if you could add more detail if you're able to reproduce it. (I believe my issue was just inefficient code, so I'm adjusting it to avoid parsing so many dates/times.)
I'm running Ruby 3.1.2 on an M1 Pro.
$ ruby -v
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [arm64-darwin21]
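For context, the difference can also be checked outside any web server with a minimal microbenchmark (a sketch, not code from the example repo; the timestamp string matches the one used in the example app):

```ruby
# A minimal standalone microbenchmark of the two parsers.
# Sketch for context only; not the code from the example repo.
require 'benchmark'
require 'time'
require 'date'

STR = "2022-10-24 12:33:44.054237"

Benchmark.bm(15) do |x|
  x.report("Time.parse")     { 1_000.times { Time.parse(STR) } }
  x.report("DateTime.parse") { 1_000.times { DateTime.parse(STR) } }
end
```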
With `Time.parse`:
~/Downloads $ ab -n 100 -c 10 http://localhost:9292/
Time taken for tests: 18.707 seconds
Requests per second: 5.35 [#/sec] (mean)
Time per request: 1870.696 [ms] (mean)
With `DateTime.parse`:
~/Downloads $ ab -n 100 -c 10 http://localhost:9292/
Time taken for tests: 18.688 seconds
Requests per second: 5.35 [#/sec] (mean)
Time per request: 1868.780 [ms] (mean)
Example repo https://github.com/bragovo/falcon-time-example with results.
I tried this code on my old Intel iMac (ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-darwin20]) and got the same results :-D
Can you try with and without the TZ environment variable set?
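One way to check that within a single Ruby process (a sketch under the assumption that the slowdown is tied to timezone lookup; this is not code from the thread):

```ruby
# Compare Time.local with TZ unset vs. explicitly set.
# When TZ is unset, the C library may consult the system timezone
# database on each call, which is one suspected source of the slowdown.
require 'benchmark'

ENV.delete('TZ')
unset_time = Benchmark.realtime { 10_000.times { Time.local(2023) } }

ENV['TZ'] = 'UTC0'
set_time = Benchmark.realtime { 10_000.times { Time.local(2023) } }

puts format("TZ unset: %.4fs, TZ=UTC0: %.4fs", unset_time, set_time)
```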
Did you notice that when hosting it in Puma, it fails with:
2023-02-22 20:26:34 +1300 Rack app ("GET /" - (127.0.0.1)): #<NoMethodError: undefined method `parse' for Time:Class>
(the same error repeated for every request)
which is much faster than `Time.parse`. Anyway, I fixed that and will compare it.
> puma -b tcp://localhost:9292 -t1 config.ru
Puma starting in single mode...
* Puma version: 6.0.2 (ruby 3.2.1-p31) ("Sunflower")
* Min threads: 1
* Max threads: 1
* Environment: development
* PID: 668887
* Listening on http://127.0.0.1:9292
* Listening on http://[::1]:9292
Use Ctrl-C to stop
> benchmark-http hammer -k 1 -c 1000 http://localhost:9292
0.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x5f0] [pid=668898] [2023-02-22 20:28:10 +1300]
| I am going to benchmark http://localhost:9292...
0.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x604] [pid=668898] [2023-02-22 20:28:10 +1300]
| I am running 1 asynchronous tasks that will each make 1000 sequential requests...
1.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x668] [pid=668898] [2023-02-22 20:28:11 +1300]
| 200 samples. 200.94418471793307 requests per second. S/D: 320.440µs.
2.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x668] [pid=668898] [2023-02-22 20:28:12 +1300]
| 401 samples. 200.85522376109932 requests per second. S/D: 240.636µs.
3.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x668] [pid=668898] [2023-02-22 20:28:13 +1300]
| 603 samples. 201.13179231385223 requests per second. S/D: 221.814µs.
4.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x668] [pid=668898] [2023-02-22 20:28:14 +1300]
| 803 samples. 200.84746037646306 requests per second. S/D: 219.104µs.
4.98s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x604] [pid=668898] [2023-02-22 20:28:15 +1300]
| I made 1000 requests in 5.0s. The per-request latency was 4.98ms. That's 200.8767989549459 asynchronous requests/second.
| Variance: 0.046µs
| Standard Deviation: 213.545µs
| Standard Error: 6.753µs
| 1000 samples. 200.8767989549459 requests per second. S/D: 213.545µs.
> benchmark-http hammer -k 10 -c 100 http://localhost:9292
0.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x5f0] [pid=669014] [2023-02-22 20:29:16 +1300]
| I am going to benchmark http://localhost:9292...
0.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x604] [pid=669014] [2023-02-22 20:29:16 +1300]
| I am running 10 asynchronous tasks that will each make 100 sequential requests...
1.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0xd34] [pid=669014] [2023-02-22 20:29:17 +1300]
| 200 samples. 258.51562912713604 requests per second. S/D: 111.69ms.
2.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0xd34] [pid=669014] [2023-02-22 20:29:18 +1300]
| 402 samples. 222.12917880935797 requests per second. S/D: 124.67ms.
3.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0xd34] [pid=669014] [2023-02-22 20:29:19 +1300]
| 603 samples. 214.88203164349665 requests per second. S/D: 127.91ms.
4.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0xd34] [pid=669014] [2023-02-22 20:29:20 +1300]
| 803 samples. 211.01083843421318 requests per second. S/D: 129.68ms.
4.98s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x604] [pid=669014] [2023-02-22 20:29:21 +1300]
| I made 1000 requests in 4.8s. The per-request latency was 47.57ms. That's 210.23564512237164 asynchronous requests/second.
| Variance: 16.96ms
| Standard Deviation: 130.24ms
| Standard Error: 4.12ms
| 1000 samples. 210.23564512237164 requests per second. S/D: 130.24ms.
> puma -b tcp://localhost:9292 -t16 config.ru
Puma starting in single mode...
* Puma version: 6.0.2 (ruby 3.2.1-p31) ("Sunflower")
* Min threads: 16
* Max threads: 16
* Environment: development
* PID: 669155
* Listening on http://127.0.0.1:9292
* Listening on http://[::1]:9292
Use Ctrl-C to stop
> benchmark-http hammer -k 1 -c 1000 http://localhost:9292
0.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x5f0] [pid=669199] [2023-02-22 20:30:33 +1300]
| I am going to benchmark http://localhost:9292...
0.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x604] [pid=669199] [2023-02-22 20:30:33 +1300]
| I am running 1 asynchronous tasks that will each make 1000 sequential requests...
1.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x668] [pid=669199] [2023-02-22 20:30:34 +1300]
| 194 samples. 194.86982203688927 requests per second. S/D: 814.693µs.
2.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x668] [pid=669199] [2023-02-22 20:30:35 +1300]
| 394 samples. 197.3845441478922 requests per second. S/D: 580.816µs.
3.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x668] [pid=669199] [2023-02-22 20:30:36 +1300]
| 595 samples. 198.42646538851292 requests per second. S/D: 485.411µs.
4.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x668] [pid=669199] [2023-02-22 20:30:37 +1300]
| 794 samples. 198.55933004603867 requests per second. S/D: 433.124µs.
5.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x668] [pid=669199] [2023-02-22 20:30:38 +1300]
| 993 samples. 198.861220506763 requests per second. S/D: 388.315µs.
5.03s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x604] [pid=669199] [2023-02-22 20:30:39 +1300]
| I made 1000 requests in 5.0s. The per-request latency was 5.03ms. That's 198.78134736025046 asynchronous requests/second.
| Variance: 0.156µs
| Standard Deviation: 394.988µs
| Standard Error: 12.491µs
| 1000 samples. 198.78134736025046 requests per second. S/D: 394.988µs.
> benchmark-http hammer -k 10 -c 100 http://localhost:9292
0.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x5f0] [pid=669298] [2023-02-22 20:31:06 +1300]
| I am going to benchmark http://localhost:9292...
0.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x604] [pid=669298] [2023-02-22 20:31:06 +1300]
| I am running 10 asynchronous tasks that will each make 100 sequential requests...
1.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x884] [pid=669298] [2023-02-22 20:31:07 +1300]
| 116 samples. 122.14714448306705 requests per second. S/D: 53.06ms.
2.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x884] [pid=669298] [2023-02-22 20:31:08 +1300]
| 231 samples. 118.39398740366056 requests per second. S/D: 40.74ms.
3.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x884] [pid=669298] [2023-02-22 20:31:09 +1300]
| 345 samples. 117.16770367024395 requests per second. S/D: 35.92ms.
4.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x884] [pid=669298] [2023-02-22 20:31:10 +1300]
| 462 samples. 116.91273586121054 requests per second. S/D: 33.35ms.
5.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x884] [pid=669298] [2023-02-22 20:31:11 +1300]
| 577 samples. 116.49019049792929 requests per second. S/D: 31.63ms.
6.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x884] [pid=669298] [2023-02-22 20:31:12 +1300]
| 695 samples. 116.72173008082338 requests per second. S/D: 30.43ms.
7.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x884] [pid=669298] [2023-02-22 20:31:13 +1300]
| 809 samples. 116.47977214734279 requests per second. S/D: 29.49ms.
8.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x884] [pid=669298] [2023-02-22 20:31:14 +1300]
| 924 samples. 116.28354107812305 requests per second. S/D: 28.78ms.
8.65s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x604] [pid=669298] [2023-02-22 20:31:15 +1300]
| I made 1000 requests in 8.5s. The per-request latency was 84.64ms. That's 118.14334241057423 asynchronous requests/second.
| Variance: 862.900µs
| Standard Deviation: 29.38ms
| Standard Error: 928.924µs
| 1000 samples. 118.14334241057423 requests per second. S/D: 29.38ms.
> falcon serve --count 1 --bind http://localhost:9292
0.0s info: Falcon::Command::Serve [oid=0x988] [ec=0x99c] [pid=669406] [2023-02-22 20:31:47 +1300]
| Falcon v0.42.3 taking flight! Using Async::Container::Forked {:count=>1}.
| - Binding to: #<Falcon::Endpoint http://localhost:9292/ {}>
| - To terminate: Ctrl-C or kill 669406
| - To reload configuration: kill -HUP 669406
0.03s info: Falcon::Controller::Serve [oid=0xa14] [ec=0x99c] [pid=669406] [2023-02-22 20:31:47 +1300]
| Starting Falcon Server on http://localhost:9292/
0.03s info: Async::Container::Process::Instance [oid=0xa28] [ec=0xa3c] [pid=669407] [2023-02-22 20:31:47 +1300]
| - Per-process status: kill -USR1 669407
> benchmark-http hammer -k 1 -c 1000 http://localhost:9292
0.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x5f0] [pid=669417] [2023-02-22 20:31:51 +1300]
| I am going to benchmark http://localhost:9292...
0.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x604] [pid=669417] [2023-02-22 20:31:51 +1300]
| I am running 1 asynchronous tasks that will each make 1000 sequential requests...
1.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x668] [pid=669417] [2023-02-22 20:31:52 +1300]
| 209 samples. 209.09651464730223 requests per second. S/D: 242.179µs.
2.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x668] [pid=669417] [2023-02-22 20:31:53 +1300]
| 416 samples. 208.09915081272587 requests per second. S/D: 347.707µs.
3.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x668] [pid=669417] [2023-02-22 20:31:54 +1300]
| 620 samples. 206.82519945339186 requests per second. S/D: 418.930µs.
4.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x668] [pid=669417] [2023-02-22 20:31:55 +1300]
| 824 samples. 206.10557878855118 requests per second. S/D: 403.577µs.
4.86s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x604] [pid=669417] [2023-02-22 20:31:56 +1300]
| I made 1000 requests in 4.9s. The per-request latency was 4.86ms. That's 205.89991047275603 asynchronous requests/second.
| Variance: 0.163µs
| Standard Deviation: 403.871µs
| Standard Error: 12.772µs
| 1000 samples. 205.89991047275603 requests per second. S/D: 403.871µs.
> benchmark-http hammer -k 10 -c 100 http://localhost:9292
0.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x5f0] [pid=669442] [2023-02-22 20:32:13 +1300]
| I am going to benchmark http://localhost:9292...
0.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x604] [pid=669442] [2023-02-22 20:32:13 +1300]
| I am running 10 asynchronous tasks that will each make 100 sequential requests...
1.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x884] [pid=669442] [2023-02-22 20:32:14 +1300]
| 203 samples. 208.29932671157928 requests per second. S/D: 16.98ms.
2.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x884] [pid=669442] [2023-02-22 20:32:15 +1300]
| 406 samples. 205.80047875340966 requests per second. S/D: 12.05ms.
3.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x884] [pid=669442] [2023-02-22 20:32:16 +1300]
| 609 samples. 204.62911276684432 requests per second. S/D: 9.87ms.
4.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x884] [pid=669442] [2023-02-22 20:32:17 +1300]
| 810 samples. 203.8399325124944 requests per second. S/D: 8.58ms.
4.95s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x604] [pid=669442] [2023-02-22 20:32:18 +1300]
| I made 1000 requests in 4.9s. The per-request latency was 48.78ms. That's 205.01403446521445 asynchronous requests/second.
| Variance: 70.191µs
| Standard Deviation: 8.38ms
| Standard Error: 264.935µs
| 1000 samples. 205.01403446521445 requests per second. S/D: 8.38ms.
> falcon serve --count 16 --bind http://localhost:9292
0.0s info: Falcon::Command::Serve [oid=0x988] [ec=0x99c] [pid=669597] [2023-02-22 20:33:03 +1300]
| Falcon v0.42.3 taking flight! Using Async::Container::Forked {:count=>16}.
| - Binding to: #<Falcon::Endpoint http://localhost:9292/ {}>
| - To terminate: Ctrl-C or kill 669597
| - To reload configuration: kill -HUP 669597
0.03s info: Falcon::Controller::Serve [oid=0xa14] [ec=0x99c] [pid=669597] [2023-02-22 20:33:03 +1300]
| Starting Falcon Server on http://localhost:9292/
0.03s info: Async::Container::Process::Instance [oid=0xa28] [ec=0xa3c] [pid=669598] [2023-02-22 20:33:03 +1300]
| - Per-process status: kill -USR1 669598
0.03s info: Async::Container::Process::Instance [oid=0xa50] [ec=0xa64] [pid=669599] [2023-02-22 20:33:03 +1300]
| - Per-process status: kill -USR1 669599
0.04s info: Async::Container::Process::Instance [oid=0xa78] [ec=0xa8c] [pid=669600] [2023-02-22 20:33:03 +1300]
| - Per-process status: kill -USR1 669600
0.04s info: Async::Container::Process::Instance [oid=0xaa0] [ec=0xab4] [pid=669601] [2023-02-22 20:33:03 +1300]
| - Per-process status: kill -USR1 669601
0.04s info: Async::Container::Process::Instance [oid=0xac8] [ec=0xadc] [pid=669602] [2023-02-22 20:33:03 +1300]
| - Per-process status: kill -USR1 669602
0.04s info: Async::Container::Process::Instance [oid=0xaf0] [ec=0xb04] [pid=669603] [2023-02-22 20:33:03 +1300]
| - Per-process status: kill -USR1 669603
0.04s info: Async::Container::Process::Instance [oid=0xb18] [ec=0xb2c] [pid=669605] [2023-02-22 20:33:03 +1300]
| - Per-process status: kill -USR1 669605
0.04s info: Async::Container::Process::Instance [oid=0xb40] [ec=0xb54] [pid=669607] [2023-02-22 20:33:03 +1300]
| - Per-process status: kill -USR1 669607
0.04s info: Async::Container::Process::Instance [oid=0xb68] [ec=0xb7c] [pid=669610] [2023-02-22 20:33:03 +1300]
| - Per-process status: kill -USR1 669610
0.04s info: Async::Container::Process::Instance [oid=0xb90] [ec=0xba4] [pid=669612] [2023-02-22 20:33:03 +1300]
| - Per-process status: kill -USR1 669612
0.04s info: Async::Container::Process::Instance [oid=0xbb8] [ec=0xbcc] [pid=669614] [2023-02-22 20:33:03 +1300]
| - Per-process status: kill -USR1 669614
0.04s info: Async::Container::Process::Instance [oid=0xbe0] [ec=0xbf4] [pid=669617] [2023-02-22 20:33:03 +1300]
| - Per-process status: kill -USR1 669617
0.04s info: Async::Container::Process::Instance [oid=0xc08] [ec=0xc1c] [pid=669619] [2023-02-22 20:33:03 +1300]
| - Per-process status: kill -USR1 669619
0.04s info: Async::Container::Process::Instance [oid=0xc30] [ec=0xc44] [pid=669620] [2023-02-22 20:33:03 +1300]
| - Per-process status: kill -USR1 669620
0.04s info: Async::Container::Process::Instance [oid=0xc58] [ec=0xc6c] [pid=669622] [2023-02-22 20:33:03 +1300]
| - Per-process status: kill -USR1 669622
0.04s info: Async::Container::Process::Instance [oid=0xc80] [ec=0xc94] [pid=669624] [2023-02-22 20:33:03 +1300]
| - Per-process status: kill -USR1 669624
> benchmark-http hammer -k 1 -c 1000 http://localhost:9292
0.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x5f0] [pid=669676] [2023-02-22 20:33:22 +1300]
| I am going to benchmark http://localhost:9292...
0.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x604] [pid=669676] [2023-02-22 20:33:22 +1300]
| I am running 1 asynchronous tasks that will each make 1000 sequential requests...
1.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x668] [pid=669676] [2023-02-22 20:33:23 +1300]
| 206 samples. 206.2469640481086 requests per second. S/D: 286.846µs.
2.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x668] [pid=669676] [2023-02-22 20:33:24 +1300]
| 409 samples. 204.75640432796024 requests per second. S/D: 304.037µs.
3.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x668] [pid=669676] [2023-02-22 20:33:25 +1300]
| 611 samples. 204.01767598349605 requests per second. S/D: 310.436µs.
4.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x668] [pid=669676] [2023-02-22 20:33:26 +1300]
| 815 samples. 203.91515242435497 requests per second. S/D: 276.685µs.
4.91s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x604] [pid=669676] [2023-02-22 20:33:26 +1300]
| I made 1000 requests in 4.9s. The per-request latency was 4.91ms. That's 203.84858404828225 asynchronous requests/second.
| Variance: 0.103µs
| Standard Deviation: 320.396µs
| Standard Error: 10.132µs
| 1000 samples. 203.84858404828225 requests per second. S/D: 320.396µs.
> benchmark-http hammer -k 10 -c 100 http://localhost:9292
0.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x5f0] [pid=669694] [2023-02-22 20:33:35 +1300]
| I am going to benchmark http://localhost:9292...
0.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x604] [pid=669694] [2023-02-22 20:33:35 +1300]
| I am running 10 asynchronous tasks that will each make 100 sequential requests...
0.99s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x604] [pid=669694] [2023-02-22 20:33:36 +1300]
| I made 1000 requests in 602.49ms. The per-request latency was 6.02ms. That's 1659.769054306381 asynchronous requests/second.
| Variance: 4.593µs
| Standard Deviation: 2.14ms
| Standard Error: 67.772µs
| 1000 samples. 1659.769054306381 requests per second. S/D: 2.14ms.
The conclusions I can draw from the data: `Time.parse` does not seem to scale well; it is slow. cc @nobu
The real solution here is to figure out why `Time.parse` is so slow and, if possible, release the GVL, or just make it faster.
cc @nateberkopec just FYI, as it's an interesting result.
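To see whether `Time.parse` holds the GVL for its whole runtime, one can compare sequential and threaded throughput; if four threads are no faster than one, the work is fully serialized on the GVL (a sketch under that assumption, not code from this thread):

```ruby
# If Time.parse never releases the GVL, running it in 4 threads
# should take about as long as running it sequentially.
require 'benchmark'
require 'time'

STR = "2022-10-24 12:33:44.054237"
N = 10_000

sequential = Benchmark.realtime { 4.times { N.times { Time.parse(STR) } } }
threaded = Benchmark.realtime do
  4.times.map { Thread.new { N.times { Time.parse(STR) } } }.each(&:join)
end

puts format("sequential: %.2fs, 4 threads: %.2fs", sequential, threaded)
```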
> puma -b tcp://localhost:9292 -w16 -t1 config.ru
[670108] Puma starting in cluster mode...
[670108] * Puma version: 6.0.2 (ruby 3.2.1-p31) ("Sunflower")
[670108] * Min threads: 1
[670108] * Max threads: 1
[670108] * Environment: development
[670108] * Master PID: 670108
[670108] * Workers: 16
[670108] * Restarts: (✔) hot (✔) phased
[670108] * Listening on http://127.0.0.1:9292
[670108] * Listening on http://[::1]:9292
[670108] Use Ctrl-C to stop
[670109] + Gemfile in context: /home/samuel/tmp/falcon-time-example/Gemfile
[670110] + Gemfile in context: /home/samuel/tmp/falcon-time-example/Gemfile
[670114] + Gemfile in context: /home/samuel/tmp/falcon-time-example/Gemfile
[670112] + Gemfile in context: /home/samuel/tmp/falcon-time-example/Gemfile
[670115] + Gemfile in context: /home/samuel/tmp/falcon-time-example/Gemfile
[670118] + Gemfile in context: /home/samuel/tmp/falcon-time-example/Gemfile
[670120] + Gemfile in context: /home/samuel/tmp/falcon-time-example/Gemfile
[670122] + Gemfile in context: /home/samuel/tmp/falcon-time-example/Gemfile
[670124] + Gemfile in context: /home/samuel/tmp/falcon-time-example/Gemfile
[670126] + Gemfile in context: /home/samuel/tmp/falcon-time-example/Gemfile
[670128] + Gemfile in context: /home/samuel/tmp/falcon-time-example/Gemfile
[670130] + Gemfile in context: /home/samuel/tmp/falcon-time-example/Gemfile
[670133] + Gemfile in context: /home/samuel/tmp/falcon-time-example/Gemfile
[670134] + Gemfile in context: /home/samuel/tmp/falcon-time-example/Gemfile
[670137] + Gemfile in context: /home/samuel/tmp/falcon-time-example/Gemfile
[670139] + Gemfile in context: /home/samuel/tmp/falcon-time-example/Gemfile
[670108] - Worker 1 (PID: 670110) booted in 0.01s, phase: 0
[670108] - Worker 2 (PID: 670112) booted in 0.01s, phase: 0
[670108] - Worker 3 (PID: 670114) booted in 0.01s, phase: 0
[670108] - Worker 0 (PID: 670109) booted in 0.01s, phase: 0
[670108] - Worker 5 (PID: 670118) booted in 0.01s, phase: 0
[670108] - Worker 4 (PID: 670115) booted in 0.01s, phase: 0
[670108] - Worker 6 (PID: 670120) booted in 0.01s, phase: 0
[670108] - Worker 11 (PID: 670130) booted in 0.01s, phase: 0
[670108] - Worker 7 (PID: 670122) booted in 0.01s, phase: 0
[670108] - Worker 10 (PID: 670128) booted in 0.01s, phase: 0
[670108] - Worker 9 (PID: 670126) booted in 0.01s, phase: 0
[670108] - Worker 12 (PID: 670133) booted in 0.01s, phase: 0
[670108] - Worker 13 (PID: 670134) booted in 0.01s, phase: 0
[670108] - Worker 14 (PID: 670137) booted in 0.01s, phase: 0
[670108] - Worker 15 (PID: 670139) booted in 0.01s, phase: 0
[670108] - Worker 8 (PID: 670124) booted in 0.01s, phase: 0
> benchmark-http hammer -k 1 -c 1000 http://localhost:9292
0.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x5f0] [pid=670290] [2023-02-22 20:38:24 +1300]
| I am going to benchmark http://localhost:9292...
0.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x604] [pid=670290] [2023-02-22 20:38:24 +1300]
| I am running 1 asynchronous tasks that will each make 1000 sequential requests...
1.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x668] [pid=670290] [2023-02-22 20:38:25 +1300]
| 197 samples. 197.289334073988 requests per second. S/D: 438.470µs.
2.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x668] [pid=670290] [2023-02-22 20:38:26 +1300]
| 395 samples. 197.73123799305787 requests per second. S/D: 321.028µs.
3.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x668] [pid=670290] [2023-02-22 20:38:27 +1300]
| 596 samples. 198.80151777162382 requests per second. S/D: 286.039µs.
4.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x668] [pid=670290] [2023-02-22 20:38:28 +1300]
| 793 samples. 198.41339605304756 requests per second. S/D: 267.908µs.
5.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x668] [pid=670290] [2023-02-22 20:38:29 +1300]
| 992 samples. 198.4782297501735 requests per second. S/D: 240.733µs.
5.04s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x604] [pid=670290] [2023-02-22 20:38:29 +1300]
| I made 1000 requests in 5.0s. The per-request latency was 5.04ms. That's 198.4768023138043 asynchronous requests/second.
| Variance: 0.058µs
| Standard Deviation: 239.812µs
| Standard Error: 7.584µs
| 1000 samples. 198.4768023138043 requests per second. S/D: 239.812µs.
> benchmark-http hammer -k 10 -c 100 http://localhost:9292
0.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x5f0] [pid=670305] [2023-02-22 20:38:32 +1300]
| I am going to benchmark http://localhost:9292...
0.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x604] [pid=670305] [2023-02-22 20:38:32 +1300]
| I am running 10 asynchronous tasks that will each make 100 sequential requests...
1.0s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x884] [pid=670305] [2023-02-22 20:38:33 +1300]
| 977 samples. 1215.103898755889 requests per second. S/D: 12.13ms.
1.06s info: Benchmark::HTTP::Command::Hammer [oid=0x5dc] [ec=0x604] [pid=670305] [2023-02-22 20:38:33 +1300]
| I made 1000 requests in 820.92ms. The per-request latency was 8.21ms. That's 1218.148718424648 asynchronous requests/second.
| Variance: 146.443µs
| Standard Deviation: 12.10ms
| Standard Error: 382.678µs
| 1000 samples. 1218.148718424648 requests per second. S/D: 12.10ms.
Checked puma and falcon in single mode and with TZ:
brg@AMacbook falcon-time-example % echo $TZ
brg@AMacbook falcon-time-example % puma -b tcp://localhost:9292 -t1 config.ru
Concurrency Level: 10
Time taken for tests: 0.721 seconds
Complete requests: 100
Failed requests: 0
Total transferred: 4100 bytes
HTML transferred: 300 bytes
Requests per second: 138.77 [#/sec] (mean)
Time per request: 72.064 [ms] (mean)
Time per request: 7.206 [ms] (mean, across all concurrent requests)
Transfer rate: 5.56 [Kbytes/sec] received
brg@AMacbook falcon-time-example % bundle exec falcon serve -b http://0.0.0.0:9292 -n 1
Concurrency Level: 10
Time taken for tests: 30.887 seconds
Complete requests: 100
Failed requests: 0
Total transferred: 6400 bytes
HTML transferred: 300 bytes
Requests per second: 3.24 [#/sec] (mean)
Time per request: 3088.667 [ms] (mean)
Time per request: 308.867 [ms] (mean, across all concurrent requests)
Transfer rate: 0.20 [Kbytes/sec] received
brg@AMacbook falcon-time-example % export TZ=Africa/Cairo
brg@AMacbook falcon-time-example % bundle exec falcon serve -b http://0.0.0.0:9292 -n 1
Concurrency Level: 10
Time taken for tests: 22.715 seconds
Complete requests: 100
Failed requests: 0
Total transferred: 6400 bytes
HTML transferred: 300 bytes
Requests per second: 4.40 [#/sec] (mean)
Time per request: 2271.477 [ms] (mean)
Time per request: 227.148 [ms] (mean, across all concurrent requests)
Transfer rate: 0.28 [Kbytes/sec] received
2023-02-22-nqvGoqWxIv.speedscope.json.zip
This is a speedscope log for https://www.speedscope.app/. There are 3 requests, each taking about ~250ms.
@bragovo did you add `require 'time'` to your example? Because Puma fails every request without it, and failing is much faster than `Time.parse` doing actual work.
There is no problem with time and Puma for me... hmm.
```ruby
puts require "time"
# false
```
I added debug output https://github.com/bragovo/falcon-time-example/commit/e71d2335f5ba9ae2e48e2137a965b45a099204d9
tm: 2022
tm: 2022
tm: 2022
tm: 2022
tm: 2022
...
# with Time
puma -b tcp://localhost:9292 -t1 config.ru
Concurrency Level: 10
Time taken for tests: 0.941 seconds
Complete requests: 100
Failed requests: 0
Total transferred: 4100 bytes
HTML transferred: 300 bytes
Requests per second: 106.26 [#/sec] (mean)
# With Time
bundle exec falcon serve -b http://0.0.0.0:9292 -n 1
Concurrency Level: 10
Time taken for tests: 32.783 seconds
Complete requests: 100
Failed requests: 0
Total transferred: 6400 bytes
HTML transferred: 300 bytes
Requests per second: 3.05 [#/sec] (mean)
# with DateTime
Concurrency Level: 10
Time taken for tests: 1.417 seconds
Complete requests: 100
Failed requests: 0
Total transferred: 6400 bytes
HTML transferred: 300 bytes
Requests per second: 70.57 [#/sec] (mean)
The same behavior.
I could reproduce it on macOS but not Linux; I wonder why.
It looks like, in this case, `Time.parse` is literally several orders of magnitude slower when running on Falcon, lol.
```ruby
require 'benchmark'
require 'time'

class App
  def call(env)
    result = Benchmark.measure do
      1000.times do
        tm = Time.parse("2022-10-24 12:33:44.054237")
      end
    end
    $stderr.puts result
    [200, {}, ["App"]]
  end
end
```
0.205562 0.150212 0.355774 ( 0.357190)
0.207391 0.149797 0.357188 ( 0.358386)
0.206125 0.147484 0.353609 ( 0.353810)
0.206682 0.148099 0.354781 ( 0.354914)
0.207301 0.148058 0.355359 ( 0.355584)
0.207845 0.149169 0.357014 ( 0.357443)
0.205054 0.146816 0.351870 ( 0.351877)
0.205433 0.145754 0.351187 ( 0.351307)
0.205116 0.145693 0.350809 ( 0.350815)
0.204316 0.146940 0.351256 ( 0.351980)
0.205102 0.147017 0.352119 ( 0.352140)
0.204953 0.146605 0.351558 ( 0.351567)
0.204540 0.146333 0.350873 ( 0.350910)
0.204887 0.145952 0.350839 ( 0.350848)
0.206139 0.146984 0.353123 ( 0.353189)
0.205440 0.145303 0.350743 ( 0.350892)
0.204737 0.144865 0.349602 ( 0.349610)
0.205570 0.145639 0.351209 ( 0.351249)
0.204542 0.145592 0.350134 ( 0.350538)
0.204656 0.145395 0.350051 ( 0.350518)
0.006511 0.000126 0.006637 ( 0.006633)
0.006116 0.000097 0.006213 ( 0.006212)
0.005920 0.000041 0.005961 ( 0.005960)
0.006104 0.000023 0.006127 ( 0.006126)
0.005970 0.000057 0.006027 ( 0.006026)
0.006245 0.000068 0.006313 ( 0.006317)
0.006236 0.000045 0.006281 ( 0.006281)
0.006006 0.000040 0.006046 ( 0.006046)
0.006264 0.000073 0.006337 ( 0.006338)
0.006469 0.000026 0.006495 ( 0.006492)
0.005930 0.000033 0.005963 ( 0.005962)
0.006041 0.000033 0.006074 ( 0.006074)
0.006189 0.000048 0.006237 ( 0.006239)
0.006296 0.000029 0.006325 ( 0.006325)
0.006124 0.000068 0.006192 ( 0.006187)
0.005909 0.000031 0.005940 ( 0.005940)
0.006074 0.000049 0.006123 ( 0.006123)
0.005927 0.000070 0.005997 ( 0.005997)
0.006000 0.000098 0.006098 ( 0.006097)
0.006031 0.000083 0.006114 ( 0.006118)
==================================
Mode: cpu(1000)
Samples: 436 (0.00% miss rate)
GC: 0 (0.00%)
==================================
TOTAL (pct) SAMPLES (pct) FRAME
365 (83.7%) 365 (83.7%) Time.local
65 (14.9%) 65 (14.9%) Time#getlocal
5 (1.1%) 4 (0.9%) Date._parse
1 (0.2%) 1 (0.2%) String#gsub!
1 (0.2%) 1 (0.2%) Kernel#respond_to?
436 (100.0%) 0 (0.0%) Benchmark.measure
436 (100.0%) 0 (0.0%) StackProf.run
436 (100.0%) 0 (0.0%) block in <main>
436 (100.0%) 0 (0.0%) Kernel#eval
436 (100.0%) 0 (0.0%) Rack::Builder.new_from_string
436 (100.0%) 0 (0.0%) Rack::Builder.load_file
436 (100.0%) 0 (0.0%) Rack::Builder.parse_file
436 (100.0%) 0 (0.0%) Falcon::Command::Serve#load_app
436 (100.0%) 0 (0.0%) Falcon::Controller::Serve#load_app
436 (100.0%) 0 (0.0%) Falcon::Controller::Serve#setup
436 (100.0%) 0 (0.0%) Async::Task#run
436 (100.0%) 0 (0.0%) Async::Task#schedule
436 (100.0%) 0 (0.0%) Integer#times
431 (98.9%) 0 (0.0%) Time.make_time
436 (100.0%) 0 (0.0%) #<Rack::Builder:0x000000010418e5c0>.sir_parse_alot
436 (100.0%) 0 (0.0%) Time.parse
436 (100.0%) 0 (0.0%) block (2 levels) in <main>
`Time.local` seems to be sampled much less on Puma:
==================================
Mode: cpu(1000)
Samples: 6 (0.00% miss rate)
GC: 0 (0.00%)
==================================
TOTAL (pct) SAMPLES (pct) FRAME
4 (66.7%) 2 (33.3%) Date._parse
1 (16.7%) 1 (16.7%) Regexp#match
1 (16.7%) 1 (16.7%) MatchData#begin
1 (16.7%) 1 (16.7%) Time.local
1 (16.7%) 1 (16.7%) Time.now
6 (100.0%) 0 (0.0%) Benchmark.measure
6 (100.0%) 0 (0.0%) StackProf.run
6 (100.0%) 0 (0.0%) block in <main>
6 (100.0%) 0 (0.0%) Kernel#eval
6 (100.0%) 0 (0.0%) Rack::Builder.new_from_string
6 (100.0%) 0 (0.0%) Rack::Builder.load_file
6 (100.0%) 0 (0.0%) Rack::Builder.parse_file
6 (100.0%) 0 (0.0%) Puma::Configuration#load_rackup
6 (100.0%) 0 (0.0%) Puma::Configuration#app
6 (100.0%) 0 (0.0%) Puma::Runner#load_and_bind
6 (100.0%) 0 (0.0%) Puma::Single#run
6 (100.0%) 0 (0.0%) Puma::Launcher#run
6 (100.0%) 0 (0.0%) Puma::CLI#run
6 (100.0%) 0 (0.0%) <top (required)>
6 (100.0%) 0 (0.0%) Kernel#load
6 (100.0%) 0 (0.0%) <main>
1 (16.7%) 0 (0.0%) Time.make_time
6 (100.0%) 0 (0.0%) Time.parse
6 (100.0%) 0 (0.0%) #<Rack::Builder:0x0000000106876188>.sir_parse_alot
6 (100.0%) 0 (0.0%) Integer#times
6 (100.0%) 0 (0.0%) block (2 levels) in <main>
Yeah, true. Usually I run the app in a Docker Alpine image and didn't see the difference, but I did once I ran it on the host machine ))
Actually, on Linux, Time with Falcon is slower than Puma; not so dramatic, but it exists: about 15-20% slower. I have an endpoint returning 10k records (don't ask me why :-D) from PG (with Sequel), and `.select(..., :created_at, :updated_at)` reduced performance in my test by about 10-20%. So at this moment I override the config from Time to DateTime with `Sequel.datetime_class = DateTime`.
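For reference, the Sequel override mentioned above is a one-line setting in the app's setup; `Sequel.datetime_class` is Sequel's documented hook for the class used when typecasting timestamp values (this sketch assumes the sequel gem is available):

```ruby
require 'date'
require 'sequel'

# Return DateTime instead of Time for timestamp columns,
# sidestepping the slow Time.local path described in this thread.
Sequel.datetime_class = DateTime
```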
The issue can be reproduced with just `tm = ::Time.local(2023)`.
It looks like something very weird is going on in the system call?
This was what I was wondering about: `Time.local` can end up calling `getenv`. I wonder if this is causing an issue.
Here is a repro without Falcon:
```ruby
require 'benchmark'
require 'async'
require 'time'
require 'stackprof'

def sir_local_alot
  result = Benchmark.measure do
    10_000.times do
      tm = ::Time.local(2023)
    end
  end
  $stderr.puts result
end

sir_local_alot

require 'async/container'

Console.logger.debug!

container = Async::Container.new

container.spawn do |task|
  sir_local_alot
end

Console.logger.debug "Waiting for container..."
container.wait
Console.logger.debug "Finished."
```
Here is a repro without Falcon or Async:
```ruby
require 'benchmark'
require 'time'

def sir_local_alot
  result = Benchmark.measure do
    10_000.times do
      tm = ::Time.local(2023)
    end
  end
  $stderr.puts result
end

sir_local_alot

pid = fork do
  sir_local_alot
end

Process.wait(pid)
```
It looks like `fork` kills the performance somehow.
As this is not a bug in Falcon but some kind of issue with macOS / `Time.local`, I'm going to close this issue. Thanks for the report; I hope we can figure out why it's so slow. I could not reproduce this on Linux either.
After investigating further, I filed a bug report with Apple. I don't think we can fix this in Ruby either. The only advice I can give regarding Falcon is to use Linux :/
@ioquatix These are my measurements on Linux (Debian):
root@cm57c49d67f9-cxl5d:/app# ruby main.rb
0.047092 0.000001 0.047093 ( 0.047102)
0.079047 0.000000 0.079047 ( 0.079011)
I ran your example:
```ruby
require 'benchmark'
require 'time'

def sir_local_alot
  result = Benchmark.measure { 100_000.times { tm = ::Time.local(2023) } }
  $stderr.puts result
end

sir_local_alot

pid = fork { sir_local_alot }
Process.wait(pid)
```
Sure, it's not 300x, but ~2x.
When you measure the C API, it's 300x slower.
Hmm, ok 👌 I mean that maybe it affects Linux systems too, not only macOS. Thanks!
We couldn't reproduce the extreme deviation on Linux, and we also found some hacks that fix it on macOS, e.g. `TZ= ...` or `TZ=UTC0 ...` seem to alleviate the issue, but we don't know why, because it's in the guts of the OS. I guess it's possible Linux also has issues, but we didn't see them and were not able to reproduce it.
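Put together, the mitigation looks roughly like this (a sketch: it assumes, as reported above, that an explicit TZ avoids the post-fork slowdown on macOS; since the root cause is inside the OS, this is a workaround, not a fix):

```ruby
# Set TZ explicitly before any forking happens, e.g. at the top of config.ru.
ENV['TZ'] ||= 'UTC0'

require 'benchmark'

def bench
  $stderr.puts Benchmark.measure { 10_000.times { Time.local(2023) } }
end

bench                  # parent process
pid = fork { bench }   # forked child should now be comparably fast
Process.wait(pid)
```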
There is a simple app:

Response time:
- 450 ms: falcon
- 15 ms: puma
- 15 ms: rackup

If changed to `DateTime.parse("2022-10-24 12:33:44.054237")`:
- 13.3 ms: falcon
- 11.5 ms: puma
- 11 ms: rackup

How can I fix it?

Versions: falcon (0.42.3), async-http (0.59.3), async (2.2.1), rack (2.2.4), ruby 3.1.2p20