zigzap / zap

blazingly fast backends in zig
MIT License

C++: add threadpool(4) #39

Closed (kassane closed 1 year ago)

kassane commented 1 year ago

Refer: 6562d9ed3e4237b0cf3030c60f6d0e64a9e34144

io_context is thread-safe (single-thread concurrency hint). See: https://www.boost.org/doc/libs/1_83_0/doc/html/boost_asio/overview/core/concurrency_hint.html

Server listening on port 8070...
========================================================================
                          cpp-beast
========================================================================
Running 10s test @ http://127.0.0.1:8070
  4 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.15ms    1.26ms  35.65ms   96.83%
    Req/Sec    26.48k     1.06k   28.55k    90.50%
  Latency Distribution
     50%    0.99ms
     75%    1.34ms
     90%    1.71ms
     99%    3.43ms
  1059140 requests in 10.08s, 102.02MB read
  Socket errors: connect 0, read 22108, write 1037032, timeout 0
Requests/sec: 105102.67
Transfer/sec:     10.12MB

poop perf:

$>poop -d 100 './wrk/measure.sh go' './wrk/measure.sh cpp-beast' './wrk/measure.sh rust-axum' './wrk/measure.sh zig-zap'
Benchmark 1 (3 runs): ./wrk/measure.sh go
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          11.1s  ± 32.7ms    11.1s  … 11.1s           0 ( 0%)        0%
  peak_rss           24.5MB ± 1.33MB    23.0MB … 25.4MB          0 ( 0%)        0%
  cpu_cycles         87.6G  ±  657M     87.0G  … 88.3G           0 ( 0%)        0%
  instructions        129G  ±  452M      128G  …  129G           0 ( 0%)        0%
  cache_references   13.4G  ± 81.8M     13.3G  … 13.4G           0 ( 0%)        0%
  cache_misses       1.66G  ± 20.9M     1.65G  … 1.69G           0 ( 0%)        0%
  branch_misses       344M  ± 4.48M      339M  …  348M           0 ( 0%)        0%
Benchmark 2 (3 runs): ./wrk/measure.sh cpp-beast
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          11.2s  ± 9.44ms    11.2s  … 11.2s           0 ( 0%)          +  1.0% ±  0.5%
  peak_rss           37.4MB ±    0      37.4MB … 37.4MB          0 ( 0%)        💩+ 52.6% ±  8.7%
  cpu_cycles         13.2G  ±  123M     13.0G  … 13.3G           0 ( 0%)        ⚡- 85.0% ±  1.2%
  instructions       11.2G  ±  111M     11.0G  … 11.3G           0 ( 0%)        ⚡- 91.3% ±  0.6%
  cache_references   1.88G  ± 28.0M     1.85G  … 1.90G           0 ( 0%)        ⚡- 86.0% ±  1.0%
  cache_misses        319M  ± 3.24M      316M  …  322M           0 ( 0%)        ⚡- 80.8% ±  2.0%
  branch_misses       106M  ± 5.30M      100M  …  111M           0 ( 0%)        ⚡- 69.2% ±  3.2%
Benchmark 3 (3 runs): ./wrk/measure.sh rust-axum
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          11.1s  ± 7.69ms    11.1s  … 11.1s           0 ( 0%)          +  0.1% ±  0.5%
  peak_rss           25.7MB ±  139KB    25.5MB … 25.8MB          0 ( 0%)          +  4.9% ±  8.8%
  cpu_cycles         65.9G  ± 1.14G     64.8G  … 67.1G           0 ( 0%)        ⚡- 24.8% ±  2.4%
  instructions       85.6G  ± 2.66G     83.1G  … 88.4G           0 ( 0%)        ⚡- 33.5% ±  3.4%
  cache_references   11.9G  ±  250M     11.6G  … 12.1G           0 ( 0%)        ⚡- 10.9% ±  3.2%
  cache_misses        839M  ± 64.9M      768M  …  895M           0 ( 0%)        ⚡- 49.6% ±  6.6%
  branch_misses       295M  ± 20.1M      272M  …  309M           0 ( 0%)        ⚡- 14.2% ±  9.6%
Benchmark 4 (3 runs): ./wrk/measure.sh zig-zap
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          11.4s  ±  589ms    11.1s  … 12.1s           0 ( 0%)          +  3.0% ±  8.5%
  peak_rss           37.6MB ±    0      37.6MB … 37.6MB          0 ( 0%)        💩+ 53.7% ±  8.7%
  cpu_cycles         41.5G  ±  366M     41.2G  … 41.9G           0 ( 0%)        ⚡- 52.6% ±  1.4%
  instructions       60.2G  ±  463M     59.7G  … 60.7G           0 ( 0%)        ⚡- 53.2% ±  0.8%
  cache_references   5.66G  ± 20.6M     5.63G  … 5.67G           0 ( 0%)        ⚡- 57.7% ±  1.0%
  cache_misses        337M  ± 8.66M      328M  …  345M           0 ( 0%)        ⚡- 79.7% ±  2.2%
  branch_misses       209M  ± 7.31M      204M  …  217M           0 ( 0%)        ⚡- 39.2% ±  4.0%

renerocksai commented 1 year ago

Sorry, I don't have time to dig into this. Can you please briefly explain what this PR is trying to achieve?

renerocksai commented 1 year ago

Oh, just realized. Thread pool? The branch name is more descriptive than your entire message...

kassane commented 1 year ago

Hi,

Sorry, I know the purpose of the project isn't to focus only on the benchmark. Also, by the time I fixed the problem with the strings in the loop, the earlier change had already been merged.

In brief: this adds a thread pool (4 threads) and unlocks the io_context from a single thread.
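
As a rough illustration of the thread-pool idea (not the code from this PR), here is a minimal std-only sketch. `WorkQueue` and `serve_with_pool` are hypothetical names invented for this example. The queue only stands in for Boost.Asio's `io_context`, where the analogous pattern is several threads each calling `io_context::run()` on the same context.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for an event loop that several threads service at once.
class WorkQueue {
public:
    // Enqueue a task; any worker currently inside run() may pick it up.
    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        ready_.notify_one();
    }

    // Announce that no more tasks will arrive, so workers exit once the queue drains.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Worker loop: process tasks until the queue is both closed and empty.
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return closed_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // closed and fully drained
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // execute outside the lock so other workers can proceed
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> tasks_;
    bool closed_ = false;
};

// Start `thread_count` workers, feed them `request_count` dummy "requests",
// and return how many were handled.
std::size_t serve_with_pool(std::size_t thread_count, std::size_t request_count) {
    WorkQueue queue;
    std::atomic<std::size_t> handled{0};

    std::vector<std::thread> pool;
    pool.reserve(thread_count);
    for (std::size_t i = 0; i < thread_count; ++i)
        pool.emplace_back([&queue] { queue.run(); });

    for (std::size_t i = 0; i < request_count; ++i)
        queue.post([&handled] { handled.fetch_add(1, std::memory_order_relaxed); });

    queue.close();
    for (auto& worker : pool) worker.join();  // join makes all increments visible
    return handled.load();
}
```

Calling `serve_with_pool(4, 1000)` mirrors the "threadpool(4)" setup in the title: four workers drain one shared queue, so a slow handler on one thread no longer blocks the rest.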

renerocksai commented 1 year ago

Hi, ok, so it is ready for merging, right? Just double checking.

kassane commented 1 year ago

Done.

renerocksai commented 1 year ago

The graphs in the readme reflect the change. AFAICT, cpp-beast went from ca. 100k req/sec to > 150k req/sec.