hatoo / oha

Ohayou(おはよう), HTTP load generator, inspired by rakyll/hey with tui animation.

Comparing to wrk #617

Open waghanza opened 2 days ago

waghanza commented 2 days ago

Hi @hatoo,

I'm considering using oha for https://github.com/the-benchmarker/web-frameworks, but I have some questions.

Currently, we are using wrk. Using https://github.com/the-benchmarker/web-frameworks/blob/master/rust/actix/src/main.rs as the target, I get these results with wrk:

Running 10s test @ http://172.17.0.2:3000
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    36.11us   75.22us   3.78ms   97.44%
    Req/Sec   167.06k    11.08k  180.07k    92.08%
  3358383 requests in 10.10s, 240.21MB read
Requests/sec: 332518.67
Transfer/sec:     23.78MB

and with oha http://172.17.0.2:3000 -j --no-tui -z 10s -c 10 (which should be equivalent options), I get:

{
  "summary": {
    "successRate": 1.0,
    "total": 10.001275574,
    "slowest": 0.010046543,
    "fastest": 8.583e-6,
    "average": 0.00007257664067671008,
    "requestsPerSec": 136928.03381601934,
    "totalData": 0,
    "sizePerRequest": 0,
    "sizePerSec": 0.0
  },
  "responseTimeHistogram": {
    "0.000008583": 1,
    "0.0010123789999999999": 1369180,
    "0.002016175": 47,
    "0.0030199709999999998": 25,
    "0.004023767": 5,
    "0.005027563": 134,
    "0.006031359": 55,
    "0.007035155": 0,
    "0.008038950999999999": 0,
    "0.009042746999999999": 1,
    "0.010046542999999998": 4
  },
  "latencyPercentiles": {
    "p10": 0.000031499,
    "p25": 0.000050375,
    "p50": 0.000072624,
    "p75": 0.00008625,
    "p90": 0.00009825,
    "p95": 0.000114624,
    "p99": 0.000212581,
    "p99.9": 0.000264623,
    "p99.99": 0.004690378
  },
  "rps": {
    "mean": 136906.43878477108,
    "stddev": 26479.44862149209,
    "max": 200460.94012580605,
    "min": 54256.50815324148,
    "percentiles": {
      "p10": 97178.95504571438,
      "p25": 131526.83147360053,
      "p50": 143429.84775131833,
      "p75": 152822.892869349,
      "p90": 161744.178760928,
      "p95": 167339.7097131157,
      "p99": 175201.4416347006,
      "p99.9": 186797.24664659952,
      "p99.99": 200460.94012580605
    }
  },
  "details": {
    "DNSDialup": {
      "average": 0.0001614946,
      "fastest": 0.000038208,
      "slowest": 0.000365706
    },
    "DNSLookup": {
      "average": 0.000012528900000000001,
      "fastest": 1e-6,
      "slowest": 0.000063083
    }
  },
  "statusCodeDistribution": {
    "200": 1369452
  },
  "errorDistribution": {
    "aborted due to deadline": 3
  }
}

and with the more realistic test options I found in the README, oha http://172.17.0.2:3000 -j --no-tui -z 10s -c 10 --latency-correction --disable-keepalive, I get:

{
  "summary": {
    "successRate": 1.0,
    "total": 10.00244618,
    "slowest": 0.012744112,
    "fastest": 0.000034208,
    "average": 0.000673916113492022,
    "requestsPerSec": 14813.976234761405,
    "totalData": 0,
    "sizePerRequest": 0,
    "sizePerSec": 0.0
  },
  "responseTimeHistogram": {
    "0.000034208": 1,
    "0.0013051984": 131514,
    "0.0025761888": 3664,
    "0.0038471792": 4492,
    "0.0051181696": 5130,
    "0.00638916": 2255,
    "0.0076601504": 718,
    "0.008931140800000001": 304,
    "0.0102021312": 81,
    "0.0114731216": 6,
    "0.012744112": 4
  },
  "latencyPercentiles": {
    "p10": 0.000113207,
    "p25": 0.000163832,
    "p50": 0.000253914,
    "p75": 0.000390497,
    "p90": 0.001891944,
    "p95": 0.004174885,
    "p99": 0.006056246,
    "p99.9": 0.008555936,
    "p99.99": 0.009799259
  },
  "rps": {
    "mean": 14216.714782684752,
    "stddev": 17382.270376030592,
    "max": 64122.66094837892,
    "min": 849.88484060425,
    "percentiles": {
      "p10": 2261.3753088447224,
      "p25": 2474.4886115784316,
      "p50": 2717.37172026533,
      "p75": 31742.430106323052,
      "p90": 42629.453030938785,
      "p95": 46716.50961449759,
      "p99": 54725.266655616586,
      "p99.9": 62137.75489987862,
      "p99.99": 64122.66094837892
    }
  },
  "details": {
    "DNSDialup": {
      "average": 0.00042168944500536507,
      "fastest": 0.000012875,
      "slowest": 0.010655295
    },
    "DNSLookup": {
      "average": 1.1297418218385604e-6,
      "fastest": 4.58e-7,
      "slowest": 0.003425224
    }
  },
  "statusCodeDistribution": {
    "200": 148169
  },
  "errorDistribution": {
    "aborted due to deadline": 7
  }
}

How do you explain the variations between wrk and oha?

Regards,

hatoo commented 2 days ago

Hi, I think it's simply because wrk is more optimized than oha.

Put simply: ideally, we could distribute all the work to each thread statically for this kind of application. But our runtime, tokio, distributes work dynamically using work-stealing, which involves some overhead for locking. That overhead becomes apparent in extreme conditions, such as benchmarking a very fast server on localhost.
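To make the difference concrete, here is a minimal Rust sketch (not oha's actual code) of the two scheduling models, assuming only the tokio crate with the "full" feature set: a shared multi-threaded work-stealing runtime where tasks may migrate between worker threads, versus one single-threaded runtime pinned per OS thread, which is roughly how wrk divides its work.

// Sketch only: contrasts work-stealing vs. static per-thread scheduling.
// Assumes tokio = { version = "1", features = ["full"] } in Cargo.toml.

use std::thread;

// Work-stealing: one shared multi-threaded runtime; tasks can migrate between
// worker threads, which requires cross-thread synchronization.
fn work_stealing(num_workers: usize) {
    let rt = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(num_workers)
        .enable_all()
        .build()
        .unwrap();
    rt.block_on(async {
        let mut handles = Vec::new();
        for _ in 0..num_workers {
            handles.push(tokio::spawn(async {
                // issue requests in a loop here...
            }));
        }
        for h in handles {
            h.await.unwrap();
        }
    });
}

// Static distribution (roughly what wrk does): each OS thread owns its own
// single-threaded runtime and its own connections, so no cross-thread locking.
fn static_distribution(num_workers: usize) {
    let mut threads = Vec::new();
    for _ in 0..num_workers {
        threads.push(thread::spawn(|| {
            let rt = tokio::runtime::Builder::new_current_thread()
                .enable_all()
                .build()
                .unwrap();
            rt.block_on(async {
                // issue requests in a loop here...
            });
        }));
    }
    for t in threads {
        t.join().unwrap();
    }
}

fn main() {
    work_stealing(10);
    static_distribution(10);
}

The coordination cost of the first model is what shows up as futex activity in strace, as mentioned below.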

You can check this hypothesis with strace: strace -f on wrk is very clean, while oha's trace contains many futex-related calls.

On the other hand, tokio lets us implement a real-time TUI easily, which is a good point.
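As a rough illustration of that point (a sketch under the same tokio assumption, not oha's actual implementation): worker tasks push per-request results into an mpsc channel, and a single task drains the channel and redraws on a timer, all on the same runtime.

// Sketch only: live stats aggregation alongside load-generating tasks.
use std::time::{Duration, Instant};
use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::unbounded_channel::<Duration>();

    // Worker tasks: measure a latency per "request" and report it.
    for _ in 0..10 {
        let tx = tx.clone();
        tokio::spawn(async move {
            loop {
                let start = Instant::now();
                // send an HTTP request here; a sleep stands in for it.
                tokio::time::sleep(Duration::from_millis(1)).await;
                if tx.send(start.elapsed()).is_err() {
                    break;
                }
            }
        });
    }
    drop(tx);

    // "TUI" task: aggregate and redraw at a fixed frame rate.
    let mut ticker = tokio::time::interval(Duration::from_millis(250));
    let mut count: u64 = 0;
    let deadline = Instant::now() + Duration::from_secs(2);
    while Instant::now() < deadline {
        ticker.tick().await;
        while let Ok(_latency) = rx.try_recv() {
            count += 1;
        }
        // A real TUI would redraw widgets here instead of printing.
        println!("requests so far: {}", count);
    }
}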

See https://emschwartz.me/async-rust-can-be-a-pleasure-to-work-with-without-send-sync-static/