spion / hashtable-latencies

Latency of a web service that stores a large hashtable, in multiple languages

go: use fasthttp instead of net/http. This should reduce http server … #6

Closed · valyala closed this 8 years ago

valyala commented 8 years ago

…influence on the test

ngrilly commented 8 years ago

Have you measured what you gain from using fasthttp instead of net/http?

I did some tests and it looks like the high latency is due to the following issue:

> Just to be clear on the terminology, this issue isn't about the STW phase, it's about mutator availability. According to the gctrace, the STW phases are all sub-millisecond. [...] the problem is that the process of scanning a single object, even during concurrent marking, is currently non-interruptible on the particular thread it's running on. Other threads will continue to make progress, though they might also get caught up scanning large single objects. Of course, this is still a GC-caused latency problem; it's just not a problem in the STW phase. :)

Source: https://github.com/golang/go/issues/15847#issuecomment-222157078

> The issue is that whatever thread gets picked to scan the buckets array of the map is stuck not being able to do anything else until it's scanned the whole bucket array. If there's other mutator work queued up on that thread, it's blocked during this time.

Source: https://github.com/golang/go/issues/14812#issuecomment-222708690
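
To make the failure mode concrete, the data structure at the heart of this benchmark is a single huge map. The following is only an illustrative sketch (the entry count and value size are made up, not taken from this repo): with one map this large, the GC has one very big bucket array to scan, and as of Go 1.6.x that scan cannot be preempted on the thread doing it.

```go
package main

import "fmt"

// One big map: its bucket array is treated as a single object by the
// garbage collector, so scanning it is non-preemptible on the thread
// that picks it up (the Go 1.6.x behaviour discussed above).
var table = make(map[int][]byte)

func main() {
	// Illustrative sizes only: enough entries that scanning the
	// bucket array takes tens of milliseconds.
	for i := 0; i < 10000000; i++ {
		table[i] = make([]byte, 16)
	}
	fmt.Println("entries:", len(table))
}
```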

valyala commented 8 years ago

> Have you measured what you gain from using fasthttp instead of net/http?

The fasthttp server doesn't allocate memory, unlike net/http. So it allows measuring the pure GC latency caused by the code inside the request handler.
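
For readers unfamiliar with fasthttp, a handler has roughly the following shape. This is only a sketch of the idea, not the handler from this PR: the point is that the request path itself allocates nothing, so any remaining tail latency comes from the GC work on the shared table rather than from per-request garbage.

```go
package main

import (
	"log"

	"github.com/valyala/fasthttp"
)

// handler does the per-request work without allocating: fasthttp
// reuses its RequestCtx objects, so net/http's per-request
// allocations (and the GC pressure they add) are out of the picture.
func handler(ctx *fasthttp.RequestCtx) {
	// The real benchmark would touch the big hashtable here.
	ctx.SetStatusCode(fasthttp.StatusOK)
	ctx.WriteString("ok")
}

func main() {
	log.Fatal(fasthttp.ListenAndServe(":8080", handler))
}
```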

ngrilly commented 8 years ago

I know fasthttp and its advantages in terms of not allocating memory. I'm just wondering if you have measured the gain, before asking the project owner to merge this PR, because my guess is that the high latency is mostly caused by the issue described by Austin Clements in my comment above.

valyala commented 8 years ago

> I'm just wondering if you have measured the gain, before asking the project owner to merge this PR, because my guess is that the high latency is mostly caused by the issue described by Austin Clements in my comment above

Below are wrk2 results using the following settings (333 concurrent connections, 10K qps):

./wrk -t 2 -c 333 -R 10000 -d 20s -L http://localhost:8080/

fasthttp latency results:

  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.88ms  451.43us   6.64ms   67.53%
    Req/Sec     5.26k   321.86     6.67k    70.82%
  Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%    0.87ms
 75.000%    1.19ms
 90.000%    1.43ms
 99.000%    1.97ms
 99.900%    3.38ms
 99.990%    6.24ms
 99.999%    6.57ms
100.000%    6.64ms

net/http results:

  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.19ms    3.66ms  74.69ms   99.25%
    Req/Sec     5.24k     1.04k   33.22k    98.73%
  Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%    0.90ms
 75.000%    1.23ms
 90.000%    1.48ms
 99.000%    2.21ms
 99.900%   62.43ms
 99.990%   73.21ms
 99.999%   74.62ms
100.000%   74.75ms
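
(Side note: one way to confirm from inside the process that the STW pauses themselves stay sub-millisecond, as the gctrace observation quoted above suggests, is to read the GC pause statistics directly. A minimal sketch, not part of the benchmark:)

```go
package main

import (
	"fmt"
	"runtime/debug"
	"time"
)

func main() {
	var stats debug.GCStats
	// With a slice of length 5, ReadGCStats fills in the minimum,
	// 25%, 50%, 75% and maximum pause durations.
	stats.PauseQuantiles = make([]time.Duration, 5)
	debug.ReadGCStats(&stats)
	fmt.Printf("GCs: %d  total pause: %v  quantiles: %v\n",
		stats.NumGC, stats.PauseTotal, stats.PauseQuantiles)
}
```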

spion commented 8 years ago

No worries, I'll definitely try this out once I get back home! :)

ngrilly commented 8 years ago

@valyala Thanks for sharing your results!

You're right that fasthttp improves the latency compared to net/http.

I checked on my machine, using test.sh (which includes a warmup and runs for 60 seconds instead of the 20 seconds in your test).

Here are my results using net/http:

Running 1m test @ http://localhost:8080
  3 threads and 33 connections
  Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%    1.10ms
 75.000%    1.51ms
 90.000%    1.81ms
 99.000%    2.30ms
 99.900%   50.65ms
 99.990%   66.69ms
 99.999%   72.19ms
100.000%   72.45ms

Running 1m test @ http://localhost:8080
  3 threads and 333 connections
  Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%    1.52ms
 75.000%    2.03ms
 90.000%    2.69ms
 99.000%    4.31ms
 99.900%   60.26ms
 99.990%   65.73ms
 99.999%   67.39ms
100.000%   68.93ms

And my results using fasthttp:

Running 1m test @ http://localhost:8080
  3 threads and 33 connections
  Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%    1.03ms
 75.000%    1.42ms
 90.000%    1.68ms
 99.000%    2.15ms
 99.900%    2.74ms
 99.990%   37.15ms
 99.999%   51.20ms
100.000%   60.77ms

Running 1m test @ http://localhost:8080
  3 threads and 333 connections
  Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%    1.19ms
 75.000%    1.64ms
 90.000%    2.05ms
 99.000%    2.95ms
 99.900%    7.01ms
 99.990%   33.18ms
 99.999%   46.94ms
100.000%   60.32ms

With 3 threads and 333 connections, Go performs almost like OCaml. But with 3 threads and 33 connections, Go still performs a lot worse than OCaml, even when using fasthttp. That's why I still think the main cause of the high latency is that, although the GC is incremental, scanning a single object (like the map[int][]byte) is non-interruptible as of Go 1.6.2, which blocks other goroutines trying to mutate the scanned object at the same time.
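
For anyone hitting this in their own services: a commonly suggested workaround (not something this repo or PR does) is to split the one huge map into many smaller shards, so that no single bucket array is large enough for its non-interruptible scan to hurt. A rough sketch, with an arbitrary shard count and a per-shard lock for concurrent handlers:

```go
package main

import (
	"fmt"
	"sync"
)

const shardCount = 256 // arbitrary; the goal is just to keep each map small

// shard pairs a small map with its own lock so concurrent request
// handlers can read and write it safely.
type shard struct {
	mu sync.RWMutex
	m  map[int][]byte
}

type shardedTable struct {
	shards [shardCount]shard
}

func newShardedTable() *shardedTable {
	t := &shardedTable{}
	for i := range t.shards {
		t.shards[i].m = make(map[int][]byte)
	}
	return t
}

// index assumes non-negative keys; hash the key for the general case.
func index(key int) int { return key % shardCount }

func (t *shardedTable) Put(key int, value []byte) {
	s := &t.shards[index(key)]
	s.mu.Lock()
	s.m[key] = value
	s.mu.Unlock()
}

func (t *shardedTable) Get(key int) ([]byte, bool) {
	s := &t.shards[index(key)]
	s.mu.RLock()
	v, ok := s.m[key]
	s.mu.RUnlock()
	return v, ok
}

func main() {
	t := newShardedTable()
	t.Put(42, []byte("value"))
	v, ok := t.Get(42)
	fmt.Println(string(v), ok)
}
```

With sharding, the worst-case mutator stall is bounded by the scan time of the largest shard rather than of the whole table, at the cost of an extra indirection and lock per access.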

spion commented 8 years ago

Added the new reports and images. It seems like a running time of 60s might not be enough to accurately measure Go now, at least for the 333-client version, so it might need to be increased to 5 minutes.