Closed valyala closed 8 years ago
Have you measured what you gain from using fasthttp instead of net/http?
I did some tests and it looks like the high latency is due to the following issue:
Just to be clear on the terminology, this issue isn't about the STW phase, it's about mutator availability. According to the gctrace, the STW phases are all sub-millisecond. [...] the problem is that the process of scanning a single object, even during concurrent marking, is currently non-interruptible on the particular thread it's running on. Other threads will continue to make progress, though they might also get caught up scanning large single objects. Of course, this is still a GC-caused latency problem; it's just not a problem in the STW phase. :)
Source: https://github.com/golang/go/issues/15847#issuecomment-222157078
The issue is that whatever thread gets picked to scan the buckets array of the map is stuck not being able to do anything else until it's scanned the whole bucket array. If there's other mutator work queued up on that thread, it's blocked during this time.
Source: https://github.com/golang/go/issues/14812#issuecomment-222708690
> Have you measured what you gain from using fasthttp instead of net/http?

The fasthttp server doesn't allocate memory, unlike net/http. So it allows measuring the pure GC latency caused by the code inside the request handler.
I know about fasthttp and its advantage of not allocating memory. I'm just wondering if you have measured the gain, before asking the project owner to merge this PR, because my guess is that the high latency is mostly caused by the issue described by Austin Clements in my comment above.
> I'm just wondering if you have measured the gain, before asking the project owner to merge this PR, because my guess is that the high latency is mostly caused by the issue described by Austin Clements in my comment above.

Below are wrk2 results using the following settings (333 concurrent connections, 10K qps):

./wrk -t 2 -c 333 -R 10000 -d 20s -L http://localhost:8080/
fasthttp latency results:
Thread Stats Avg Stdev Max +/- Stdev
Latency 0.88ms 451.43us 6.64ms 67.53%
Req/Sec 5.26k 321.86 6.67k 70.82%
Latency Distribution (HdrHistogram - Recorded Latency)
50.000% 0.87ms
75.000% 1.19ms
90.000% 1.43ms
99.000% 1.97ms
99.900% 3.38ms
99.990% 6.24ms
99.999% 6.57ms
100.000% 6.64ms
net/http results:
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.19ms 3.66ms 74.69ms 99.25%
Req/Sec 5.24k 1.04k 33.22k 98.73%
Latency Distribution (HdrHistogram - Recorded Latency)
50.000% 0.90ms
75.000% 1.23ms
90.000% 1.48ms
99.000% 2.21ms
99.900% 62.43ms
99.990% 73.21ms
99.999% 74.62ms
100.000% 74.75ms
No worries, I'll definitely try this out once I get back home! :)
@valyala Thanks for sharing your results!
You're right about fasthttp improving the latency compared to net/http.
I checked on my machine, using test.sh (which includes a warmup, and runs for 60 seconds instead of the 20 seconds in your test).

Here are my results using net/http:
Running 1m test @ http://localhost:8080
3 threads and 33 connections
Latency Distribution (HdrHistogram - Recorded Latency)
50.000% 1.10ms
75.000% 1.51ms
90.000% 1.81ms
99.000% 2.30ms
99.900% 50.65ms
99.990% 66.69ms
99.999% 72.19ms
100.000% 72.45ms
Running 1m test @ http://localhost:8080
3 threads and 333 connections
Latency Distribution (HdrHistogram - Recorded Latency)
50.000% 1.52ms
75.000% 2.03ms
90.000% 2.69ms
99.000% 4.31ms
99.900% 60.26ms
99.990% 65.73ms
99.999% 67.39ms
100.000% 68.93ms
And my results using fasthttp:
Running 1m test @ http://localhost:8080
3 threads and 33 connections
Latency Distribution (HdrHistogram - Recorded Latency)
50.000% 1.03ms
75.000% 1.42ms
90.000% 1.68ms
99.000% 2.15ms
99.900% 2.74ms
99.990% 37.15ms
99.999% 51.20ms
100.000% 60.77ms
Running 1m test @ http://localhost:8080
3 threads and 333 connections
Latency Distribution (HdrHistogram - Recorded Latency)
50.000% 1.19ms
75.000% 1.64ms
90.000% 2.05ms
99.000% 2.95ms
99.900% 7.01ms
99.990% 33.18ms
99.999% 46.94ms
100.000% 60.32ms
With 3 threads and 333 connections, Go performs almost like OCaml. But with 3 threads and 33 connections, Go still performs a lot worse than OCaml, even when using fasthttp. This is why I still think the high latency comes from the GC: although it is incremental, scanning a single object (like the map[int][]byte) is non-interruptible as of Go 1.6.2, which blocks other goroutines trying to mutate the scanned object at the same time.
Added the new reports and images. It seems like a running time of 60s might not be enough to accurately measure Go, at least for the 333-connection version, so it might need to be increased to 5 minutes.
…influence on the test