Dieterbe opened this issue 8 years ago
Tested with https://gist.github.com/Dieterbe/bda3f2af50c56146e98580a03c2b6eaa applied to raintank-docker to auto-apply a realistic workload. Results:

- sys: https://snapshot.raintank.io/dashboard/snapshot/1zc8flsQTV4pyjOv6fIm5BXH3eird4kF
- MT: https://snapshot.raintank.io/dashboard/snapshot/hvtuSiLV0CDtJy31zWdDKQ2ZQWQOW1VI (the correlation between GC runs and latency spikes is visible on the duration chart)
vegeta:

```
cat attack.out | vegeta report 2>&1 | egrep -v 'connection reset|timed out|timeout'
Requests      [total, rate]            60000, 200.00
Duration      [total, attack, wait]    5m26.855960507s, 4m59.994999853s, 26.860960654s
Latencies     [mean, 50, 95, 99, max]  17.006800796s, 13.661618151s, 43.008771756s, 53.01141618s, 2m7.401387905s
Bytes In      [total, mean]            1756638977, 29277.32
Bytes Out     [total, mean]            0, 0.00
Success       [ratio]                  68.44%
Status Codes  [code:count]             200:41066  0:18934
Error Set:
root@benchmark:/opt/raintank/raintank-tsdb-benchmark# cat attack.out | vegeta report 2>&1 | egrep -c 'connection reset|timed out|timeout'
2163
```
- new sys: https://snapshot.raintank.io/dashboard/snapshot/i9TIko5tB522Wh8RQVgR7BG6BZmjmFna
- new MT: https://snapshot.raintank.io/dashboard/snapshot/wFketZbpnZUZjxg1QbIZmNNJgSoJdnUn
vegeta:

```
cat vegeta-after
root@benchmark:/opt/raintank/raintank-tsdb-benchmark# cat attack.out | vegeta report 2>&1 | egrep -v 'connection reset|timed out|timeout'
Requests      [total, rate]            60000, 200.00
Duration      [total, attack, wait]    5m42.394862196s, 4m59.994999882s, 42.399862314s
Latencies     [mean, 50, 95, 99, max]  17.811976008s, 14.105108138s, 43.010377182s, 53.013462631s, 1m14.607911294s
Bytes In      [total, mean]            1677464219, 27957.74
Bytes Out     [total, mean]            0, 0.00
Success       [ratio]                  66.55%
Status Codes  [code:count]             200:39932  0:20068
Error Set:
root@benchmark:/opt/raintank/raintank-tsdb-benchmark# cat attack.out | vegeta report 2>&1 | egrep -c 'connection reset|timed out|timeout'
2253
```
=> my test was probably using too many req/s or something similar; it seems graphite-api itself had trouble keeping up. However, we can still draw the conclusion we need: no discernible change, and similar latency spikes at GC runs.
Confirmed again using the latest golang master, which includes Austin's fix.
The latest Go master has GC changes that should help.
A fix for https://github.com/golang/go/issues/16293 was merged in Go: https://github.com/golang/go/commit/cf4f1d07a189125a8774a923a3259126599e942b . It has shown good results for large maps (see also https://github.com/spion/hashtable-latencies/issues/13) and will likely fix our issue as well; we just need to test it. The only problem is that it's in git master, and there most likely won't be a 1.7.x release containing it, so we have to use Go from git master and/or wait for 1.8.
Is it reasonable to cherry-pick that fix onto 1.7.1?
I'll just run a bench in raintank-docker. Now is an especially good time, also because of https://groups.google.com/forum/m/#!topic/golang-dev/Ab1sFeoZg_8
I'm going to look into techniques to lower GC CPU overhead. We currently reference a lot of data through pointers; I suspect we can lower GC cost quite a bit by being smarter about this.