Closed aermakov-zalando closed 7 years ago
@lmineiro Can you please check this? Thanks!
Maybe it's better to have a parameter that can change the default from UniformSample to expDecay and also a parameter to set the reservoir size for UniformSample.
sgtm
Benchmark: create N histograms with either a Uniform or ExpDecay sample, then update them all once per benchmark iteration. Results:
```
goos: darwin
goarch: amd64
pkg: github.bus.zalan.do/teapot/metrics-bench
BenchmarkUniform100-4       500000       9120 ns/op    0 B/op    0 allocs/op
BenchmarkUniform1000-4       50000      91968 ns/op    0 B/op    0 allocs/op
BenchmarkUniform10000-4      10000     903244 ns/op    0 B/op    0 allocs/op
BenchmarkUniform100000-4      1000    6693171 ns/op    0 B/op    0 allocs/op
BenchmarkExpDecay100-4      200000      22861 ns/op    0 B/op    0 allocs/op
BenchmarkExpDecay1000-4      10000     487840 ns/op    0 B/op    0 allocs/op
BenchmarkExpDecay10000-4      2000    5104737 ns/op    0 B/op    0 allocs/op
BenchmarkExpDecay100000-4      200   33383230 ns/op    0 B/op    0 allocs/op
PASS
ok      github.bus.zalan.do/teapot/metrics-bench   56.337s
```
Memory use: ~8272 bytes/histogram for UniformSample, ~16545 bytes/histogram for ExpDecaySample.
Can you do this with N updates, where N is as close as possible to the reservoir capacity?
@lmineiro Updated 2048 times with random data (same for both benchmarks, though).
```
$ go test -bench=. -benchtime 10s -benchmem
goos: darwin
goarch: amd64
pkg: github.bus.zalan.do/teapot/metrics-bench
BenchmarkUniform100-4       1000   18655357 ns/op    0 B/op    0 allocs/op
BenchmarkExpDecay100-4       300   44768498 ns/op    0 B/op    0 allocs/op
PASS
ok      github.bus.zalan.do/teapot/metrics-bench   38.606s
```
These benchmarks seem to confirm what we experienced before: the exponentially decaying sample has worse memory overhead and performance. That's why we went with the uniform sample. The implemented option seems reasonable and lets users choose. I'd definitely include these numbers in the docs this time, so that users know what they're getting into.
I'd say the overhead is negligible for both implementations, though, and having better metrics is worth a few extra nanoseconds per request. I recommend switching the default implementation once we're satisfied with the testing results.
I disagree that it is negligible. If you take a closer look, a single proxied request keeps and updates a lot of histograms. Not only that, the memory footprint can become overwhelming, as we found out before in https://github.com/zalando/skipper/pull/446
I'd be really interested in adding a new reservoir which would combine the efficiency of the Uniform reservoir while applying a sliding window. I believe @aryszka has been working on something like this.
I think you're referring to https://github.com/zalando/skipper/blob/master/circuit/binarysampler.go
This sliding window sampler would cost too much for arbitrary values and/or too large window sizes, I believe. It's used in the circuit breaker, where we only track binary values (success vs. failure) with relatively small window sizes, so it's enough to store only bits.
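To illustrate why bits are enough there, here is a simplified, self-contained sketch of that bit-based sliding window idea (not skipper's actual binarysampler code). Because each observation is binary, the window costs size/8 bytes regardless of throughput, instead of 8 bytes per stored value as in a reservoir of int64s:

```go
package main

import "fmt"

// binarySampler keeps the last `size` boolean outcomes in a bitset,
// one bit per observation — a simplified sketch of the idea behind
// skipper's circuit/binarysampler.go, not the actual implementation.
type binarySampler struct {
	size  int      // window size in observations
	n     int      // observations recorded so far, capped at size
	pos   int      // next bit position to overwrite (ring buffer)
	count int      // number of `true` bits currently in the window
	bits  []uint64 // the window: size bits packed into uint64 words
}

func newBinarySampler(size int) *binarySampler {
	return &binarySampler{size: size, bits: make([]uint64, (size+63)/64)}
}

// tick records one observation, evicting the oldest one once the
// window is full.
func (s *binarySampler) tick(v bool) {
	word, bit := s.pos/64, uint(s.pos%64)
	if s.n == s.size && s.bits[word]&(1<<bit) != 0 {
		s.count-- // the evicted observation was a success
	}
	if v {
		s.bits[word] |= 1 << bit
		s.count++
	} else {
		s.bits[word] &^= 1 << bit
	}
	if s.n < s.size {
		s.n++
	}
	s.pos = (s.pos + 1) % s.size
}

// successRate reports the fraction of `true` observations in the window.
func (s *binarySampler) successRate() float64 {
	if s.n == 0 {
		return 0
	}
	return float64(s.count) / float64(s.n)
}

func main() {
	s := newBinarySampler(100)
	for i := 0; i < 60; i++ {
		s.tick(true)
	}
	for i := 0; i < 40; i++ {
		s.tick(false)
	}
	fmt.Println(s.successRate()) // 0.6: 60 successes in a window of 100
}
```

As the comment above notes, this only works because the tracked value is one bit; generalizing it to arbitrary int64 latencies would lose exactly the memory advantage that makes it attractive.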
In 739f413, the metrics were changed to use a UniformSample with a hard-coded 1024 reservoir size. This makes latency metrics for services with non-uniform load very skewed. For example, if our service is under a very high load (100 rps) due to a misconfiguration in another service, and then the load drops to normal (1-2 rps) after the misconfiguration is fixed, it can take hours, days or even weeks for the metrics to go back to normal. Would you consider switching back to the default exponentially decaying sample? I can send a PR if necessary.
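To illustrate the skew, here is a minimal uniform reservoir along the lines of Vitter's Algorithm R (a sketch of the technique, not go-metrics' actual UniformSample code). Once the reservoir is full, a new value only replaces an old one with probability size/count, so samples from a past load spike linger almost indefinitely:

```go
package main

import (
	"fmt"
	"math/rand"
)

// uniformSample keeps a fixed-size uniform sample over the whole
// lifetime of the stream (Vitter's Algorithm R) — the same general
// approach as go-metrics' UniformSample.
type uniformSample struct {
	values []int64
	count  int64
	rnd    *rand.Rand
}

func newUniformSample(size int) *uniformSample {
	return &uniformSample{
		values: make([]int64, 0, size),
		rnd:    rand.New(rand.NewSource(1)), // seeded for reproducibility
	}
}

func (s *uniformSample) update(v int64) {
	s.count++
	if len(s.values) < cap(s.values) {
		s.values = append(s.values, v)
		return
	}
	// Once full, the new value replaces a random old one with
	// probability size/count — which keeps shrinking as count grows,
	// so old samples are evicted more and more slowly.
	if i := s.rnd.Int63n(s.count); i < int64(len(s.values)) {
		s.values[i] = v
	}
}

func (s *uniformSample) mean() float64 {
	var sum int64
	for _, v := range s.values {
		sum += v
	}
	return float64(sum) / float64(len(s.values))
}

func main() {
	s := newUniformSample(1024)
	// A high-load incident: 100k slow (1000ms) requests...
	for i := 0; i < 100000; i++ {
		s.update(1000)
	}
	// ...then recovery: 100k fast (10ms) requests.
	for i := 0; i < 100000; i++ {
		s.update(10)
	}
	// The reservoir is a uniform sample of the ENTIRE stream, so it
	// still holds roughly half slow samples: the reported mean stays
	// around 500ms, nowhere near the current 10ms traffic.
	fmt.Printf("mean after recovery: %.0fms\n", s.mean())
}
```

An exponentially decaying sample weights recent values more heavily, which is exactly what makes its histograms recover quickly after a load spike like the one described above.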