Closed szuecs closed 3 years ago
I have another occurrence. Interesting is that this has only 400 RPS, which is basically nothing. And instances differ with close to 10x CPU and memory consumption:
CPU usage per instance:
Memory allocations per instance:
memory profile:
Another instance that does not have the same problem, yet:
I copy #276 into this one, because it's the same problem.
We run skipper with deployment+hpa as ingress controller in Kubernetes. In front of this there is an ALB distributing the traffic to the skipper. Maybe related to #273.
kubectl top pods -l application=skipper-ingress
NAME CPU(cores) MEMORY(bytes)
...
skipper-ingress-649d68964f-kmx5n 386m 105Mi
...
skipper-ingress-649d68964f-xjlhv 1981m 167Mi
I wondered what caused the difference.
And a profile shows this for the skipper-ingress-649d68964f-kmx5n:
skipper-ingress-649d68964f-xjlhv
The difference is that skipper-ingress-649d68964f-xjlhv creates a lot of garbage in the tracer and has to clean it up again.
Btw. this seems not to be related to Go version, 1.16 has the same issue and I also tried GODEBUG=madvdontneed=0
which did not help either.
Maybe also interesting the flamegraph
In our http proxy https://github.com/zalando/skipper we see an increase in Heap object creation that causes GC p75 slightly above 1ms and this seems to cause latency p999 <=25ms to be impossible. See also the blue line in the graph that changed close to 1/28 and is ok again around 2/9. The only option I changed was MaxBufferedSpans option set from 32768 back to 8192. We had 8192 before 1/28. On 1/28 we also changed the tracer library from v0.16.0 to v0.24.0. We had to increase MaxBufferedSpans setting, because there was 20-30% client side Span drops.
We built skipper with Go v1.15.7 and lightstep-tracer-go v0.24.0.
I hoped the tracing buffer would be a pre-allocated ring buffer, that would consume only a fixed amount of memory without creating garbage.