lightstep / lightstep-tracer-go

The Lightstep distributed tracing library for Go
https://lightstep.com
MIT License
98 stars 54 forks source link

latency issue caused by MaxBufferedSpans Option #273

Closed szuecs closed 3 years ago

szuecs commented 3 years ago

In our http proxy https://github.com/zalando/skipper we see an increase in Heap object creation that causes GC p75 slightly above 1ms and this seems to cause latency p999 <=25ms to be impossible. See also the blue line in the graph that changed close to 1/28 and is ok again around 2/9. The only option I changed was MaxBufferedSpans option set from 32768 back to 8192. We had 8192 before 1/28. On 1/28 we also changed the tracer library from v0.16.0 to v0.24.0. We had to increase MaxBufferedSpans setting, because there was 20-30% client side Span drops.

image image image

We built skipper with Go v1.15.7 and lightstep-tracer-go v0.24.0.

I hoped the tracing buffer would be a pre-allocated ring buffer, that would consume only a fixed amount of memory without creating garbage.

szuecs commented 3 years ago

I have another occurrence. Interesting is that this has only 400 RPS, which is basically nothing. And instances differ with close to 10x CPU and memory consumption:

CPU usage per instance: image

Memory allocations per instance: image

memory profile: image

Another instance that does not have the same problem, yet: image

szuecs commented 3 years ago

I copy #276 into this one, because it's the same problem.

original issue

We run skipper with deployment+hpa as ingress controller in Kubernetes. In front of this there is an ALB distributing the traffic to the skipper. Maybe related to #273.

kubectl top pods -l application=skipper-ingress 
NAME                               CPU(cores)   MEMORY(bytes)
...
skipper-ingress-649d68964f-kmx5n   386m         105Mi
...
skipper-ingress-649d68964f-xjlhv   1981m        167Mi

I wondered what caused the difference. And a profile shows this for the skipper-ingress-649d68964f-kmx5n: image

skipper-ingress-649d68964f-xjlhv image

The difference is that skipper-ingress-649d68964f-xjlhv creates a lot of garbage in the tracer and has to clean it up again.

szuecs commented 3 years ago

Btw. this seems not to be related to Go version, 1.16 has the same issue and I also tried GODEBUG=madvdontneed=0 which did not help either.

szuecs commented 3 years ago

Maybe also interesting the flamegraph image