solo-io / gloo

The Feature-rich, Kubernetes-native, Next-Generation API Gateway Built on Envoy
https://docs.solo.io/
Apache License 2.0

Set up OpenTelemetry, beef up tracing story, and add developer docs #7286

Open kdorosh opened 1 year ago

kdorosh commented 1 year ago

Version

1.13.x (beta)

Is your feature request related to a problem? Please describe.

While profiling Gloo at scale, it was important to see which functions consumed the most time, so that we could reduce the latency between applying a resource and seeing it reflected in the data plane. It was also important to see which goroutines were blocked on full channels, and for how long.

We can collect noisy traces using curl -o trace.out 'http://localhost:9091/debug/pprof/trace?seconds=5' and then inspect them with go tool trace trace.out. We should add this to the developer docs, and clean up our spans so that the output is more usable.
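To illustrate what more usable output could look like, here is a minimal sketch using only Go's standard runtime/trace package. It is not Gloo code: the names translate, captureTrace, and syncSnapshot are hypothetical, and the small buffered channel is only there to make a sender block the way a full channel would. Tasks and regions annotated this way appear under the user-defined tasks and regions views of go tool trace.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"os"
	"runtime/trace"
)

// translate is a hypothetical stand-in for a unit of control-plane work.
func translate(ctx context.Context, out chan<- int) {
	// A region marks a named time interval on the calling goroutine.
	defer trace.StartRegion(ctx, "translate").End()
	for i := 0; i < 3; i++ {
		out <- i // blocks whenever the channel buffer is full
	}
	close(out)
}

// captureTrace records an execution trace of one annotated task
// and returns the raw trace bytes.
func captureTrace() ([]byte, error) {
	var buf bytes.Buffer
	if err := trace.Start(&buf); err != nil {
		return nil, err
	}

	// A task groups regions and log events, even across goroutines.
	ctx, task := trace.NewTask(context.Background(), "syncSnapshot")
	out := make(chan int, 1) // small buffer so the sender blocks
	go translate(ctx, out)
	for v := range out {
		trace.Logf(ctx, "translate", "received %d", v)
	}
	task.End()

	// Stop flushes all pending trace data to buf before returning.
	trace.Stop()
	return buf.Bytes(), nil
}

func main() {
	data, err := captureTrace()
	if err != nil {
		fmt.Fprintln(os.Stderr, "trace failed:", err)
		os.Exit(1)
	}
	if err := os.WriteFile("trace.out", data, 0o644); err != nil {
		fmt.Fprintln(os.Stderr, "write failed:", err)
		os.Exit(1)
	}
	fmt.Printf("wrote %d bytes to trace.out\n", len(data))
}
```

Running this and then go tool trace trace.out should show the syncSnapshot task with its translate region, which is easier to navigate than an unannotated trace.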

Describe the solution you'd like

We have spans in the code, but we should move to fully standard OpenTelemetry for Go and instrument all key parts of the codebase:

Describe alternatives you've considered

n/a

Additional Context

This is also super important for collecting useful data in customer environments where we cannot replicate their setup identically but want to make solid recommendations.

nfuden commented 1 year ago

We should consider using OpenTelemetry instead