Open gberche-orange opened 10 months ago
I will admit to being somewhat reluctant to add otel to kine. We already struggle with otel library compatibility in k3s between what is needed by kubernetes, etcd, and containerd, and have in the past had to maintain parallel kubernetes-minor-specific branches in order to resolve dependency conflicts. I am concerned that adding otel here will require similar juggling.
Are you not able to get the information you need from the Kubernetes apiserver's etcd client otel traces?
Sorry for the late follow-up on this. We're struggling with a lengthy diagnosis of performance and stability issues in our use case, a multi-server, multi-AZ kine cluster. We're currently blocked on the otel instrumentation of the Kubernetes apiserver's etcd client. I hope to be able to share sample traces once we get past the environment problems.
User expectation
As a kine user
Observed behavior
Currently, otel frameworks such as Coroot can only observe kine through eBPF instrumentation, so the collected traces lack SQL details.
See details of the eBPF-collected traces in the PDF screen print: Coroot-on-kine.pdf
Also, the Prometheus metrics currently published by kine include neither the SQL statements nor the etcd API calls received. See some sample metrics at https://gist.github.com/gberche-orange/32020e5fd00475d678eda04dec066955
Possible fix
For PostgreSQL specifically, since kine uses pgx https://github.com/k3s-io/kine/blob/f7ae7ce70751a7eab4b40574b45fc0cfa7be15fc/pkg/drivers/pgsql/pgsql.go#L14 this may require importing and using https://github.com/exaring/otelpgx
This would add otel traces of SQL statements (see https://github.com/exaring/otelpgx/blob/main/tracer_test.go ) to the default eBPF instrumentation.
Background
More background about otel Go instrumentation and the list of instrumented Go libraries
See https://github.com/exaring/otelpgx/issues/27 for additional metrics to be collected by otelpgx
Workaround
Instrumentation on the database only provides the server-side view, and may not accurately reflect what kine experiences on the client side (typically latency).
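To illustrate the gap between the server-side and client-side views, here is a minimal Go sketch of what a client-side measurement records: the statement text, and the time the caller actually waited. It uses only the standard library. The names `statementTiming`, `timeStatement` and `errSimulated` are hypothetical, made up for this example; this is neither kine code nor the otelpgx API, and the "queries" are stand-in functions rather than real database calls.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errSimulated is a hypothetical sentinel error standing in for a failed query.
var errSimulated = errors.New("simulated failure")

// statementTiming is a hypothetical record of one SQL statement as seen by
// the client: the statement text, how long the caller waited, and any error
// returned. It is not a kine or otelpgx type.
type statementTiming struct {
	Statement string
	Elapsed   time.Duration
	Err       error
}

// timeStatement runs the given function and measures how long the caller
// waited for it. In a real client, run would execute the statement through
// the database driver, so Elapsed would also cover network round trips and
// connection-pool waits that the database server never sees.
func timeStatement(stmt string, run func() error) statementTiming {
	start := time.Now()
	err := run()
	return statementTiming{Statement: stmt, Elapsed: time.Since(start), Err: err}
}

func main() {
	// Stand-in for a real query: wait briefly, then succeed.
	ok := timeStatement("SELECT 1", func() error {
		time.Sleep(5 * time.Millisecond)
		return nil
	})

	// Stand-in for a query that fails.
	failed := timeStatement("SELECT broken", func() error {
		return errSimulated
	})

	for _, rec := range []statementTiming{ok, failed} {
		fmt.Printf("statement=%q elapsed=%s err=%v\n", rec.Statement, rec.Elapsed, rec.Err)
	}
}
```

A tracer hooked into the driver would presumably capture the same two pieces of information per statement and attach them to a span, which is the detail missing from both the eBPF traces and the database-side instrumentation.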