k3s-io / kine

Run Kubernetes on MySQL, Postgres, sqlite, dqlite, not etcd.
Apache License 2.0

Opentelemetry instrumentation for sql drivers including postgresql #262

Open gberche-orange opened 6 months ago

gberche-orange commented 6 months ago

User expectation

As a kine user

Observed behavior

Currently, otel frameworks such as coroot can only observe kine through eBPF instrumentation, and the resulting traces lack SQL details.

(Screenshot: coroot-kine)

See details of the traces collected through eBPF in the PDF screen print: Coroot-on-kine.pdf

Also, the prometheus metrics currently published by kine include neither the SQL statements nor the etcd API calls received. See some sample metrics at https://gist.github.com/gberche-orange/32020e5fd00475d678eda04dec066955

Possible fix

For postgresql specifically, since kine uses pgx (https://github.com/k3s-io/kine/blob/f7ae7ce70751a7eab4b40574b45fc0cfa7be15fc/pkg/drivers/pgsql/pgsql.go#L14), this may require importing and using https://github.com/exaring/otelpgx

Create the tracer as part of your connection:

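// Parse the connection string into a pool configuration.
cfg, err := pgxpool.ParseConfig(connString)
if err != nil {
    return nil, fmt.Errorf("create connection pool: %w", err)
}

// Attach the otelpgx tracer so that queries on every pooled connection are traced.
cfg.ConnConfig.Tracer = otelpgx.NewTracer()

// Create the pool from the instrumented configuration.
conn, err := pgxpool.NewWithConfig(ctx, cfg)
if err != nil {
    return nil, fmt.Errorf("connect to database: %w", err)
}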

This would add otel traces of SQL statements (see https://github.com/exaring/otelpgx/blob/main/tracer_test.go ) alongside the default eBPF instrumentation.

Background

More background about otel go instrumentation and list of instrumented go libraries

See https://github.com/exaring/otelpgx/issues/27 for additional metrics to be collected by otelpgx

Workaround

Instrumentation on the database only provides the server-side view, and may not accurately reflect what kine perceives on the client side (typically latency).

brandond commented 6 months ago

I will admit to being somewhat reluctant to add otel to kine. We already struggle with otel library compatibility in k3s between what is needed by kubernetes, etcd, and containerd, and have in the past had to maintain parallel kubernetes-minor-specific branches in order to resolve dependency conflicts. I am concerned that adding otel here will require similar juggling.

Are you not able to get the information you need from the Kubernetes apiserver's etcd client otel traces?

gberche-orange commented 5 months ago

Are you not able to get the information you need from the Kubernetes apiserver's etcd client otel traces?

Sorry for the late follow-up on this. We're struggling with a lengthy diagnosis of performance and stability issues in our use case, a multi-server, multi-AZ kine cluster. We're currently blocked on the otel instrumentation of the k8s api server's etcd client. I hope I'll be able to share sample traces once we manage to get past the environment problems.