open-telemetry / opentelemetry-go-instrumentation

OpenTelemetry Auto Instrumentation using eBPF
https://opentelemetry.io
Apache License 2.0

Improve cardinality of server spans for better segmentation #627

Open SophieDeBenedetto opened 8 months ago

SophieDeBenedetto commented 8 months ago

Is your feature request related to a problem? Please describe.

GitHub has been evaluating running the ebpf sidecar container provided by opentelemetry-go-instrumentation as a solution for auto-instrumenting our Golang services. After deploying our POC, we ultimately found the server spans provided by the opentelemetry-go-instrumentation probes to be too generic for us to find useful. We would also love to see more probes developed for additional Golang dependencies beyond what's present here. We're hoping to eventually engage GitHub engineers in contributing to this project, so we're starting off with this issue to describe the work we feel would benefit our use-case.

I'll provide a little more detail here on what I mean by "generic" spans.

// pkg/instrumentation/bpf/database/sql/probe.go#L174
return &probe.SpanEvent{
    SpanName:    "DB",
    StartTime:   int64(e.StartTime),
    EndTime:     int64(e.EndTime),
    SpanContext: &sc,
    Attributes: []attribute.KeyValue{
        semconv.DBStatementKey.String(query),
    },
    ParentSpanContext: pscPtr,
}

We'd benefit from support for templatized parameters here, so that span names can be something like DB <operation>, matching the db.operation OTel semconv span name. This would let us at least segment trace data, and RED metrics derived from traces, by database operation at a resource level.
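To make the DB <operation> idea concrete, here is a rough sketch of how a span name could be derived from the statement text in the user-space part of the probe. Everything in it is an assumption for illustration: dbSpanName and the knownOps allow-list are hypothetical names, not part of this project, and taking the first keyword is a simplification that does not handle comments, CTEs, or vendor-specific syntax.

```go
package main

import "strings"

// knownOps is an illustrative allow-list (an assumption, not project code).
// Restricting names to a fixed set keeps span-name cardinality bounded.
var knownOps = map[string]bool{
	"SELECT": true,
	"INSERT": true,
	"UPDATE": true,
	"DELETE": true,
}

// dbSpanName returns "DB <operation>" when the first keyword of the
// statement is in knownOps, and the generic "DB" name otherwise.
func dbSpanName(query string) string {
	fields := strings.Fields(query)
	if len(fields) == 0 {
		return "DB"
	}
	op := strings.ToUpper(fields[0])
	if !knownOps[op] {
		return "DB"
	}
	return "DB " + op
}
```

With this sketch, a statement beginning with "select" would yield the span name DB SELECT, while anything unrecognized keeps today's DB name, so the worst case is no worse than the current behavior.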

Similarly, HTTP server spans are named with the HTTP method only, ref:

// pkg/instrumentation/bpf/net/http/server/probe.go#L204
return &probe.SpanEvent{
    // Do not include the high-cardinality path here (there is no
    // templatized path manifest to reference).
    SpanName:          method,
    StartTime:         int64(e.StartTime),
    EndTime:           int64(e.EndTime),
    SpanContext:       &sc,
    ParentSpanContext: pscPtr,
    Attributes:        attributes,
}

While I can see the comment here about the concern around span name cardinality, grouping all traces for a given service by resources segmented on HTTP method only is too blunt for us. We want to see the behavior of specific endpoints in a given service. We'd be interested in expanding this behavior to allow for templatized path parameters to be included in HTTP server span names in a way that is secure/doesn't leak PII.
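As a sketch of what "templatized path parameters" could mean in practice, the snippet below matches a request path against a list of route templates and only ever emits the template, never the raw path, in the span name. It assumes such a list is available from somewhere (which is exactly the "templatized path manifest" the code comment says does not exist today); matchTemplate, serverSpanName, and the {name} placeholder syntax are hypothetical and not part of this project.

```go
package main

import "strings"

// matchTemplate reports whether path matches a template such as
// "/users/{id}". A "{...}" segment matches any single non-empty segment;
// every other segment must match literally.
func matchTemplate(template, path string) bool {
	t := strings.Split(strings.Trim(template, "/"), "/")
	p := strings.Split(strings.Trim(path, "/"), "/")
	if len(t) != len(p) {
		return false
	}
	for i := range t {
		isParam := strings.HasPrefix(t[i], "{") && strings.HasSuffix(t[i], "}")
		if isParam {
			if p[i] == "" {
				return false
			}
			continue
		}
		if t[i] != p[i] {
			return false
		}
	}
	return true
}

// serverSpanName returns "<method> <template>" for the first matching
// template. It falls back to the method alone, so an unknown path never
// reaches the span name.
func serverSpanName(method, path string, templates []string) string {
	for _, tmpl := range templates {
		if matchTemplate(tmpl, path) {
			return method + " " + tmpl
		}
	}
	return method
}
```

Because the raw path is only compared and never copied into the name, identifiers embedded in the URL stay out of span names, and cardinality is bounded by the number of templates times the number of methods.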

Describe the solution you'd like

There are a few avenues we'd like to explore with maintainers here:

Additional context

I can't commit that GitHub engineers would definitely engage here at this time, but we're trying to generate some interest internally in contributing to this project, so I'm hoping this issue can spark some conversation on the topics I've outlined here and any ideas to address these concerns.

Thank you!

cc @arielvalentin who has been working on this internally at GitHub with me 😄

RonFed commented 8 months ago

Hi @SophieDeBenedetto, thank you for this issue, it contains some great points. I think we can improve on the DB and HTTP span names.

SophieDeBenedetto commented 8 months ago

Hi @RonFed and thanks for your response here! I'll share some thoughts on your points one by one:

But the challenge of how to detect a low-cardinality http.route remains. One thing I'm thinking of is the span name formatters that [opentelemetry-go-contrib](https://github.com/open-telemetry/opentelemetry-go-contrib/blob/main/instrumentation/net/http/otelhttp/config.go#L44) makes available for users of that package. This allows users to provide their own custom span name formatter to implement logic that could scrub PII data from routes for HTTP span names, and/or recognize high-cardinality vs. low-cardinality routes for span names. Has any thought been given to exposing a similar lever that users can pull when using eBPF auto-instrumentation? I confess I don't know what it would take to make that a reality.
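To illustrate the kind of lever I have in mind, here is a minimal sketch of a formatter hook, loosely modeled on the span name formatter option in otelhttp. It is not an existing API of this project: SpanNameFormatter, ScrubNumericIDs, and spanName are hypothetical names, and how a user-supplied formatter would actually be handed to an out-of-process eBPF agent is left open. The example formatter only replaces all-digit segments, which is a heuristic and not a guarantee that no PII remains.

```go
package main

import "strings"

// SpanNameFormatter is a hypothetical user-supplied hook that turns the
// request method and path into a span name.
type SpanNameFormatter func(method, path string) string

// isDigits reports whether s is non-empty and made up of ASCII digits only.
func isDigits(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// ScrubNumericIDs is an example formatter: it replaces every all-digit
// path segment with the placeholder "{id}".
func ScrubNumericIDs(method, path string) string {
	segs := strings.Split(path, "/")
	for i, s := range segs {
		if isDigits(s) {
			segs[i] = "{id}"
		}
	}
	return method + " " + strings.Join(segs, "/")
}

// spanName applies the formatter when one is configured and otherwise
// keeps the current behavior of naming the span after the method alone.
func spanName(f SpanNameFormatter, method, path string) string {
	if f == nil {
		return method
	}
	return f(method, path)
}
```

Keeping the method-only name as the default when no formatter is set would make this opt-in, so existing users would see no change in span names.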

Thanks again for engaging with us here and looking forward to hearing any additional thoughts you may want to share!