open-telemetry / opentelemetry-go-instrumentation

OpenTelemetry Auto Instrumentation using eBPF
https://opentelemetry.io
Apache License 2.0

Custom Probes #1105

Open MrAlias opened 2 months ago

MrAlias commented 2 months ago

There are many packages that are not instrumented by this project, some open-source and others closed-source. There should be a way for users to provide a custom Probe definition that instruments these unsupported packages.

RonFed commented 2 months ago

That's a great idea. I think one of the steps in getting there is to refine our Probe interface, maybe worth opening another issue for that? I have some ideas for improving the existing interface.

damemi commented 2 months ago

Yeah, I think a refined, exposed Probe interface inherently provides the mechanism for custom probes. But it's good to call out that this is intentional.

MrAlias commented 2 months ago

That's a great idea. I think one of the steps in getting there is to refine our Probe interface, maybe worth opening another issue for that? I have some ideas for improving the existing interface.

I'm happy to have this issue be the place we discuss the needed probe changes. :+1:

RonFed commented 2 months ago

I'll try to list the main points I was thinking about:

Changes to the current functions

Load(*link.Executable, *process.TargetDetails, config.Sampler) error

Currently, this function has 2 main purposes: 1) loading the eBPF programs and maps into the kernel. 2) attaching the programs to the relevant symbols via uprobes.

Run(eventsChan chan<- *Event)

New functions suggestions

Status() Status

This will return the status of the probe (running, closed, error). It will be useful for constructing a higher-level API that reports a general status of all the probes.
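
A minimal sketch of how Status() could feed such an aggregate report, assuming a Status enum with the three states listed above. The names StatusRunning, StatusClosed, StatusError, StatusReporter, Summarize, and fixedProbe are illustrative and not part of the project.

```go
package main

import "fmt"

// Status is a hypothetical enumeration of probe states.
type Status int

const (
	StatusRunning Status = iota
	StatusClosed
	StatusError
)

// String makes a Status readable in logs.
func (s Status) String() string {
	switch s {
	case StatusRunning:
		return "running"
	case StatusClosed:
		return "closed"
	case StatusError:
		return "error"
	default:
		return "unknown"
	}
}

// StatusReporter is the subset of the proposed Probe interface used here.
type StatusReporter interface {
	Status() Status
}

// Summarize counts probes per status, the kind of aggregate a
// higher-level API could report.
func Summarize(probes []StatusReporter) map[Status]int {
	counts := make(map[Status]int)
	for _, p := range probes {
		counts[p.Status()]++
	}
	return counts
}

// fixedProbe is a stand-in probe that always reports the same status.
type fixedProbe Status

func (f fixedProbe) Status() Status { return Status(f) }

func main() {
	probes := []StatusReporter{
		fixedProbe(StatusRunning),
		fixedProbe(StatusRunning),
		fixedProbe(StatusError),
	}
	for status, n := range Summarize(probes) {
		fmt.Printf("%s: %d\n", status, n)
	}
}
```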

SupportedConfiguration() Config

Each probe can have a different set of configurations it supports (for example: collecting HTTP headers, including the url.path in span name, collecting db.query.text, sampling by kafka topic, ...). This function will allow each probe to declare which configuration it supports.

ApplyConfig(Config)

Changing the configuration of the probe, by applying the supplied one.

Different types of probes

Our current interface assumes each probe will report events. This is not necessarily the case. For example, in #954 we want to have a probe which will write some value to a boolean variable. This kind of probe acts as a Job - it has a dedicated task and it should not be running in the background. Maybe we want to have a base Probe interface, and 2 additional interfaces which will embed it: ReportingProbe and JobProbe?
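
One way the split could look, as a rough sketch. The method sets are assumptions (only the interface names ReportingProbe and JobProbe come from the suggestion above), and Event, Start, flagJob, and tickProbe are made up for illustration.

```go
package main

import "fmt"

// Event is a placeholder for whatever a reporting probe emits.
type Event struct{ Name string }

// Probe is the hypothetical base interface shared by every probe.
type Probe interface {
	Load() error
	Close() error
}

// ReportingProbe stays loaded and streams events until it is closed.
type ReportingProbe interface {
	Probe
	Run(events chan<- *Event)
}

// JobProbe performs one dedicated task and then finishes.
type JobProbe interface {
	Probe
	Execute() error
}

// Start loads the probe and dispatches on its kind with a type switch.
func Start(p Probe, events chan<- *Event) (string, error) {
	if err := p.Load(); err != nil {
		return "", err
	}
	switch v := p.(type) {
	case ReportingProbe:
		go v.Run(events) // keeps running in the background
		return "reporting", nil
	case JobProbe:
		return "job", v.Execute() // runs once, synchronously
	default:
		return "", fmt.Errorf("unsupported probe type %T", p)
	}
}

// flagJob is a toy JobProbe that flips a boolean, loosely modeled on
// the use case mentioned for #954.
type flagJob struct{ flag *bool }

func (j flagJob) Load() error    { return nil }
func (j flagJob) Close() error   { return nil }
func (j flagJob) Execute() error { *j.flag = true; return nil }

// tickProbe is a toy ReportingProbe that sends a single event.
type tickProbe struct{}

func (tickProbe) Load() error              { return nil }
func (tickProbe) Close() error             { return nil }
func (tickProbe) Run(events chan<- *Event) { events <- &Event{Name: "tick"} }

func main() {
	events := make(chan *Event, 1)
	enabled := false

	kind, err := Start(flagJob{flag: &enabled}, events)
	fmt.Println(kind, err, enabled)

	kind, err = Start(tickProbe{}, events)
	fmt.Println(kind, err, (<-events).Name)
}
```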

damemi commented 2 months ago

@RonFed this is a great overview, thanks for writing that up

Different types of probes makes sense too. Could you generalize it even further to something like SyncProbe (does an action) and AsyncProbe (listens for events)?

damemi commented 2 months ago

From the SIG meeting: my thinking is to base Probe config on the k8s scheduler framework (specifically the Plugin API).

If we have 2 types of probe ProbeA and ProbeB we could (roughly) have the following api:

So our root API would provide (something like)

// go.otel.io/auto/api
// Base types for probe and config
type Config interface {
  Package() string
}

type Probe interface {
  ApplyConfig(Config)
}

And we also require a standard New() function that's used by our auto-inst SDK to create the generic probe interface and call functions like Load() and Attach()

Then a user (or us) could import it and implement their custom probe

// github.com/me/foo-auto
// ProbeA implementations
type ProbeA interface {
  Probe
  DoProbeAThing()
}

type ProbeAConfig struct {
  CustomA string
  CustomSetting int
}

func (a *ProbeAConfig) Package() string {
  return a.CustomA
}

func New(probeConfig otelauto.Config) otelauto.Probe {
  probeAconfig := probeConfig.(*ProbeAConfig)
  doThing(probeAconfig.CustomSetting)
  return ProbeA{...}
}

There are some examples of this in place in this scheduler repo. I've just always thought it was a cool approach and think it fits here too.
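
For reference, a self-contained version of the same idea that compiles on its own. It is only a sketch: HTTPConfig, httpProbe, otherConfig, and errWrongConfig are hypothetical, and ApplyConfig and New are assumed to return an error (the outline above has no error returns) so that a mismatched config is reported through the comma-ok type assertion instead of a panic.

```go
package main

import (
	"errors"
	"fmt"
)

// Config and Probe mirror the base types outlined above.
type Config interface {
	Package() string
}

type Probe interface {
	ApplyConfig(Config) error
}

// HTTPConfig is a hypothetical configuration struct owned by one probe.
type HTTPConfig struct {
	CollectHeaders bool
}

func (c *HTTPConfig) Package() string { return "net/http" }

// httpProbe is a toy probe that only accepts *HTTPConfig.
type httpProbe struct {
	cfg *HTTPConfig
}

var errWrongConfig = errors.New("unexpected config type")

// ApplyConfig uses the comma-ok assertion so a mismatched Config
// yields an error instead of a panic.
func (p *httpProbe) ApplyConfig(c Config) error {
	cfg, ok := c.(*HTTPConfig)
	if !ok {
		return fmt.Errorf("%w: %T for package %s", errWrongConfig, c, c.Package())
	}
	p.cfg = cfg
	return nil
}

// New builds the probe from the generic Config.
func New(c Config) (Probe, error) {
	p := &httpProbe{}
	if err := p.ApplyConfig(c); err != nil {
		return nil, err
	}
	return p, nil
}

// otherConfig belongs to some other probe and must be rejected.
type otherConfig struct{}

func (otherConfig) Package() string { return "database/sql" }

func main() {
	_, err := New(&HTTPConfig{CollectHeaders: true})
	fmt.Println("matching config:", err)

	_, err = New(otherConfig{})
	fmt.Println("mismatched config:", err)
}
```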

RonFed commented 2 months ago

@damemi I'm not sure I understood your suggestion. Assuming our current probe for simplicity, can you give an example of how configuration of HTTP or DB will look?

damemi commented 2 months ago

@RonFed I'll try to put together a more concrete prototype, I was looking into it and got kind of sidetracked just thinking about the overall structure.

On that note, I have some ideas of what the final layout could look like for the Probe api that might help as an end goal:

Something like:

import (
  "go.opentelemetry.io/auto"
  "go.opentelemetry.io/auto/probes/net/http"
  "github.com/damemi/customprobe"
)

func main() {
  inst, err := auto.NewInstrumentation(ctx,
    auto.WithProbes(
      http.New(),
      customprobe.New(),
    ),
  )
...
}

Is this kind of what people were thinking for a public Probe api?
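
A sketch of how a WithProbes option could be wired up with functional options. Everything here is a local toy type: Probe, Option, Instrumentation, namedProbe, and demo are assumptions for illustration and do not reflect the actual go.opentelemetry.io/auto API.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Probe is a toy stand-in for the public Probe interface.
type Probe interface {
	Name() string
}

// config collects everything the options set.
type config struct {
	probes []Probe
}

// Option mutates the config; this is the functional options pattern.
type Option func(*config)

// WithProbes registers one or more probes.
func WithProbes(probes ...Probe) Option {
	return func(c *config) {
		c.probes = append(c.probes, probes...)
	}
}

// Instrumentation holds the probes chosen at construction time.
type Instrumentation struct {
	probes []Probe
}

// NewInstrumentation applies the options and rejects an empty probe set.
func NewInstrumentation(ctx context.Context, opts ...Option) (*Instrumentation, error) {
	var c config
	for _, opt := range opts {
		opt(&c)
	}
	if len(c.probes) == 0 {
		return nil, errors.New("no probes configured")
	}
	return &Instrumentation{probes: c.probes}, nil
}

// Names lists the registered probes in registration order.
func (i *Instrumentation) Names() []string {
	names := make([]string, 0, len(i.probes))
	for _, p := range i.probes {
		names = append(names, p.Name())
	}
	return names
}

// namedProbe is a hypothetical probe identified only by its name.
type namedProbe string

func (n namedProbe) Name() string { return string(n) }

// demo wires a built-in and a custom probe together.
func demo() (*Instrumentation, error) {
	return NewInstrumentation(context.Background(),
		WithProbes(namedProbe("net/http"), namedProbe("customprobe")),
	)
}

func main() {
	inst, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(inst.Names())
}
```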

RonFed commented 2 months ago

@damemi Yes, that is what I was thinking about as well in terms of our end goal in this issue.

damemi commented 2 months ago

@RonFed sounds good, so to show what I'm saying I did an example with db/http here: https://github.com/open-telemetry/opentelemetry-go-instrumentation/pull/1123

Could you show what you mean for a SupportedConfig() method like you proposed? I might just be missing the use case for it, or maybe we're talking about the same thing

damemi commented 2 months ago

As part of this (and shifting Probes to NewInstrumentation()), we would also define a method signature for a probe's New() func that enforces passing probe.Config as an argument.

RonFed commented 2 months ago

@damemi The configuration example looks great to me. Is the package function in the config interface just an example? I agree with the need for having a config interface, just not sure about its functions. My initial idea was to have a common struct to all the probes (including some generic maps) - in that scenario, the SupportedConfig() function will return the keys of the map - letting the user know which configurations are available. However, the approach in your PR is better so that function is not necessary if we'll have all the configuration structs exported.

damemi commented 2 months ago

@RonFed yeah I just used Package() as an example of something that all Probes might need to expose to the Instrumentation machinery (in this case I used it just for logging).

Overall this is really pretty much exactly how the Collector does config for different components. It's kind of similar to the k8s scheduler like I mentioned.

But yes I think it's better to have the configuration exposed in a concrete type that's owned by the Probe itself, so that's the main idea with this approach

MrAlias commented 2 months ago

SIG meeting notes:

MrAlias commented 2 months ago

An interesting thought: custom probes open up the possibility of instrumenting binaries written in languages other than Go (e.g. Rust, C++), especially if the offsets are self-contained in the probe.

RonFed commented 1 month ago

@damemi I think we can try and add the configuration layer similar to your example as a first step. This will need to be integrated with the definitions at provider.go. WDYT?

MrAlias commented 2 weeks ago

The current Run and Close methods are defined as follows:

type Probe interface {
    // ...
    Run(tracesChan chan<- ptrace.ScopeSpans)
    Close() error
}

Issues

The channel passed to Run is closed by another process after it calls Close

When a process wants to close probes, it first calls Close and, once that call has returned, assumes Run is complete so it can close the channel. The channel is not guaranteed to be unused, though. Incorrect implementations, or simply concurrency patterns, may mean something is still trying to send on it.

Sending on a closed channel will cause a panic. Therefore, this design should be changed.

Instead of having Run receive a write-only channel, have it return a read-only channel it is writing to in a spawned goroutine. When Close is called, and the probe is sure it is done writing to the returned channel, it can close that channel and ensure no panic is raised.
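
A minimal sketch of that ownership model, where the probe creates, writes to, and closes its own channel. Event, chanProbe, and newChanProbe are placeholders for illustration, and the sketch assumes Run is called exactly once before Close.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a placeholder for the data a probe produces.
type Event struct{ Seq int }

// chanProbe owns its output channel, so it is the only closer.
type chanProbe struct {
	stop chan struct{} // closed by Close to ask the writer to stop
	done chan struct{} // closed by the writer once it has exited
	once sync.Once
}

func newChanProbe() *chanProbe {
	return &chanProbe{stop: make(chan struct{}), done: make(chan struct{})}
}

// Run starts the writer goroutine and returns a receive-only channel.
func (p *chanProbe) Run() <-chan Event {
	out := make(chan Event)
	go func() {
		defer close(p.done)
		// Deferred calls run last-in-first-out, so out is closed
		// before done. Only the writer closes out, which rules out
		// a send on a closed channel.
		defer close(out)
		for i := 0; ; i++ {
			select {
			case out <- Event{Seq: i}:
			case <-p.stop:
				return
			}
		}
	}()
	return out
}

// Close asks the writer to stop and waits until it has closed the
// output channel. It is safe to call more than once.
func (p *chanProbe) Close() error {
	p.once.Do(func() { close(p.stop) })
	<-p.done
	return nil
}

func main() {
	p := newChanProbe()
	events := p.Run()

	first := <-events
	fmt.Println("received event", first.Seq)

	_ = p.Close()
	_, open := <-events
	fmt.Println("channel still open:", open)
}
```

The reader never closes anything; it just ranges over (or receives from) the returned channel until the probe closes it.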

The channel currently only supports trace telemetry

If the Probe produces log, metric, or other telemetry in the future it will need a forward compatible way to send this data.

Instead of having Run use a channel defined on ptrace.ScopeSpans, introduce a new probe.Telemetry type:

// Telemetry is the data read by a probe.
type Telemetry struct {
    ScopeSpans ptrace.ScopeSpans
}

This new type can have fields added for additional signals when they are supported.

Proposal

The Run method would become:

type Probe interface {
    // ...
    Run() <-chan Telemetry
    Close() error
}

MrAlias commented 2 weeks ago

// Telemetry is the data read by a probe.
type Telemetry struct {
  ScopeSpans ptrace.ScopeSpans
}

Having this include an error may also help communicate errors from the probe. For example:


// Telemetry is the data read by a probe.
type Telemetry struct {
    ScopeSpans ptrace.ScopeSpans

    // Error is the error encountered when reading the telemetry data.
    Error error
}

MrAlias commented 2 weeks ago

Proposal

The Run method would become:

type Probe interface {
  // ...
  Run() <-chan Telemetry
  Close() error
}

Based on https://github.com/open-telemetry/opentelemetry-go-instrumentation/pull/1248, this proposal should be revised to:

type Probe interface {
    // ...
    Run(func(Telemetry))
    Close() error
}
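
A sketch of the callback form, combined with the Error field suggested earlier. ScopeSpans here is a local stand-in for ptrace.ScopeSpans so the example has no external dependencies, and sliceProbe, Collect, and demoProbe are hypothetical. It also assumes Run invokes the handler synchronously, which the proposal does not specify.

```go
package main

import (
	"errors"
	"fmt"
)

// ScopeSpans stands in for ptrace.ScopeSpans in this sketch.
type ScopeSpans struct{ SpanCount int }

// Telemetry is the data read by a probe, including any read error.
type Telemetry struct {
	ScopeSpans ScopeSpans
	Error      error
}

// Probe uses the callback form of Run.
type Probe interface {
	Run(handle func(Telemetry))
	Close() error
}

// sliceProbe replays a fixed set of records through the handler.
type sliceProbe struct{ records []Telemetry }

func (p *sliceProbe) Run(handle func(Telemetry)) {
	for _, t := range p.records {
		handle(t)
	}
}

func (p *sliceProbe) Close() error { return nil }

// Collect runs the probe and separates usable telemetry from errors.
func Collect(p Probe) (spans int, errs []error) {
	p.Run(func(t Telemetry) {
		if t.Error != nil {
			errs = append(errs, t.Error)
			return
		}
		spans += t.ScopeSpans.SpanCount
	})
	return spans, errs
}

// demoProbe returns a probe with two good records and one failed read.
func demoProbe() Probe {
	return &sliceProbe{records: []Telemetry{
		{ScopeSpans: ScopeSpans{SpanCount: 2}},
		{Error: errors.New("read failed")},
		{ScopeSpans: ScopeSpans{SpanCount: 3}},
	}}
}

func main() {
	spans, errs := Collect(demoProbe())
	fmt.Println("spans:", spans, "errors:", len(errs))
}
```

With a callback there is no channel for anyone to close, so the send-on-closed-channel problem does not arise; the caller decides what the handler does with each record.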

damemi commented 1 week ago

@MrAlias and I were talking about this last week, and I think the points we came up with for the design were (let me know if I missed anything):

Tyler suggested a factory pattern for this, and I agree that would probably check all the boxes. I can work on a POC for that this week. I'd like to implement this in a way that first wraps the current Probes so that we can transition one by one in smaller PRs.

MrAlias commented 1 week ago

The factory pattern will be good to start internally. To fully support this for external users, we will need to support multi-target running (https://github.com/open-telemetry/opentelemetry-go-instrumentation/issues/197).

RonFed commented 6 days ago

I think the Probe interface should have a way of stating which versions it supports - This will help with cases like https://github.com/open-telemetry/opentelemetry-go-instrumentation/pull/1302

MrAlias commented 6 days ago

I think the Probe interface should have a way of stating which versions it supports - This will help with cases like #1302

Is this something you see the Manifest returned from a probe handling?

RonFed commented 6 days ago

Is this something you see the Manifest returned from a probe handling?

Yea, that makes sense. We can add library version support and Go version support as part of the Manifest. That way the manager can figure out what can be loaded.
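
A rough sketch of what that could look like. The Manifest fields, the Version and Range types, Target, and Loadable are all assumptions made for illustration; the actual Manifest shape is not settled in this thread, and real version handling would need proper semver parsing.

```go
package main

import "fmt"

// Version is a simplified major.minor.patch triple.
type Version struct{ Major, Minor, Patch int }

// Less reports whether v sorts before o.
func (v Version) Less(o Version) bool {
	if v.Major != o.Major {
		return v.Major < o.Major
	}
	if v.Minor != o.Minor {
		return v.Minor < o.Minor
	}
	return v.Patch < o.Patch
}

// Range is inclusive of Min and exclusive of Max.
// A zero Max means there is no upper bound.
type Range struct{ Min, Max Version }

// Contains reports whether v falls inside the range.
func (r Range) Contains(v Version) bool {
	if v.Less(r.Min) {
		return false
	}
	unbounded := r.Max == (Version{})
	return unbounded || v.Less(r.Max)
}

// Manifest is a hypothetical description returned by a probe.
type Manifest struct {
	Package string // instrumented package import path
	Library Range  // supported versions of that package
	Go      Range  // supported Go versions
}

// Target describes what was detected in the instrumented process.
type Target struct {
	Go   Version
	Libs map[string]Version
}

// Loadable returns the manifests the manager could load for the target.
func Loadable(manifests []Manifest, t Target) []Manifest {
	var out []Manifest
	for _, m := range manifests {
		lib, found := t.Libs[m.Package]
		if found && m.Library.Contains(lib) && m.Go.Contains(t.Go) {
			out = append(out, m)
		}
	}
	return out
}

// demo returns the packages that can be loaded for a sample target.
func demo() []string {
	manifests := []Manifest{
		{Package: "example.com/a", Library: Range{Min: Version{1, 0, 0}}, Go: Range{Min: Version{1, 20, 0}}},
		{Package: "example.com/b", Library: Range{Min: Version{2, 0, 0}, Max: Version{3, 0, 0}}},
	}
	target := Target{
		Go:   Version{1, 22, 0},
		Libs: map[string]Version{"example.com/a": {1, 4, 2}, "example.com/b": {3, 1, 0}},
	}
	var names []string
	for _, m := range Loadable(manifests, target) {
		names = append(names, m.Package)
	}
	return names
}

func main() {
	fmt.Println(demo())
}
```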