Open MrAlias opened 2 months ago
That's a great idea. I think one of the steps in getting there is to refine our Probe interface, maybe worth opening another issue for that? I have some ideas for improving the existing interface.
Yeah, I think a refined, exposed `Probe` interface inherently provides the mechanism for custom probes. But calling out that this is intentional is good to do.
> That's a great idea. I think one of the steps in getting there is to refine our Probe interface, maybe worth opening another issue for that? I have some ideas for improving the existing interface.
I'm happy to have this issue be the place we discuss the needed probe changes. :+1:
I'll try to list the main points I was thinking about:

- `Load(*link.Executable, *process.TargetDetails, config.Sampler) error`
  Currently, this function has 2 main purposes: 1) loading the eBPF programs and maps into the kernel, and 2) attaching the programs to the relevant symbols via uprobes. Maybe we should split it into `Load` and `Attach`. This separation might help make the code clearer, enable better error handling, and add some functionality (loading a program once and attaching it to different places at runtime). The config would probably be passed to the `Attach` function. In addition, the general configuration struct will need to be decided on instead of passing the sampling config alone. Should we pass a `context` here?
- `Run(eventsChan chan<- *Event)`
  Change the event type to `pdata` as was suggested. Should we pass a `context` here? Currently the cancelation of this function is done by closing the perf buffer, which causes the reading loop to exit.
- `Status() Status`
  This will return the status of the probe (running, closed, error). It will be useful for constructing a higher-level API which will report a general status of all the probes.
- `SupportedConfiguration() Config`
  Each probe can have a different set of configurations it supports (for example: collecting HTTP headers, including the `url.path` in the span name, collecting `db.query.text`, sampling by Kafka topic, ...). This function will allow each probe to declare which configuration it supports.
- `ApplyConfig(Config)`
  Changing the configuration of the probe by applying the supplied one.

Our current interface assumes each probe will report events. This is not necessarily the case. For example, in #954 we want to have a probe which will write some value to a boolean variable. This kind of probe acts as a job: it has a dedicated task and it should not be running in the background. Maybe we want to have a base `Probe` interface, and 2 additional interfaces which embed it: `ReportingProbe` and `JobProbe`?
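The split interface and the two probe kinds described above could be sketched roughly like this. All names here (`Probe`, `ReportingProbe`, `JobProbe`, `Event`, `flagJob`) are illustrative assumptions, not the project's actual API:

```go
package main

import "fmt"

// Event stands in for whatever telemetry type a reporting probe emits.
type Event struct{ Name string }

// Probe is the shared base: load eBPF assets once, attach them (possibly
// several times), and release resources on Close.
type Probe interface {
	Load() error   // load eBPF programs and maps into the kernel
	Attach() error // attach loaded programs to symbols via uprobes
	Close() error
}

// ReportingProbe runs in the background and streams events.
type ReportingProbe interface {
	Probe
	Run(events chan<- *Event)
}

// JobProbe has a dedicated one-shot task and does not run in the
// background, like the write-a-boolean probe wanted in #954.
type JobProbe interface {
	Probe
	Do() error
}

// flagJob is a toy JobProbe that flips a boolean when executed.
type flagJob struct{ flag bool }

func (j *flagJob) Load() error   { return nil }
func (j *flagJob) Attach() error { return nil }
func (j *flagJob) Close() error  { return nil }
func (j *flagJob) Do() error     { j.flag = true; return nil }

func main() {
	var p JobProbe = &flagJob{}
	_ = p.Load()
	_ = p.Attach()
	_ = p.Do()
	fmt.Println(p.(*flagJob).flag) // prints "true"
}
```

The point of the embedding is that the manager can treat everything as a `Probe` for lifecycle purposes and type-switch on the two sub-interfaces only when deciding whether to spawn a background reader.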
@RonFed this is a great overview, thanks for writing that up.

- +1 to splitting `Load` and `Attach`. I think having atomic operations is much clearer and allows for more extension points. We can always add a `LoadAndAttach` later for syntactic sugar if it's that convenient for users.
- Different types of probes make sense too. Could you generalize it even further to something like `SyncProbe` (does an action) and `AsyncProbe` (listens for events)?
From the SIG meeting: my thinking is to base Probe config off the k8s scheduler framework (specifically the Plugin API).

If we have 2 types of probe, `ProbeA` and `ProbeB`, we could (roughly) have the following API.
So our root API would provide (something like):

```go
// go.opentelemetry.io/auto/api

// Base types for probe and config.
type Config interface {
	Package() string
}

type Probe interface {
	ApplyConfig(Config)
}
```
And we also require a standard `New()` function that's used by our auto-inst SDK to create the generic probe interface and call functions like `Load()` and `Attach()`.
Then a user (or us) could import it and implement their custom probe:

```go
// github.com/me/foo-auto

// ProbeA implementations.
type ProbeA interface {
	Probe
	DoProbeAThing()
}

type ProbeAConfig struct {
	CustomA       string
	CustomSetting int
}

func (a *ProbeAConfig) Package() string {
	return a.CustomA
}

func New(probeConfig otelauto.Config) otelauto.Probe {
	probeAConfig := probeConfig.(*ProbeAConfig)
	doThing(probeAConfig.CustomSetting)
	return &probeA{...} // some concrete type implementing ProbeA
}
```
There are some examples of this in place in the scheduler repo. I've just always thought it was a cool approach and think it fits here too.
@damemi I'm not sure I understood your suggestion. Assuming our current probe for simplicity, can you give an example of how the configuration for HTTP or DB would look?
@RonFed I'll try to put together a more concrete prototype, I was looking into it and got kind of sidetracked just thinking about the overall structure.
On that note, I have some ideas of what the final layout could look like for the Probe API that might help as an end goal:

- Move `/internal/pkg/instrumentation/probe` to a root `/pkg/probe` (i.e. `go.opentelemetry.io/auto/pkg/probe`). This keeps the `Probe` API decoupled from the entire `Instrumentation` root package so that custom probes can be defined in isolation and imported without depending on the full auto-instrumentation SDK. In other words, users (and us) should be able to write a Probe that is entirely its own module (similar to collector-contrib).
- Move `/internal/pkg/instrumentation/bpf` to a root `/probes` (i.e. `go.opentelemetry.io/auto/probes/net/http`).
- Probes are passed to `NewInstrumentation()`. Something like:
```go
import (
	"go.opentelemetry.io/auto"
	"go.opentelemetry.io/auto/probes/net/http"

	"github.com/damemi/customprobe"
)

func main() {
	inst, err := auto.NewInstrumentation(ctx,
		auto.WithProbes(
			http.New(),
			customprobe.New()),
	)
	...
}
```
Is this kind of what people were thinking for a public Probe api?
@damemi Yes, that is what I was thinking about as well in terms of our end goal in this issue.
@RonFed sounds good, so to show what I'm saying I did an example with db/http here: https://github.com/open-telemetry/opentelemetry-go-instrumentation/pull/1123
Could you show what you mean for a `SupportedConfig()` method like you proposed? I might just be missing the use case for it, or maybe we're talking about the same thing.

As part of this (and shifting Probes to `NewInstrumentation()`) we would also define a method signature for a probe's `New()` func that enforces passing `probe.Config` as an argument.
@damemi The configuration example looks great to me.
Is the `Package()` function in the config interface just an example? I agree with the need for having a config interface, just not sure about its functions.

My initial idea was to have a common struct for all the probes (including some generic maps). In that scenario, the `SupportedConfig()` function would return the keys of the map, letting the user know which configurations are available. However, the approach in your PR is better, so that function is not necessary if we'll have all the configuration structs exported.
@RonFed yeah, I just used `Package()` as an example of something that all Probes might need to expose to the Instrumentation machinery (in this case I used it just for logging).
Overall this is really pretty much exactly how the Collector does config for different components. It's kind of similar to the k8s scheduler like I mentioned.
But yes I think it's better to have the configuration exposed in a concrete type that's owned by the Probe itself, so that's the main idea with this approach
SIG meeting notes:

- `structfield.ID` in the `Manifest`.
- The `Run` method needs to be updated to transport `ptrace.Traces`.
- An interesting thought: with custom probes, this opens up the possibility of instrumenting binaries other than Go (i.e. Rust, C++), especially if the offsets are self-contained in the probe.
@damemi I think we can try to add the configuration layer, similar to your example, as a first step. This will need to be integrated with the definitions in `provider.go`. WDYT?
The current `Run` and `Close` methods are defined as follows:

```go
type Probe interface {
	// ...
	Run(tracesChan chan<- ptrace.ScopeSpans)
	Close() error
}
```
`Run` is stopped by another process after it calls `Close`. When a process wants to close probes, it first calls `Close`, and when that call has returned it assumes `Run` is complete, so it can close the channel. The `telemetryChan` is not guaranteed to be unused, though. There may be incorrect implementations, or simply concurrency patterns, that mean something is still trying to send on the `telemetryChan`.

Sending on a closed channel will cause a panic. Therefore, this design should be changed.
Instead of having `Run` receive a write-only channel, have it return a read-only channel that it writes to in a spawned goroutine. When `Close` is called, and the probe is sure it is done writing to the returned channel, it can close that channel and ensure no panic is raised.
If the `Probe` produces log, metric, or other telemetry in the future, it will need a forward-compatible way to send this data. Instead of having `Run` use a channel of `ptrace.ScopeSpans`, introduce a new `probe.Telemetry` type:
```go
// Telemetry is the data read by a probe.
type Telemetry struct {
	ScopeSpans ptrace.ScopeSpans
}
```
This new type can have fields added for additional signals when they are supported.
The `Run` method would become:

```go
type Probe interface {
	// ...
	Run() <-chan Telemetry
	Close() error
}
```
Having this include an `error` may also help communicate errors from the probe. For example:

```go
// Telemetry is the data read by a probe.
type Telemetry struct {
	ScopeSpans ptrace.ScopeSpans

	// Error is the error encountered when reading the telemetry data.
	Error error
}
```
> Proposal
>
> The `Run` method would become:
>
> ```go
> type Probe interface {
> 	// ...
> 	Run() <-chan Telemetry
> 	Close() error
> }
> ```
Based on https://github.com/open-telemetry/opentelemetry-go-instrumentation/pull/1248, this proposal should be revised to:
```go
type Probe interface {
	// ...
	Run(func(Telemetry))
	Close() error
}
```
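A rough sketch of how that callback-based contract could work in practice. `fakeProbe` and `collect` are hypothetical names used only for illustration; `ScopeSpans` is a string stand-in for `ptrace.ScopeSpans`:

```go
package main

import "fmt"

// Telemetry mirrors the proposed type, including the error field.
type Telemetry struct {
	ScopeSpans string
	Err        error // error encountered while reading, if any
}

// Probe is the revised interface: Run invokes the handler for each item
// instead of owning a channel, and returns when the probe is done.
type Probe interface {
	Run(handler func(Telemetry))
	Close() error
}

// fakeProbe replays fixed items the way a real probe would emit telemetry
// read from its perf buffer.
type fakeProbe struct{ items []Telemetry }

func (p *fakeProbe) Run(handler func(Telemetry)) {
	for _, t := range p.items {
		handler(t)
	}
}

func (p *fakeProbe) Close() error { return nil }

// collect shows the manager side: run the probe and gather every item the
// handler receives.
func collect(p Probe) []Telemetry {
	var got []Telemetry
	p.Run(func(t Telemetry) { got = append(got, t) })
	return got
}

func main() {
	p := &fakeProbe{items: []Telemetry{{ScopeSpans: "a"}, {ScopeSpans: "b"}}}
	fmt.Println(len(collect(p))) // prints "2"
}
```

With this shape the channel-ownership problem disappears entirely: there is no channel to close, and the caller knows no more telemetry will arrive once `Run` returns.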
@MrAlias and I were talking about this last week, and I think the points we came up with for the design were (let me know if I missed anything):
Tyler suggested a factory pattern for this, and I agree that would probably check all the boxes. I can work on a POC for that this week. I'd like to implement this in a way that first wraps the current Probes so that we can transition one by one in smaller PRs.
The factory pattern will be good to start internally. To fully support this from external users we will need to support multi-target running (https://github.com/open-telemetry/opentelemetry-go-instrumentation/issues/197).
I think the Probe interface should have a way of stating which versions it supports - This will help with cases like https://github.com/open-telemetry/opentelemetry-go-instrumentation/pull/1302
> I think the Probe interface should have a way of stating which versions it supports - This will help with cases like #1302
Is this something you see the `Manifest` returned from a probe handling?
> Is this something you see the Manifest returned from a probe handling?
Yea, that makes sense. We can add library version support and Go version support as part of the Manifest. That way the manager can figure out what can be loaded.
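One hypothetical way the `Manifest` could carry version-support information, so the manager can skip probes that do not apply to the target. None of these fields are the project's actual API; `Version`, `MinVersion`, `MaxVersion`, and `Supports` are invented here for illustration:

```go
package main

import "fmt"

// Version is a minimal major.minor version for illustration.
type Version struct{ Major, Minor int }

// AtLeast reports whether v >= o.
func (v Version) AtLeast(o Version) bool {
	if v.Major != o.Major {
		return v.Major > o.Major
	}
	return v.Minor >= o.Minor
}

// Manifest sketches how a probe could declare the instrumented package and
// the version range it supports.
type Manifest struct {
	InstrumentedPkg string
	MinVersion      Version // lowest library version the probe supports
	MaxVersion      Version // lowest library version the probe does NOT support
}

// Supports reports whether the manifest covers the target's version:
// MinVersion <= v < MaxVersion.
func (m Manifest) Supports(v Version) bool {
	return v.AtLeast(m.MinVersion) && !v.AtLeast(m.MaxVersion)
}

func main() {
	m := Manifest{
		InstrumentedPkg: "google.golang.org/grpc",
		MinVersion:      Version{1, 40},
		MaxVersion:      Version{1, 60},
	}
	fmt.Println(m.Supports(Version{1, 45})) // prints "true"
	fmt.Println(m.Supports(Version{1, 60})) // prints "false"
}
```

The manager would read this at discovery time and only `Load` probes whose range covers the version found in the target binary.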
There are many packages that are not instrumented by this project, some open-source and others closed-source. There should be a way for users to provide a custom `Probe` definition that supplies instrumentation for these unsupported packages.