open-telemetry / opentelemetry-go-instrumentation

OpenTelemetry Auto Instrumentation using eBPF
https://opentelemetry.io
Apache License 2.0
482 stars 73 forks source link

Custom SDK implementation #954

Open MrAlias opened 1 month ago

MrAlias commented 1 month ago

Initial support for interoperability with manual instrumentation was added in https://github.com/open-telemetry/opentelemetry-go-instrumentation/pull/523. This approach has issues when trying to implement methods like IsRecording, RecordError, and AddEvent, as well as the inability to capture the SpanStartOptions which contain important definitions about the span.

Provide our own SDK

An alternate solution for this interoperability was proposed in https://github.com/open-telemetry/opentelemetry-go-instrumentation/issues/352#issuecomment-1811166718:

Provide a TracerProvider implementation from auto-instrumentation that manual instrumentation can explicitly use to say they want their data to go to the same pipeline. This means the auto-instrumentation will control the type that manual instrumentation uses to create, edit, and end spans if they want the data to be sent to the auto-instrumentation pipeline

Providing an SDK implementation like this would allow us to design minimal overhead in the Go implementation to resolve all the issues found with the current approach. Namely:

Interoperability

The remaining issues with providing a custom SDK, is how this would be integrated into users code?

The original idea was to offer this SDK as a package under go.opentelemetry.io/auto. This could still be provided, but if this is the only way this solution is integrated it means user code needs to be updated to inter-op with auto-instrumentation.

Requiring user code to be updated to work with auto-instrumentation is not desired. The whole point of auto-instrumentation is this kind of action would not be required.

Instead, we would like to have the default global otel implementation inter-op with this new SDK:

Action Items

RonFed commented 1 month ago

@MrAlias What do you think should be included in the prototype?

MrAlias commented 1 month ago

Ideally, something like this:

package main

import (
    "fmt"
    "time"
)

var Flag bool

func main() {
    for {
        fmt.Printf("Flag: %t", Flag)
        time.Sleep(time.Second)
    }
    // Output: Flag: false
    // Auto-instrumentation added
    // Output: Flag: true
}
MrAlias commented 1 month ago

We may also need to look into having the global package call something like:

func (t *tracer) start(eBPFFlag *bool, ctx context.Context, name string, opts ...trace.SpanStartOption) (context.Context, trace.Span) {
/* ... */
}

and instrument that instead of Start so we can find the flag location.

RonFed commented 1 month ago

@MrAlias I did a POC for the second option we talked about. It is in https://github.com/open-telemetry/opentelemetry-go-instrumentation/compare/main...RonFed:opentelemetry-go-instrumentation_fork:flag_POC

Here are some logs from the app being instrumented and the agent:

rolldice-1 | 2024-08-09T15:42:14.291Z INFO app/main.go:64 starting http server {"port": ":8080"} rolldice-1 | 2024-08-09T15:42:14.291Z INFO app/main.go:52 probed function called with flag unset go-auto-1 | {"level":"info","ts":1723218134.4202611,"logger":"go.opentelemetry.io/auto","caller":"cli/main.go:86","msg":"building OpenTelemetry Go instrumentation ...","globalImpl":false} rolldice-1 | 2024-08-09T15:42:15.295Z INFO app/main.go:52 probed function called with flag unset rolldice-1 | 2024-08-09T15:42:16.297Z INFO app/main.go:52 probed function called with flag unset go-auto-1 | {"level":"info","ts":1723218136.4233193,"logger":"Instrumentation.Analyzer","caller":"process/discover.go:67","msg":"found process","pid":12030} go-auto-1 | {"level":"info","ts":1723218136.4287593,"logger":"Instrumentation.Allocate","caller":"process/allocate.go:73","msg":"Detaching from process","pid":12030} go-auto-1 | {"level":"info","ts":1723218136.4289303,"logger":"Instrumentation","caller":"app/instrumentation.go:144","msg":"target process analysis completed","pid":12030,"go_version":"1.22.5","dependencies":{"go.uber.org/multierr":"1.10.0","go.uber.org/zap":"1.27.0","std":"1.22.5"},"total_functions_found":3} go-auto-1 | {"level":"info","ts":1723218136.428993,"logger":"go.opentelemetry.io/auto","caller":"cli/main.go:119","msg":"starting instrumentation..."} go-auto-1 | {"level":"info","ts":1723218136.4319375,"logger":"Instrumentation.Manager","caller":"instrumentation/manager.go:198","msg":"loading probe","name":"net/http/server"} go-auto-1 | {"level":"info","ts":1723218136.4832294,"logger":"Instrumentation.Manager","caller":"instrumentation/manager.go:198","msg":"loading probe","name":"main/client"} go-auto-1 | {"level":"info","ts":1723218136.4842162,"logger":"go.opentelemetry.io/auto","caller":"cli/main.go:115","msg":"instrumentation loaded successfully"} rolldice-1 | 2024-08-09T15:42:17.299Z INFO app/main.go:50 probed function called with flag set rolldice-1 | 2024-08-09T15:42:18.300Z INFO app/main.go:50 probed function called with flag set rolldice-1 | 2024-08-09T15:42:19.301Z INFO app/main.go:50 probed function called with flag set rolldice-1 | 2024-08-09T15:42:20.302Z INFO app/main.go:50 probed function called with flag set rolldice-1 | 2024-08-09T15:42:21.303Z INFO app/main.go:50 probed function called with flag set rolldice-1 | 2024-08-09T15:42:22.304Z INFO app/main.go:50 probed function called with flag set rolldice-1 | 2024-08-09T15:42:23.305Z INFO app/main.go:50 probed function called with flag set rolldice-1 | 2024-08-09T15:42:24.307Z INFO app/main.go:50 probed function called with flag set rolldice-1 | 2024-08-09T15:42:25.308Z INFO app/main.go:50 probed function called with flag set rolldice-1 | 2024-08-09T15:42:26.309Z INFO app/main.go:50 probed function called with flag set

MrAlias commented 1 month ago

Awesome! Let me see about getting an issue created in https://github.com/open-telemetry/opentelemetry-go to make the proposal based on these findings.

MrAlias commented 1 month ago

opentelemetry-go proposal: https://github.com/open-telemetry/opentelemetry-go/issues/5702

@RonFed PTAL

RonFed commented 1 month ago

@MrAlias LGTM, In the suggested change, do you think we should probe the tracer.newSpan (which has the flag pointer) or the interop.newSpan (which has the configuration ready)? Ideally, we should build it in a way that we can handle both in a single eBPF program if it is possible.

MrAlias commented 1 month ago

@MrAlias LGTM, In the suggested change, do you think we should probe the tracer.newSpan (which has the flag pointer) or the interop.newSpan (which has the configuration ready)? Ideally, we should build it in a way that we can handle both in a single eBPF program if it is possible.

I was thinking tracer.newSpan as that would allow us to set the bool and trace in one probe. Right?

RonFed commented 1 month ago

I was thinking tracer.newSpan as that would allow us to set the bool and trace in one probe. Right?

Yea, that sounds good, we'll need to have a return probe for that function as well simillar to what we have today - in order to save the returned span pointer in a map.

MrAlias commented 3 weeks ago

I was thinking tracer.newSpan as that would allow us to set the bool and trace in one probe. Right?

Yea, that sounds good, we'll need to have a return probe for that function as well simillar to what we have today - in order to save the returned span pointer in a map.

Correct.

damemi commented 1 hour ago

With the custom sdk we're providing, do we still need the probes for the global instrumentation that we already have? Or could we just implement those functions in our sdk (Start, End, SetAttributes, etc)?

In that case we would only need to auto-instrument the custom sdk's ended function so that the object gets passed to the auto-agent's export pipeline.

MrAlias commented 1 hour ago

With the custom sdk we're providing, do we still need the probes for the global instrumentation that we already have? Or could we just implement those functions in our sdk (Start, End, SetAttributes, etc)?

In that case we would only need to auto-instrument the custom sdk's ended function so that the object gets passed to the auto-agent's export pipeline.

Correct. After we have opentelemetry-go use our SDK we won't need to instrument the nonRecordingSpan or tracer from go.openetelemetry.io/otel/internal/global.

MrAlias commented 1 hour ago

We will still need to instrument the initial call to tracer.Start to set the bool there stating the auto-instrumentation is registered.

MrAlias commented 58 minutes ago

We will still need to instrument the initial call to tracer.Start to set the bool there stating the auto-instrumentation is registered.

Hmm, we could also look into just instrumenting a bool in our SDK that the global SDK would look up via our API. Might be something we should look into.