Any limitations to this approach?

tedsuo commented 2 years ago

Hey, this is fabulous! :)

I'm wondering what (if any) fundamental limitations exist with this approach? For example:

Could auto-instrumentation be mixed with regular instrumentation code? Can auto-instrumentation register Providers that instrumentation code can make use of? Is the context object populated with the usual Otel data?
Going the other direction, could auto-instrumentation make use of Providers that are written in code and registered in the normal manner? For example, to allow custom SpanProcessors and other plugins to be installed.

Again, really awesome project, I'm excited to see where this goes!

edeNFed commented 2 years ago

Hi @tedsuo, I'm really happy to hear that you like this project 😄

Integration with manual Go instrumentation is something that we are definitely planning to do soon. In high level, I think we will try to implement several components in the OTel Go SDK that will communicate with the instrumentation agent via something like Unix domain socket. This way the manual instrumentation and automatic instrumentation can work together to produce data enriched by both. There are probably many use cases we still need to figure out but I think this is the overall direction.

Would be happy to hear what you think about this approach.

edeNFed commented 2 years ago

Hi @tedsuo, We just merged a design doc describing how we are going to implement integration between manual and automatic instrumentation. I'll be very happy to hear what you think about it 😄

tedsuo commented 2 years ago

Thanks for the heads up @edeNFed!

I'm glad to see that integration is possible. I have little knowledge of ebpf, so most of my comments will be pretty naive, but I do have a couple thoughts.

Rather than target SDK methods, I suggest targeting API methods. That way, users do not need to install an SDK, and you don't end up with the "double implementation" issue. If it is not possible to target the API directly (maybe ebpf can't target Go interfaces directly, only implementations?), then I suggest targeting the Noop implementation which comes with the API, and is registered by default.
Regarding context propagation: I suggest trying to store the spancontext in the context object, rather than a map of goroutines. This would allow parent/child relations to work correctly when the child spans are started on a different go routine, using a context that was passed to them over a channel. For example, a implementations which use a pool of goroutines to execute work would need to pass the parent context over the channel, so that the child spans can be linked properly. (Again, I could be missing something here about how you are linking into the runtime with ebpf, maybe you can track this automatically).
Another important concept in OTel are resources. It's critical to capture as many resource attributes as possible, before the SDK is loaded and the process begins to perform work. OTel has many resource detectors for common sources (kubernetes, AWS, linux, etc). If you haven't already, I suggest finding a way for the auto-sdk to be configured to detect these resources, and block on start until resource detection is complete.
Beyond resources, exporters, and propagators, there are all kinds of other processors, not to mention all the metrics, logging, and other features landing in OTel. If you are not already doing so, I would suggest looking at the C++ SDK implementation, and using it as your implementation under the hood. This could potentially save you a lot of work.

Hope that's helpful! Happy to answer any OTel-related questions that come up. :)

edeNFed commented 2 years ago

Thanks for the comment @tedsuo, this is really helpful! :)

Instrumenting the noop API methods sounds like a good solution. (Unfortunately, it is not possible to instrument interfaces using eBPF, only implementations).
I really like the idea of modifying the context object to hold the spancontext. The only downside that I can think of is that we lose the ability to instrument methods that don't include a context argument. For example, currently, we are able to create spans for database/sql driver invocations like db.Query("SELECT * FROM table"). Using context means we lose this ability. If database/sql is the rare exception this is probably a better solution.
Yes this is definitely on the roadmap. The current implementation is pretty naive only allowing configuring the resource name via OTEL_SERVICE_NAME env var.
The auto instrumentation uses the OTel Go SDK under the hood, we aim to offload as many as possible to the already implemented SDK logic. Basically, this project uses kernel events to invoke API methods like span and metrics creation, hopefully, everything else will be based on the existing implementation in the Go SDK.

I will update the design document to reflect the noop instrumentation. I'll be happy to hear if you think that being able to instrument only libraries with context-aware APIs is a better tradeoff than tracking goroutines?

odigos-io / opentelemetry-go-instrumentation

Any limitations to this approach? #11