[Proposal] opentelemetry-go-auto-instrumentation

D-D-H commented 9 months ago

Description

The opentelemetry-go-auto-instrumentation project is an auto-instrumentation solution designed for Go applications. It empowers users to harness the capabilities of OpenTelemetry for enhanced observability without any manual modifications. Like the opentelemetry-java-instrumentation project, this solution automatically modifies code, the difference is that this all happens during the build process. The current implementation reuses the existing instrumentation for Go packages and depends on the package dave/dst to rewrite Go source code. The side effect of this solution is similar to the impact one would expect from manual code modifications:

Increased the final binary size.
Introduced some performance overhead.

Benefits to the OpenTelemetry community

This project significantly lowers the barrier for Go applications to adopt OpenTelemetry. While there is an existing auto-instrumentation solution based on eBPF, it comes with certain limitations. Auto-instrumentation based on code rewriting can achieve the same effect as manual instrumentation in most scenarios and is easier to use in production.

Reasons for New Project

Drawing inspiration from the Java language, users generally prefer non-intrusive solutions (those that don't require manual code modifications). Therefore, we believe that for Go applications, this approach is likely to gain widespread acceptance among users. Making it a project of OpenTelemetry, not only ensures better maintenance but also extends the benefits to a broader user base.

Repository of Our Prototype

https://github.com/alibaba/opentelemetry-go-auto-instrumentation

Existing usage

This project is under development and has some simple demos.

Maintenance

The original contributors to this repository will continue to be involved in the project. Our current roadmap is as follows：

Support all packages listed in open-telemetry/opentelemetry-go-contrib.
Instrument user code according to user-defined configuration.
Instrument Go runtime if it is necessary.

Licenses

Apache License 2.0

Trademarks

No Trademarks

Other notes

No response

edeNFed commented 9 months ago

In addition to the eBPF solution there is also a code generation solution called instrgen: https://github.com/open-telemetry/opentelemetry-go-contrib/tree/main/instrgen

cc @pdelewski

jiekun commented 9 months ago

Before diving deeper, may I kindly ask if this would be similar in implementation to:

Edit: I have great respect for every project and idea, but based on previous observations, their biggest challenge often lies in the lack of a sufficient number of contributors to sustain active development. However, regardless of that, I am extremely grateful for the approach of auto-instrumentation in Go applications and I hope it can be sustained.

pdelewski commented 9 months ago

The idea described by this proposal is generally the same as both projects mentioned above by @jiekun. We should rather focus on what we have so far, build larger community around it and improve. I would be very happy to see more people contributing.

D-D-H commented 9 months ago

Thank you all for the comments.

Sorry, I didn't notice instrgen before. I'll take some time to learn about it and see how we can improve it together.

As @pdelewski said, @jiekun The principle of rewriting code of these projects you mentioned should be similar.

jmacd commented 9 months ago

Thank you @D-D-H. There has been a Go-auto-instrumentation started, with two approaches led by @edeNFed and @pdelewski. I would like to see you join the Go-auto-instrumentation group, which still has a meeting on the OpenTelemetry calendar Tuesdays, 9:30am PST to discuss this proposal. If this group does not have critical mass, we recommend moving this sort of topic into the Go SIG meeting on Thursdays at 10am PST.

D-D-H commented 8 months ago

@jmacd Sorry for the late response and thank you for the invitation. We may not be able to participate in this meeting. How about the next meeting?

pdelewski commented 8 months ago

@D-D-H Feel free to join any time you want.

ralf0131 commented 6 months ago

Hi, @jmacd @pdelewski @edeNFed I am an engineer working on observability in Alibaba Cloud, I have summarized the main difference between our approach and instrgen and the eBPF solution to the best of our knowledge and please correct me if I am wrong.

Comparison with Instrgen

InstrGen leverages Golang's toolexec capability to inject instrumentation code at compile time. With minor changes to the code, users could get auto instrumentation for various libraries. Additionally, it offers crucial support for asynchronous context propagation, a feature highly valued by users. While InstrGen offers several benefits, some aspects may require further evaluation for more user adoption:

Manual Code Modification: Currently, users need to manually add a line of code to their main function. While it seems minor, according to our experience, some users might prefer a solution that doesn't directly modify their codebase.
When users upgrade libraries beyond versions compatible with InstrGen, the injected instrumentation may revert the library to a previously supported version. This behavior deviates from the expected upgrade experience.
Dependency conflict: InstrGen's instrumentation code might introduce dependencies that could conflict with existing user-side dependencies.

Similar to InstrGen, our approach leverages compiler injection to insert instrumentation code. This approach offers several key advantages for users:

No user-side code modifications are required for the instrumentation to work.
Enhanced context propagation: it does not require user to ensure the context is correctly passed, it also support context propagation in asynchronous scenarios like goroutines.
If users have previously added custom spans using the OTel-sdk within their code, our approach integrates these custom spans into the generated trace data.
Enhanced Backward Compatibility: Unlike OTel-go-sdk which only supports the latest two Go releases, our instrumentation offers broader backward compatibility with older versions of instrumented libraries. This eliminates the need for users to manage specific library versions based on OTel-go-sdk limitations.

Comparison with the opentelemetry-go-instrumentaiton (eBPF solution)

The opentelemetry-go-instrumentation project leverages eBPF uprobes for non-intrusive instrumentation of Go applications. Currently supported library includes, net/http, grpc, kafka, SQL and etc. The benefits of this approach are:

Automatic detection of go process and auto instrumentation.
No code modifications are required.
Support wide range of go versions.
Support context propagation.

Actually we have tried the eBPF approach for a while, the considerations that we did not adopted this approach:

Potential Performance Impact: Uprobes can introduce context switching between user and kernel modes, which may have additional performance implications for applications.
Kernel and eBPF Constraints: Compatibility relies on the user's kernel version and due to the limitation of eBPF programs, extending support for additional libraries can be a relatively complicated process.
Limited Context Propagation: Due to limitations in the eBPF instruction set, the maximum number of HTTP headers that can be propagated is currently limited (e.g., 8 for net/http). This may not be sufficient for all production use cases. The solution also require the user to ensure the context is correctly passed. However, in many cases, users often could not guarantee it. This makes the trace context not correctly propagated, especially in the asynchronised scenarios.

Please feel free to comment if you have any questions. We would like to have more discussion with the community in the upcoming SIG meeting. However, the meeting time is not quite friendly with us. I wonder is there any Asia-pacific friendly time for the meeting?

pdelewski commented 6 months ago

@ralf0131 From instrgen perspective as it uses basically the same technology as your approach we can combine them to have best of both. Regarding advantages that you mentioned:

I don't think user source code modification is needed. This was initial approach, but we can eliminate that.
regarding enhanced context propagation, I'm not sure how your approach is different from insrgen one and would like to understand it better. Regarding last two advantages as I said, there is no limitations from instrgen to be on par.

Currently the most important problem we are struggling is to have more people contributing to compile time instrumentation. CC @MrAlias @pellared

ralf0131 commented 6 months ago

@pdelewski Please see my comments in line.

@ralf0131 From instrgen perspective as it uses basically the same technology as your approach we can combine them to have best of both.

I agree that the two approaches share the basic idea however the implementation may varies. I think maybe we can discuss about how to combine them. However due to the difference between the two approaches I am not quite sure how to do that.

Regarding advantages that you mentioned:

I don't think user source code modification is needed. This was initial approach, but we can eliminate that.

That will be good. How would you like to do that?

regarding enhanced context propagation, I'm not sure how your approach is different from insrgen one and would like to understand it better.

Based on my understanding, assuming user has a function called foo(), InstrGen will modified it to foo(ctx Context). Our approach does not modify the foo() function to add the Context as a parameter. Instead, it adds a dedicated field within the goroutine's definition, to store the span context. When a new span is created, the span context will be stored in the goroutine's field. Subsequent retrievals of the span context from the goroutine directly access the previously stored value. To support the asynchronous propagation, during goroutine creation, a child-context can be passed along, enabling context propagation even when user does not pass the context as parameter. We will further consider baggage propagation as well in the future. The idea here is like ThreadLocal in Java.

There is another difference regarding how the instrumented codes are injected. InstrGen analyze code from AutotelEntryPoint and build the call graph, and inject instrumentation code into functions bodies. Our approach employs a rule base policy. It customize with toolexec and analyze all the dependencies. For the dependencies that match the rule, the instrumentation code will be injected.

Regarding last two advantages as I said, there is no limitations from instrgen to be on par.

Could you elaborate more on how InstrGen will do that?

Currently the most important problem we are struggling is to have more people contributing to compile time instrumentation. CC @MrAlias @pellared

I agree with that. :)

pdelewski commented 6 months ago

@ralf0131 Please see my comment below.

Based on my understanding, assuming user has a function called foo(), InstrGen will modified it to foo(ctx Context). Our approach does not modify the foo() function to add the Context as a parameter. Instead, it adds a dedicated field within the goroutine's definition, to store the span context. When a new span is created, the span context will be stored in the goroutine's field. Subsequent retrievals of the span context from the goroutine directly access the previously stored value. To support the asynchronous propagation, during goroutine creation, a child-context can be passed along, enabling context propagation even when user does not pass the context as parameter. We will further consider baggage propagation as well in the future. The idea here is like ThreadLocal in Java.

There is another difference regarding how the instrumented codes are injected. InstrGen analyze code from AutotelEntryPoint and build the call graph, and inject instrumentation code into functions bodies. Our approach employs a rule base policy. It customize with toolexec and analyze all the dependencies. For the dependencies that match the rule, the instrumentation code will be injected.

Could you elaborate more on how InstrGen will do that?

Your understanding is based on what we have so far on the main branch, however there is PR https://github.com/open-telemetry/opentelemetry-go-contrib/pull/4058 (opened almost year ago) which is more or less what you described above, so both tools follows the same techniques. Of course there still might be some implementation differences, but they are rather small from high level view.

ralf0131 commented 6 months ago

Could you elaborate more on how InstrGen will do that?

Your understanding is based on what we have so far on the main branch, however there is PR open-telemetry/opentelemetry-go-contrib#4058 (opened almost year ago) which is more or less what you described above, so both tools follows the same techniques. Of course there still might be some implementation differences, but they are rather small from high level view.

Hi @pdelewski ,

You're right. My apologies, I hadn't noticed your pull request before and my understanding are based solely on the main branch.

I've taken a look at your implementation, and I agree that from the high level view they share the same idea. We're actually planning to open-source the latest version of our code sometime in June. This would allow us for a more in-depth discussion about combining the two approaches once we can see each other's work in detail. What do you think?

pdelewski commented 6 months ago

Hi @ralf0131 ,

That's great idea. Also from timeframe perspective, June sound good to me.

jiekun commented 6 months ago

I saw a great idea and comments from maintainers here. As we know, providing auto-instrumentation for Go programs is difficult, but that's not the reason which cause the previous implementation being less active.

I've talked to some developers before, and they haven't heard about instrgen. I assume they will also be not aware of the potential implementation discussed here. So, there are a couple things we need to consider:

How can we incubate this (both instrgen and potentially a new implementation)? How can we increase attract more users? (External talks? Blog post?)
How can we advocate for more developers and maintainers to join these projects?
...

It would be great to see new projects/impls, and please also consider how we could keep them active for a long period of time.

ralf0131 commented 6 months ago

I've talked to some developers before, and they haven't heard about instrgen. I assume they will also be not aware of the potential implementation discussed here. So, there are a couple things we need to consider:

I saw a great idea and comments from maintainers here. As we know, providing auto-instrumentation for Go programs is difficult, but that's not the reason which cause the previous implementation being less active.

I've talked to some developers before, and they haven't heard about instrgen. I assume they will also be not aware of the potential implementation discussed here. So, there are a couple things we need to consider:

How can we incubate this (both instrgen and potentially a new implementation)? How can we increase attract more users? (External talks? Blog post?)

That's right, I think we will try to add some documentation and articles to introduce the compile time instrumentation. We also submitted proposal to the KubeCon China 2024 and hopefully we can be there to present :).

pdelewski commented 6 months ago

Let me share my thoughts.

It seems that most people focus on ebpf right now, no matter that it's just harder from development perspective and has tradeoffs described above. Having said that, it has one advantage that might be important for some group of people, e.g no need for recompilation or access to source code.

Another thing is that it's unfortunate that instrgen landed in opentelemetry-go-contrib instead of opentelemetry-go-instrumentation so people interested in go instrumentation usually go there and don't know about instrgen existence.

I also tried to advertise it during KubeCon 2023 in Chicago, however that's not enough. Blogposts, articles and more people involved might help change the situation.

ralf0131 commented 6 months ago

Let me share my thoughts.

It seems that most people focus on ebpf right now, no matter that it's just harder from development perspective and has tradeoffs described above. Having said that, it has one advantage that might be important for some group of people, e.g no need for recompilation or access to source code.

Another thing is that it's unfortunate that instrgen landed in opentelemetry-go-contrib instead of opentelemetry-go-instrumentation so people interested in go instrumentation usually go there and don't know about instrgen existence.

Just be curious, may I ask why instrgen was landed in opentelemetry-go-contrib? Is it possible to move it to somewhere under opentelemetry-go-instrumentation ? I do think this a key reason people might be difficult to know about it. At least in the documentation, I did not find any official reference to instgen.

I also tried to advertise it during KubeCon 2023 in Chicago, however that's not enough. Blogposts, articles and more people involved might help change the situation.

edeNFed commented 6 months ago

I don't think moving instrgen to opentelemetry-go-instrumentation is a good idea. From a user perspective, I expect all the repositories named opentelemetry-XXX-instrumentation will be an agent-like thing that does instrumentation in runtime (this is correct for Java, Python, .NET, JS). Compile time instrumentation falls under the tool category and should therefore be in a different repository.

ralf0131 commented 6 months ago

I don't think moving instrgen to opentelemetry-go-instrumentation is a good idea. From a user perspective, I expect all the repositories named opentelemetry-XXX-instrumentation will be an agent-like thing that does instrumentation in runtime (this is correct for Java, Python, .NET, JS). Compile time instrumentation falls under the tool category and should therefore be in a different repository.

@edeNFed Thanks for the clarification. Perhaps we should at least let users know, in the documentation, there are two approaches of instrumentation, one is the eBPF based auto instrumentation and the other is the compile time instrumentation? My suggestion will be:

Go
\--GettingStarted
\--Instrumentation
\--Automatic
    \-- runtime instrumentation (eBPF based)
    \-- compile time instrumentation

Looking at the documentation of Java, there is also a Spring Boot page under automatic folder, which is not a fully runtime instrumentation. To my understanding, if a Java application is compiled into a native image with GraalVM, this process is essentially a compile time instrumentation.

My second question for the eBPF-based instrumentation is, eBPF could offer a approach to instrumentation that beyond any specific programming language, why it has not been applied to other languages besides Go?

trask commented 6 months ago

From a user perspective, I expect all the repositories named opentelemetry-XXX-instrumentation will be an agent-like thing that does instrumentation in runtime (this is correct for Java, Python, .NET, JS).

just fyi opentelemetry-java-instrumentation includes both agent (runtime instrumentation) and library (build-time instrumentation)

jiekun commented 6 months ago

I expect all the repositories named opentelemetry-XXX-instrumentation will be an agent-like thing that does instrumentation in runtime (this is correct for Java, Python, .NET, JS).

From my perspective as a user, I don't think so.

The opentelemetry-xxx-instrumentation repository should provide automated instrumentation solutions, whether it's eBPF, Java agent, compile-time instrumentation, or something else. It different from manual instrumentation, which should be placed under the opentelemetry-xxx(-contrib) repository.

Developers do not need to care about how Java's auto-instrumentation works. All they need to know is that they don't need to modify codes.

In short, I expect the solutions provided by the opentelemetry-xxx-instrumentation repository to not require code changes, or at most, only minor code modifications.

edeNFed commented 6 months ago

The only thing shared between the eBPF instrumentation and the compiled time instrumentation is that they are both targeting Go applications.

Everything else is different, the programming language the instrumentation are written in, the contributors working on the project, tests, CI, etc.

I agree with @ralf0131 if the goal is to get more visibility into compile time instrumentation, making it more visible in the documentation is preferred in my opinion instead of mixing two unrelated projects into single repository.

trask commented 6 months ago

The opentelemetry-xxx-instrumentation repository should provide automated instrumentation solutions, whether it's eBPF, Java agent, compile-time instrumentation, or something else. It different from manual instrumentation, which should be placed under the opentelemetry-xxx(-contrib) repository.

(sort of a side conversation, but just wanted to mention that it's not quite this clear of a distinction for the Java repos at least, where the primary differentiator is that opentelemetry-java-instrumentation modules are maintained by the repo maintainers while opentelemetry-java-contrib is a distributed ownership model where individual components are maintained by component owners)

ralf0131 commented 4 months ago

Hi all,

Update: recently we have just released our first version of compile time instrumentation on the commercial side. Now we are working on the open source our solution. It is a bit late as expected but we are working on it.

RomainMuller commented 4 months ago

I'd like to note that @DataDog is up to extremely similar work in https://github.com/DataDog/orchestrion. This currently injects instrumentation for @DataDog's "properietary" SDK (https://github.com/DataDog/dd-trace-go)...

That said, the code rewriting is configuration-driven, and configuration targeting the OTel SDK instead could absolutely be made. We (at @DataDog) would most certainly welcome contributions in this direction.

ralf0131 commented 3 months ago

HI All,

Update: we have almost finished the initial version the compile time instrumentation approach. We are heading towards the first release. We are planning to introduce the approach in the upcoming tag-observability meeting (Tuesday, Aug 13 · 18:00 – 19:00 (Time zone: America/Los_Angeles), Wednesday, Aug 14 · 9:00 – 10:00 (Time zone: UTC+8)). Anyone who is interested please join to discuss. The meeting information can be found here.

danielgblanco commented 2 months ago

Hi @ralf0131 any feedback from the tag-observability meeting you can share here?

ralf0131 commented 2 months ago

@danielgblanco Not yet. Due to some technical issues, the meeting was not held as expected :(. We are working on a new meeting to discuss with it. We really need help on how to reach out to more people who is interested in the community. Any suggestions will be appreciated.

Hi @ralf0131 any feedback from the tag-observability meeting you can share here?

github-actions[bot] commented 2 months ago

This has been automatically marked as stale because it has been marked as needing author feedback and has not had any activity for 7 days. It will be closed if no further activity occurs within 7 days of this comment.

ralf0131 commented 2 months ago

@pdelewski Good day! Now that the source code of our approach has been available and we have published our first release. You can check it out and take a look at the source code, I think we can have a discussion on how to combine the our approach and InstrGen project. What do you think?

pdelewski commented 2 months ago

@pdelewski Good day! Now that the source code of our approach has been available and we have published our first release. You can check it out and take a look at the source code, I think we can have a discussion on how to combine the our approach and InstrGen project. What do you think?

@ralf0131 That's good plan!. I will look at your work and then we can discuss further steps.

ralf0131 commented 2 months ago

@pdelewski In the last APAC tag-observability meeting I have introduced the compile time instrumentation, the recording can be found here.

damemi commented 2 months ago

I'm a little late to this discussion, but I agree that compile-time instrumentation doesn't belong under opentelemetry-go-instrumentation just because they are clearly distinct projects. There's no reason for them to share a CI pipeline, for example, because they have very different mechanisms to test. I've never been a fan of how general the otel-go-instrumentation repo name is (I would prefer it to specify ebpf, auto, or runtime), but that's where we are.

I think combining this project and instrgen is better than having a third Go auto-instrumentation option. In that case it is big enough to have its own repo imo. Then to it presents more clearly to users that your options for Go are runtime or build time instrumentation. It also doesn't make either option appear "preferred", they both have their tradeoffs.

pdelewski commented 2 months ago

@damemi I agree that the best option is to combine both compile-time instrumentation projects (in other words, to have just one), as they are based on the same approach and achieve more or less the same outcome. From my perspective, having a dedicated repository for this project would be ideal compared to including it in contrib.

One of the problems I've observed, which is partly related to contrib and partly due to the lack of contributors, is the slow process of reviewing changes and deciding whether they can be merged. Auto-instrumentation is a specialized topic, and the people dedicated to pushing this project forward should be the ones deciding what is beneficial and what is not. Additionally, they should have the permissions to approve and merge changes.

jwafle commented 2 months ago

Just wanted to add a note as a practitioner of OpenTelemetry for a large company for nearly a year now, I had no knowledge of instgen even though I wrote the wrapper library for the opentelemetry-go-sdk for our company because it isn't documented anywhere in the Go SDK/instrumentation docs.

As an end-user, I go to the documentation first and foremost while working. For any project, I expect that any features that the contributors/maintainers want me to know about will be at least referenced in the documentation. I strongly believe that instgen or any subsequent offering will be lacking contributors until it is at least referenced in the documentation (even if it is just a beta offering, it is still worth documenting).

I would be happy to work on opening a PR to add instgen to the docs if there is consensus.

trask commented 2 months ago

I would be happy to work on opening a PR to add instgen to the docs if there is consensus.

cc @open-telemetry/docs-approvers

pdelewski commented 2 months ago

Hi @ralf0131,

I watched your presentation and really liked it! I'm currently reviewing some implementation details and wanted to clarify one point of misunderstanding.

In the presentation, it was mentioned that instrgen requires manual context propagation by adding an additional parameter. In the first version of the tool (which is still on the main branch), context was indeed propagated via an additional ctx parameter. However, this parameter was injected automatically.

That said, I have rewritten much of the tool in this pull request, leveraging Golang runtime injection. As a result, the behavior now seems to align more closely with your project. Additionally, I introduced a plugin architecture in this PR, so both projects share similarities in core architecture.

I plan to complete a deeper review of the code over the next few days, and I’m considering whether we could potentially combine the two approaches or explore a replacement (there are also some features unique for instrgen).

Lastly, due to some challenges with contributions, your project currently appears to be more robust in terms of available plugins.

Looking forward to your thoughts!

ralf0131 commented 2 months ago

Hi @ralf0131,

I watched your presentation and really liked it! I'm currently reviewing some implementation details and wanted to clarify one point of misunderstanding.

In the presentation, it was mentioned that instrgen requires manual context propagation by adding an additional parameter. In the first version of the tool (which is still on the main branch), context was indeed propagated via an additional ctx parameter. However, this parameter was injected automatically.

That said, I have rewritten much of the tool in this pull request, leveraging Golang runtime injection. As a result, the behavior now seems to align more closely with your project. Additionally, I introduced a plugin architecture in this PR, so both projects share similarities in core architecture.

I plan to complete a deeper review of the code over the next few days, and I’m considering whether we could potentially combine the two approaches or explore a replacement (there are also some features unique for instrgen).

Lastly, due to some challenges with contributions, your project currently appears to be more robust in terms of available plugins.

Looking forward to your thoughts!

Hi @pdelewski Glad to hear from you! Thanks for the clarification. Yes the investigation was based on the main branch so there might be lacking of understanding on the latest pull request. Would you like to summarize the unique feature of Instrgen, and lets schedule a meeting to discuss how to combine the two approaches. What do you think?

svrnm commented 2 months ago

I would be happy to work on opening a PR to add instgen to the docs if there is consensus.

cc @open-telemetry/docs-approvers

@jwafle can you raise an issue over at https://github.com/open-telemetry/opentelemetry.io for this to discuss? Thanks!

tedsuo commented 2 months ago

fyi I responded here https://github.com/open-telemetry/community/issues/2344#issuecomment-2377753834 in regard to how to proceed with this donation.

github-actions[bot] commented 1 month ago

This has been automatically marked as stale because it has been marked as needing author feedback and has not had any activity for 7 days. It will be closed if no further activity occurs within 7 days of this comment.

XSAM commented 1 month ago

This auto instrumentation approach looks good, though I have a few concerns:

How users are going to debug their applications if injected codes have bugs?
I assume the application after this auto instrumentation would print different stack trace results compared to the users' code base; thus, it would be very difficult to track the correct code lines, which makes debugging harder. I wonder if there is a solution to this.

y1yang0 commented 1 month ago

Hi @XSAM,

How users are going to debug their applications if injected codes have bugs?

We prepared a document for this, please refer to https://github.com/alibaba/opentelemetry-go-auto-instrumentation/blob/main/docs/how-to-debug.md for more details.

I assume the application after this auto instrumentation would print different stack trace results compared to the users' code base; thus, it would be very difficult to track the correct code lines, which makes debugging harder. I wonder if there is a solution to this.

We have ensured that all the inserted code is on a single line and have retained the code file after the insertion. In practice, troubleshooting the issue is not difficult.

XSAM commented 1 month ago

Hi @y1yang0, thanks for the explanation! It is good to have a way to know the modified code.

We have ensured that all the inserted code is on a single line and have retained the code file after the insertion. In practice, troubleshooting the issue is not difficult.

Even a single line number offset can make a difference, not to mention a Go file can contain multiple methods, which means multiple-line number offset is possible. The offset is not a constant number and can be scaled based on the users' code style.

Furthermore, I tried the https://github.com/alibaba/opentelemetry-go-auto-instrumentation/blob/main/example/log/main.go example. Even if it does not insert any code inside methods, it will replace the import path to include OTel go dependencies and generated rules. This creates the line number offset to 8.

This basically means users need to keep the -debug output always available when it comes to debugging, whether it is debugging their own logic or this auto instrumentation logic.

y1yang0 commented 1 month ago

Even a single line number offset can make a difference, not to mention a Go file can contain multiple methods

Yes, so we retained the code file after the insertion, users can refer to it to know exactly what line 123 is.

This basically means users need to keep the -debug output always available when it comes to debugging, whether it is debugging their own logic or this auto instrumentation logic.

Even if we don't use -debug, we will keep the modified files under .otel-build directory.

XSAM commented 1 month ago

Yes, so we retained the code file after the insertion, users can refer to it to know exactly what line 123 is. Even if we don't use -debug, we will keep the modified files under .otel-build directory.

My point is the line number reported by Go runtime is not the same as the source code as long as the -debug flag is not given.

Take this source code as an example, it does not change after running otelbuild. And, I don't see a modified version under the .otel-build folder.

// Copyright (c) 2024 Alibaba Group Holding Ltd.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//      http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package main

import (
    "go.opentelemetry.io/otel/sdk/trace"
    "go.uber.org/zap"
    "net/http"
)

func main() {
    http.HandleFunc("/log", func(w http.ResponseWriter, r *http.Request) {
        logger := zap.NewExample()
        logger.Debug("this is debug message")
        logger.Info("this is info message")
        logger.Info("this is info message with fileds",
            zap.Int("age", 37),
            zap.String("agender", "man"),
        )
        logger.Warn("this is warn message")
        logger.Error("this is error message")
    })

    http.HandleFunc("/logwithtrace", func(w http.ResponseWriter, r *http.Request) {
        logger := zap.NewExample()
        // GetTraceAndSpanId will be added while using otelbuild, users must use otelbuild to build the module
        traceId, spanId := trace.GetTraceAndSpanId()
        logger.Info("this is info message with fileds",
            zap.String("traceId", traceId),
            zap.String("spanId", spanId),
            zap.Stack("stack"),
        )
    })
    http.ListenAndServe(":9999", nil)
}

If I run curl localhost:9999/logwithtrace to trigger this is info message with fileds log. I got

{"level":"info","msg":"this is info message with fileds","traceId":"1939efcd7cb63a02d1df8f74b06cd76e","spanId":"0b05beee3bc946ea","stack":"main.main.func2\n\t/Users/samxie/codes/git/github/alibaba/opentelemetry-go-auto-instrumentation/example/log/main.go:51\nnet/http.HandlerFunc.ServeHTTP\n\tserver.go:2166\nnet/http.(*ServeMux).ServeHTTP\n\tserver.go:2683\nnet/http.serverHandler.ServeHTTP\n\tserver.go:3138\nnet/http.(*conn).serve\n\tserver.go:2039"}

It refers to line number 51, which does not exist in the source code. The total line of the source code is 47.

And, this is the modified source code after using the -debug flag,

// Copyright (c) 2024 Alibaba Group Holding Ltd.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//      http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package main

import _ "test/otel_rules"

import _ "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
import _ "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
import _ "go.opentelemetry.io/otel/exporters/otlp/otlptrace"
import _ "go.opentelemetry.io/otel/sdk"
import _ "go.opentelemetry.io/otel"

import (
    "go.opentelemetry.io/otel/sdk/trace"
    "go.uber.org/zap"
    "net/http"
)

func main() {
    http.HandleFunc("/log", func(w http.ResponseWriter, r *http.Request) {
        logger := zap.NewExample()
        logger.Debug("this is debug message")
        logger.Info("this is info message")
        logger.Info("this is info message with fileds",
            zap.Int("age", 37),
            zap.String("agender", "man"),
        )
        logger.Warn("this is warn message")
        logger.Error("this is error message")
    })

    http.HandleFunc("/logwithtrace", func(w http.ResponseWriter, r *http.Request) {
        logger := zap.NewExample()
        // GetTraceAndSpanId will be added while using otelbuild, users must use otelbuild to build the module
        traceId, spanId := trace.GetTraceAndSpanId()
        logger.Info("this is info message with fileds",
            zap.String("traceId", traceId),
            zap.String("spanId", spanId),
            zap.Stack("stack"),
        )
    })
    http.ListenAndServe(":9999", nil)
}

This reflects the correct code number that the log is referring to, as the line 51 is zap.Stack("stack"),.

Summary

Users have to keep the -debug flag output available to determine which code line the stack trace refers to.

y1yang0 commented 1 month ago

And, I don't see a modified version under the .otel-build folder.

Sorry, this is a bug. We only retained the modified files of the third-party library, but we didn't retain the modified files in the current project. I will fix this issue soon.

pdelewski commented 1 month ago

@ralf0131 Sorry for the delay. I did a deeper dive into the code, and as mentioned above, there are some differences in the details. However, both projects are generally based on the same ideas. Since your project now seems to be superior (has more supported libraries), replacing the instrgen source code could be a valid option. One feature that instrgen also had was building a static call graph and injecting it into selected user functions via a simple UI (something we can discuss later). For now, there are a few open questions:

Should this be part of opentelemetry-contrib or its own repository? The answer to this question will also determine some of the further development decisions and processes. I can bring this up in the next opentelemetry-go-instrumentation SIG. Regarding a meeting, what time works best for you? I will check if I can make it.

ralf0131 commented 1 month ago

Hi @pdelewski ,

@ralf0131 Sorry for the delay. I did a deeper dive into the code, and as mentioned above, there are some differences in the details. However, both projects are generally based on the same ideas. Since your project now seems to be superior (has more supported libraries), replacing the instrgen source code could be a valid option. One feature that instrgen also had was building a static call graph and injecting it into selected user functions via a simple UI (something we can discuss later).

Sure, let's discuss it in the meeting.

For now, there are a few open questions:

Should this be part of opentelemetry-contrib or its own repository? The answer to this question will also determine some of the further development decisions and processes. I can bring this up in the next opentelemetry-go-instrumentation SIG.

I would suggest a dedicated repository instead of under the opentelemetry-contrib.

Regarding a meeting, what time works best for you? I will check if I can make it.

There is already a discussion in the opentelemetry-go-instrumentation slack channel (starting by the Go-maintainers), could you please check it out? I am not sure about your timezone, what about 5pm/6pm PT on Monday (which is 8am/9am in China) October 14th, we can meet and discuss with Go SIG maintainers.

pdelewski commented 1 month ago

There is already a discussion in the opentelemetry-go-instrumentation slack channel (starting by the Go-maintainers), could you please check it out? I am not sure about your timezone, what about 5pm/6pm PT on Monday (which is 8am/9am in China) October 14th, we can meet and discuss with Go SIG maintainers.

I'll check the discussion (I haven't been there for a while). Monday should work for me. BTW. I'm based in Europe (Poland). PT - pacific time? Seems to be 2am at my side. After consideration, that might be hard for me.

open-telemetry / community