open-telemetry / opentelemetry-java-instrumentation

OpenTelemetry auto-instrumentation and instrumentation libraries for Java
https://opentelemetry.io
Apache License 2.0
1.91k stars 832 forks source link

Prevent duplicate telemetry when using both library and auto instrumentation #903

Open trask opened 4 years ago

trask commented 4 years ago

I think this is an important, e.g. you may start out using library instrumentation, and then decide to throw on the javaagent later (or your ops team decides to throw on the javaagent), and I think part of the goal of maintaining both library and auto instrumentation in the same repo is so we can make them play nicely together.

(this includes preventing duplicate @WithSpan telemetry when using both spring-aop instrumentation module and javaagent)

anuraaga commented 4 years ago

For reference, I guess this is along the lines of the gRPC logic to prevent the interceptor from being added if it already exists

https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/master/instrumentation/grpc-1.5/src/main/java/io/opentelemetry/auto/instrumentation/grpc/server/GrpcServerBuilderInstrumentation.java#L71

which interestingly seems extremely unlikely until we extract the library instrumentation :D

anuraaga commented 4 years ago

Also I guess we can have a pattern of including a duplicate test in the auto project which configures the same way the library test does, and it should still pass (currently Armeria wouldn't I think, heh)

https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/master/instrumentation/armeria-1.0/auto/src/test/groovy/io/opentelemetry/auto/instrumentation/armeria/v1_0/ArmeriaServerTest.groovy#L28

trask commented 4 years ago

If possible, I think ideally the auto-instrumentation would take precedence over the library instrumentation (where they are both official packages from this repo). The reason for this is that it's generally easier to upgrade to the latest javaagent (e.g. done by ops to fix a bug or get a new feature) than to upgrade to the latest library instrumentation.

[EDIT: just adding thoughts here]

trask commented 4 years ago

[adding more thoughts here]

see https://github.com/open-telemetry/opentelemetry-java-instrumentation/pull/920#discussion_r467361890:

another option for suppressing library instrumentation that might be easier to generalize, is to make the coordination more explicit, e.g. have a method in instrumentation-api like isAutoEnabled(instrumentationName), and the library instrumentation can explicitly check that to see if there is auto instrumentation with the same name enabled

anuraaga commented 4 years ago

That's a nice idea. I'm a bit on the fence about introducing logic in library instrumentation to support auto - it's harmless but does seem like a break in separations of concerns.

But it gives me the idea that I didn't have at all, instead of instrumenting the library to suppress, I could instrument the library instrumentation itself. So stub out newdecorator to return a no op, using byte buddy. It still has the problems of shading the constant and the tediousness of replacing a method in byte buddy so not sure if it's worth it just for the concern of separation of concerns.

trask commented 4 years ago

I'm a bit on the fence about introducing logic in library instrumentation to support auto

ya i understand. the way i view is that it's also a big selling point of this project, that we are developing auto and library instrumentation in concert, so that they (1) produce the same telemetry and (2) play nice together. so, not so strange that we would produce "auto-compatible" library instrumentation

mateuszrzeszutek commented 3 years ago

If choosing library instrumentation over javaagent instrumentation is an option, we could add a classloader matcher to javaagent instrumentations that would prevent them from being applied if the library instrumentation is already present on the classpath (preferably in a separate method, e.g. libraryInstrumentationMatcher()). The downside of this is that library instrumentation will always win. On another hand, calling sth like isAutoEnabled(instrumentationName) may not be really feasible in library instrumentation that's applied using SPI (like log4j or AWS SDK), but preventing the javaagent instrumentation in this case is trivial.

trask commented 3 years ago

We discussed in SIG meeting today and agreed to have the javaagent instrumentation back off, and have the library instrumentation take precedence, because

mateuszrzeszutek commented 3 years ago

Turning off instrumentation just based on the library class presence is not a good solution for most cases; reopening.

We'll probably have to stick with shading library instrumentation classes and have javaagent advices detect them manually; I don't have any idea for a common solution to this problem.

lmolkova commented 3 years ago

I'm working on Azure SDK instrumentation and have a problem with double instrumentation on the HTTP layer. Here's the scenario:

So what we have now is:

Suppressing instrumentation in the way #2661 was done, doesn't help - we want HTTP to be suppressed. Suppressing all HTTP (netty) instrumentation either - users still use netty for their calls.

Solutions:

  1. Final instrumentation. One idea we've previously entertained with OpenCensus community is terminal span - I have somewhat outdated proposal - as a way to coordinate instrumentations

  2. Backoff if the context is injected. Another idea that may be a short-term mitigation step: we'll have more issues like that with any protocol. If the context is already injected, there is nothing good this layer of instrumentation can do - everything would be wrong, just back-off

Since this is not a language-specific issue, I'll create an issue in spec repo.

lmolkova commented 3 years ago

One more thing that probably won't work is #1822. Suppressing nested CLIENT spans is some hidden contract that would be very hard to explain to users and a huge support burden - users would manually instrument and would not see their spans.