Prevent duplicate telemetry when using both library and auto instrumentation

trask commented 4 years ago

I think this is an important, e.g. you may start out using library instrumentation, and then decide to throw on the javaagent later (or your ops team decides to throw on the javaagent), and I think part of the goal of maintaining both library and auto instrumentation in the same repo is so we can make them play nicely together.

(this includes preventing duplicate @WithSpan telemetry when using both spring-aop instrumentation module and javaagent)

anuraaga commented 4 years ago

For reference, I guess this is along the lines of the gRPC logic to prevent the interceptor from being added if it already exists

https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/master/instrumentation/grpc-1.5/src/main/java/io/opentelemetry/auto/instrumentation/grpc/server/GrpcServerBuilderInstrumentation.java#L71

which interestingly seems extremely unlikely until we extract the library instrumentation :D

anuraaga commented 4 years ago

Also I guess we can have a pattern of including a duplicate test in the auto project which configures the same way the library test does, and it should still pass (currently Armeria wouldn't I think, heh)

https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/master/instrumentation/armeria-1.0/auto/src/test/groovy/io/opentelemetry/auto/instrumentation/armeria/v1_0/ArmeriaServerTest.groovy#L28

trask commented 4 years ago

If possible, I think ideally the auto-instrumentation would take precedence over the library instrumentation (where they are both official packages from this repo). The reason for this is that it's generally easier to upgrade to the latest javaagent (e.g. done by ops to fix a bug or get a new feature) than to upgrade to the latest library instrumentation.

[EDIT: just adding thoughts here]

trask commented 4 years ago

[adding more thoughts here]

see https://github.com/open-telemetry/opentelemetry-java-instrumentation/pull/920#discussion_r467361890:

another option for suppressing library instrumentation that might be easier to generalize, is to make the coordination more explicit, e.g. have a method in instrumentation-api like isAutoEnabled(instrumentationName), and the library instrumentation can explicitly check that to see if there is auto instrumentation with the same name enabled

anuraaga commented 4 years ago

That's a nice idea. I'm a bit on the fence about introducing logic in library instrumentation to support auto - it's harmless but does seem like a break in separations of concerns.

But it gives me the idea that I didn't have at all, instead of instrumenting the library to suppress, I could instrument the library instrumentation itself. So stub out newdecorator to return a no op, using byte buddy. It still has the problems of shading the constant and the tediousness of replacing a method in byte buddy so not sure if it's worth it just for the concern of separation of concerns.

trask commented 4 years ago

I'm a bit on the fence about introducing logic in library instrumentation to support auto

ya i understand. the way i view is that it's also a big selling point of this project, that we are developing auto and library instrumentation in concert, so that they (1) produce the same telemetry and (2) play nice together. so, not so strange that we would produce "auto-compatible" library instrumentation

mateuszrzeszutek commented 3 years ago

If choosing library instrumentation over javaagent instrumentation is an option, we could add a classloader matcher to javaagent instrumentations that would prevent them from being applied if the library instrumentation is already present on the classpath (preferably in a separate method, e.g. libraryInstrumentationMatcher()). The downside of this is that library instrumentation will always win. On another hand, calling sth like isAutoEnabled(instrumentationName) may not be really feasible in library instrumentation that's applied using SPI (like log4j or AWS SDK), but preventing the javaagent instrumentation in this case is trivial.

trask commented 3 years ago

We discussed in SIG meeting today and agreed to have the javaagent instrumentation back off, and have the library instrumentation take precedence, because

In the future when some libraries (hopefully) start emitting OTel directly, we will want to do the same for that instrumentation (have javaagent back off)
~It's easy to implement~
The user may have configured the library instrumentation (e.g. set certain options in builder), and it's nice to respect those settings
If user really wants javaagent to take precedence, they have a reasonable workaround to remove the library instrumentation

mateuszrzeszutek commented 3 years ago

Turning off instrumentation just based on the library class presence is not a good solution for most cases; reopening.

We'll probably have to stick with shading library instrumentation classes and have javaagent advices detect them manually; I don't have any idea for a common solution to this problem.

lmolkova commented 3 years ago

I'm working on Azure SDK instrumentation and have a problem with double instrumentation on the HTTP layer. Here's the scenario:

we want to support users who auto-instrument and those who don't - it's not in our control (so we instrument HTTP for them)
we instrument public logical API calls like uploading a blob (so users know what happens on the public surface of the library and can map spans to their code). Public API calls can translate into a number of HTTP calls (retries, auth, complex operations).
we want to provide additional details on top of HTTP client span that auto-instrumentation would not include: they are specific to Azure and are public Azure service properties users need to get support. Additionally, we occasionally want to strip some sensitive parameters. They are per-http request and have no place on parent logical span

So what we have now is:

[manual/agent] SERVER span
- [with/without agent] uploadBlob span (INTERNAL)
- [with/without agent] HTTP spans (CLIENT) with extra stuff specific to Azure
- [agent] HTTP spans (parent depends on how we create HTTP span in the client lib, so it's flexible)

Suppressing instrumentation in the way #2661 was done, doesn't help - we want HTTP to be suppressed. Suppressing all HTTP (netty) instrumentation either - users still use netty for their calls.

Solutions:

Final instrumentation. One idea we've previously entertained with OpenCensus community is terminal span - I have ~~somewhat outdated~~ proposal - as a way to coordinate instrumentations
Backoff if the context is injected. Another idea that may be a short-term mitigation step: we'll have more issues like that with any protocol. If the context is already injected, there is nothing good this layer of instrumentation can do - everything would be wrong, just back-off

Since this is not a language-specific issue, I'll create an issue in spec repo.

lmolkova commented 3 years ago

One more thing that probably won't work is #1822. Suppressing nested CLIENT spans is some hidden contract that would be very hard to explain to users and a huge support burden - users would manually instrument and would not see their spans.

open-telemetry / opentelemetry-java-instrumentation

Prevent duplicate telemetry when using both library and auto instrumentation #903

Solutions: