hypertrace / javaagent

Hypertrace OpenTelemetry Java agent with payload/body and headers data capture.
Apache License 2.0
30 stars 15 forks source link

Upgrade OTEL to 1.3.1 #341

Closed ryandens closed 2 years ago

ryandens commented 3 years ago

Description

Upgrades OTEL to 1.3.1 Notable changes:

Gradle changes

We made a few changes to how this project is setup and configured. Notably, we're doing our best to minimize the usage of subprojects and allprojects directives so that we can incrementally migrate projects to new patterns established in pentelemetry-java-instrumentation/examples/distro. We updated the ByteBuddyPluginConfigurator to avoid resolving project tasks and to better align with the otel copy.

Consuming OpenTelemetry instrumentation artifacts

OpenTelemetry has been migrating the individual instrumentation modules to a new pattern, publishing JARs of each instrumentation module with shaded dependencies and relocated packages. The message is clear: these packages aren't meant for consumption and referencing by downstream distributions. Unfortunately, we currently consume them. Where applicable, we prefer to use the "standalone instrumentation libraries" that are published and seem intended for consumption. We'll be working to advocate for the moving of certain utilities to the "standalone instrumentation libraries" for easier consumption. In the meantime, when we need to reference an OTEL Tracer that is only published in an instrumentation library, we must create a project that un-shades the dependency. Then, our custom instrumentation libraries can have a compile-time dependency on the unshaded artifact. Our instrumentation library then has its references to the shaded otel classes relocated by the task :instrumentation:shadowJar. We demonstrate that this process functions properly with our smoke tests.

Testing with OpenTelemetry instrumentation artifacts

Some of our modules don't reference anything in the OpenTelemetry instrumentation modules in their production code, but do depend on those modules being there in order to enhance spans with our own metadata. The way our tests are setup today, we expect to have the unshaded instrumentation libraries on the classpath. Therefore, these projects must also have a testRuntimeOnly dependency on the unshaded instrumentation components. In order to eliminate this complexity, we'll update our testing strategy to match that of the example OpenTelemetry distribution which uses a special agent for testing purposes to instrument test classes.

ApacheAsyncClientInstrumentationModule incompatibility

When we upgraded to OTEL v1.3.1. our ApacheAsyncClientInstrumentationModule started having its test fail badly. There we some significant changes made to the underlying OTEL module in PR #3209. Notably, we no longer had an easy way to access the client Context as a downstream instrumentation library because the creation of that context was delayed until the request is generated. We put in a hack here, but we should ideally work with the upstream maintainers to re-expose the client context in some way.

In addition, we had to carefully modify our instrumentation. Our DelegatingCaptureBodyRequestProducer must get the Context off the BodyCaptureDelegatingCallback's delegate private field context after DelegatingCaptureBodyRequestProducer.super.generateRequest() has been invoked (which causes this field to get assigned).

Servlet Async incompatibility

When we upgraded OTEL to v1.3.11 our async body capturing tests for servlets started failing. This was a result of an upstream change in this PR. Notably, the async spans are marked as ended much earlier than before, so new attributes that hypertrace attempted to add to the span were never added. To fix this, we simply moved our async instrumentation to be more like OTEL's so we have the chance to add the attributes before the span is ended.

Testing-common changes

We added a hack to the TestOpenTelemetryInstaller which initializes a field with the application class loader. This hack has to exist until we switch over to the better testing pattern set by OTEL that leverages the special agent for testing purposes.

ryandens commented 3 years ago

Note, we had to add the grpc-netty-shaded runtime dependency to smoke-tests here because it was removed from the OTEL javaagent-tooling project in this PR

pavolloffay commented 3 years ago

OpenTelemetry has been migrating the individual instrumentation modules to a new pattern, publishing JARs of each instrumentation module with shaded dependencies and relocated packages. The message is clear: these packages aren't meant for consumption and referencing by downstream distributions.

Why are they published then? There must be a reason. Here the example is consuming servlet artifacts https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/examples/distro/instrumentation/servlet-3/build.gradle#L7. Our unit-test suite probably has to be executed with the shaded classpath.

we must create a project that un-shades the dependency. Then, our custom instrumentation libraries can have a compile-time dependency on the unshaded artifact.

Again this approach feels weird. Unshading does not feel like a viable solution.

pavolloffay commented 3 years ago

When we upgraded to OTEL v1.3.1. our ApacheAsyncClientInstrumentationModule started having its test fail badly. There we some significant changes made to the underlying OTEL module in PR #3209. Notably, we no longer had an easy way to access the client Context as a downstream instrumentation library because the creation of that context was delayed until the request is generated. We put in a hack here, but we should ideally work with the upstream maintainers to re-expose the client context in some way.

I exposed the context in the previous version of the instrumentation https://github.com/open-telemetry/opentelemetry-java-instrumentation/pull/3209/files#diff-a329deb6c0f4476e0770642347664445d0882978808dee3fa11ef91034a67dbeL155. It's unfortunate that they removed it again. Exposing it from WrappedFutureCallback should be an easy fix in the upstream.

ryandens commented 3 years ago

I totally agree that this approach is not the long-term solution, and I agree that long term we can and should do better aligning with OTEL's patterns. My long-term plan is to get this project in a place where we can migrate each instrumentation module to the new pattern gradually without having to invest in a large risky reorganization of our entire project all at once. This is a stop-gap solution to get access to upstream changes in 1.3.1 that have some performance improvements we'd like to incorporate sooner rather than later.

pavolloffay commented 3 years ago

The point is why to diverge even more how if it can be aligned already in 1.3.x. Also, note there is 1.4.x out which might make custom distributions even easier to implement.

ryandens commented 2 years ago

closing in favor of #345