Tracking

The original proposal is below in this description. This was accepted for proceeding on in Java SIG meeting July 20 2023, after example walkthroughs of instrumentation being moved are provided
A couple of examples of how the proposed changes would affect existing instrumentation, if moved to the new framework were provided for the following SIG meeting July 27 2023, and a decision to proceed on the proposal was then agreed
The proposed architecture outline and an implementation plan outline were added in this comment and PRs for the implementation began - a working POC is available for reference

Related PRs

Note not for inclusion in this issue are preferred post completion task to remove shading from the agent

[COMPLETED] Phase 1 - Enabling invoke dynamic capability for instrumentations

https://github.com/open-telemetry/opentelemetry-java-instrumentation/pull/9177
https://github.com/open-telemetry/opentelemetry-java-instrumentation/pull/9180
https://github.com/open-telemetry/opentelemetry-java-instrumentation/pull/9239
https://github.com/open-telemetry/opentelemetry-java-instrumentation/pull/9382
~Replace InstrumentationModule.isIndyModule() with a new interface in the internal package, per~ ~example~ - this is no longer being done as it was decided to leave the method as is for now and change or deprecate as desired (as is not yet InstrumentationModule stable so deprecation and change is relatively quick)

Phase 2a - Migrate simple instrumentation modules

Write migration guide (so we can link to it in reviews)
Any instrumentation that can be migrated without "Phase 2b features", - this is now detailed in this PR which identifies all simple and non-simple instrumentations

Phase 2b - Additional support for muzzle and complex instrumentation modules

muzzle runtime dual-parent verification that dependencies are present (currently only application classloader dependencies are checked, need to verify dependencies in both parents) -- https://github.com/open-telemetry/opentelemetry-java-instrumentation/pull/9729
muzzle compile-time reference/dependency gathering from static methods/fields (currently excluded)
muzzle/runtime compile/runtime enforcing non-inlined instrumentations only - reject instrumentations with inlined advices
muzzle add indy modules advices to helper classes
opening java 9+ modules mechanism for instrumentations that need specific module access -- donate LDAP instrumentation
member access support (to make package-private stuff accessible as advices are no longer inlined) -- https://github.com/open-telemetry/opentelemetry-java-instrumentation/pull/9729
support for SPI implementations -- https://github.com/open-telemetry/opentelemetry-java-instrumentation/pull/9502 -- https://github.com/open-telemetry/opentelemetry-java-instrumentation/pull/9565 -- https://github.com/open-telemetry/opentelemetry-java-instrumentation/pull/9752
Allow InstrumentationModules to optionally share their InstrumentationModuleClassloader (see this discussion)

Phase 3 - Migrate remaining instrumentation modules

migrate all remaining modules

Proposal

Contribute alternative invokedynamic based instrumentation mechanism to the OpenTelemetry Java agent, with no required changes to the existing OpenTelemetry instrumentation mechanism or instrumentations

Benefits to OpenTelemetry of this contribution and mechanism

Allows instrumentations to have breakpoints set in them and be debugged using standard debugging techniques
No shading required, allowing standard debugging in IDEs
Works cleanly with the Java Module system
Clean isolation of instrumentation advice from the application and other instrumentations
Uses the approach recommended by Byte Buddy creator Rafael Winterhalter (see background section below)
Makes available additional contributors/maintainers from Elastic for OpenTelemetry
It's a parallel instrumentation approach (to the existing OpenTelemetry Java agent) that would live alongside the existing one and not impact the existing approach at all
Implementors would have the option to use either instrumentation approach, whichever they prefer
It will subsequently allow existing instrumentations using this technology to be contributed to the OpenTelemetry agent
The contribution will not require any implementation effort from existing contributors (unless existing contributors want to become involved!) except for reviewing, as Elastic will make contributors available for this. (Elastic will also try to reduce reviewing effort by adding maintainers, but this will take time)
It is mature technology and has been production tested for several years
None of the advice class restrictions apply

Benefits to Elastic

It will allow Elastic Java agent to converge to the OpenTelemetry agent so that Elastic can support the OpenTelemetry Java agent

Background

The Elastic APM Java agent is fully open source on the Apache 2.0 license
The agent instrumentation uses Byte Buddy delegated advice (allows breakpoint setting) - the recommended approach by the byte buddy creator Rafael Winterhalter, which reduces agent implementation complexity
To support delegated advice, the agent has implemented custom classloaders that enable access and reversion (the instrumentation has correct isolated access, and can be unloaded and the transformed code fully reverted)
To support the delegated advice being able to access the instrumented methods runtime state, the agent uses Byte Buddy to insert an invokedynamic bytecode instructions -

the invokedynamic instruction can be used to call an advice method that is loaded from a child class loader of the instrumented class' defining class loader ... this allows the agent to hide its classes from the application while providing a way to invoke the isolated methods from the application classes it instruments ... also avoids injecting the advice and helper classes into the target class loader directly

https://www.elastic.co/de/blog/embracing-invokedynamic-to-tame-class-loaders-in-java-agents

Per discussion at the Java SIG, one next step is to provide a walkthrough of how an existing instrumentation is expected to change if it moved across to this mechanism after the mechanism was implemented

Per discussion at the Java SIG, the instrumentation using this mechanism is isolated and needs no shading but fully making the whole agent non-shaded using the ShadedClassLoader.java is not in this scope, but can be done in a subsequent scope if desired!

Per discussion at the Java SIG, one next step is to provide a walkthrough of how an existing instrumentation is expected to change if it moved across to this mechanism after the mechanism was implemented

This was added by @JonasKunz here providing a simple instrumentation change (Cassandra) and a complex one (Elasticsearch)

The next step is to provide a roadmap architecture of the implementation, following which we'll aim to start providing PRs of individual components using the architecture as a reference until we get to a fully working instrumentation. After that the intention is to convert all the existing instrumentations to the invokedynamic non-inlined implementation, targeting the 2.0 release in October, which will terminate support for extensions that use inline instrumentation capability

Here's a write-up of what we would envision as target architecture. In addition I did a working PoC showcasing what things would like in implementation, without all the bells and whistles of course. The PoC compiles and migrates the Cassandra-Module to invokedynamic. You can execute the tests and place a break-point in the Advice code.

Classloader Structure

Based on the experience with the elastic APM agent I would envision the following "ideal" classloader structure for the OpenTelemetry Agent:

otel_cl_ideal drawio

The image shows the agent alongside two separate classloaders of the application being instrumented (App A and App B). Arrow represent a child-to-parent relationship. Explanation:

The Agent Extension API CL loads all classes against which extensions are compiled, including for example the OTel API
The Agent CL is a child of the Extension-API CL. Those two are separated to ensure that implementation details of the Agent are hidden from extensions at runtime just like they are at compile time. This prevents extensions (especially external ones!) from accidentally relying on those implementation details.
Both the Agent Extension API CL and the Agent CL should follow the "self-first" instead of "parent-first" delegation model. This ensures that their classes cannot be shadowed by accident by the user, e.g. by providing bytebuddy as a dependency via -Xbootclasspath.
InstrumentationModules can share state and classes via the Global-State CL. This should avoid the need for InstrumentationModules to inject classes into the Bootstrap CL, properly isolating those shared classes from the application being instrumented.
The Global-State CL is separated from the Agent Extension API CL for easier debugging (e.g. understanding where a class comes from) and to make sure that InstrumentationModules cannot mess up the classpath of the Agent CL. The latter point should matter less due to the Agent CL following the self-first delegation model. The Global-State CL should be self-first aswell, so that classes from bootstrap cannot mess it up.
We create at most one Instrumentation Module CL per combination of Instrumentation-Module and application classloader. Whenever an Advice is applied, we lazily create the InstrumentationModule CL for the Instrumentation Module containing that Advice and the classloader containging the class being instrumented. If that CL already exists, it is reused.
The Instrumentation Module CL uses the following classloading delegation model:
- If the class to load is an Advice or a referenced helper class (and NOT a global state class), it is loaded by the Instrumentation Module CL in self-first manner. The helper classes are detected by muzzle.
- Next, the Global-State CL (and therefor its parents) are queried to find the class
- If still not found, the instrumented application classloader is queried
The classloading strategy of the Instrumentation Module CL ensures that application classes cannot shadow agent classes. We certainly need to add filtering for edge cases to this mechanism later (e.g. for bridging the opentelemetry-API from the application to the agent SDK).

To keep things simple for a start, we can keep the Agent CL, Agent Extension API CL and Global State CL fused together as a single Agent CL: This is how it is currently implemented in the elastic APM agent.

otel_cl_simplified drawio

This reduces the isolation benefits explained above, but should be simpler to implement for a start. If in the long run we often run into issues / user questions which would be avoided by the proper isolation, we can always add it afterwards.

Invokedynamic Advice Bootstraping

As discussed in the previous SIG meetings, we will add a switch-method to InstrumentationModules to allow them to be marked as compatible for the invokedynamic approach.

We will adapt the OpenTelemetry Java Agent instrumentation logic to respect this switch-method and insert invokedynamic instructions instead of inlining the Advice code. The inserted invokedynamic instructions need to point to a bootstraping method which returns the actual target of the invokedynamic instruction in the form of a MethodHandle.

We will inject an IndyBootstrapDispatcher into the Bootstrap CL to forward this call into the actual implementation within the Agent CL. This implementation takes care of setting up the corresponding InstrumentationModule CL.

In the elastic APM agent the IndyBootstrapDispatcher is the only class which is injected into the Bootstrap (or any application classloader). There we shade this class into the java.lang package to guarantee visibility, even if custom module systems, such as OSGi, are used.

AFAIK, the OpenTelemetry Java Agent uses classloader instrumentations to guarantee visibility of injected bootstrap classes. Therefore we won't need to shade the IndyBootstrapDispatcher here. If in the future we get rid of all other use-cases for class injection, we could consider shading the IndyBootstrapDispatcher into the java.lang package, as it would allow us to get rid of specialized classloader instrumentation for individual module systems.

Interaction of Invokedynamic-Advices with Java 9 modules

When instrumenting classes which are encapsulated within Java 9 modules, it is possible that the instrumentation needs to access non-exported contents of the modules. This usually shows by having to add --add-opens compiler flags when compiling the instrumentation with Java 9+.

In order to facilitate these use-cases, InstrumentationModules can declare modules they need to have access to. The invokedynamic bootstrapping will take care of providing the InstrumentationModule CL with the required access via Instrumentation.redefineModule.

Accessing package-private / private members

The inline/injection based instrumentation approach of the OpenTelemetry agent allows to access any package-private members by injecting accessor classes into the desired packages. Because we won't allow invokedynamic-instrumentations to perform this injection, we need a different way of gaining this access.

This can be efficiently done by combining Instrumentation.redefineModule with MethodHandles.privateLookupIn. For Java 8 we can simply use setAccessible(true). This allows us to add a MemberAccess utility to the extension API with which the advices can then simply declare MethodHandles for the required fields and methods:

 private static final MethodHandle fieldGetter = MemberAccess.fieldGetter(SomeSecret.class, "secretField");

Due to the MethodHandle being static final, the performance should be equivalent to direct field accesses / method calls after JIT.

You can find a PoC implementation here.

SPI injection

Another current use-case for injecting classes into application classloaders is to inject ServiceProviders. In order to provide a consistent development experience, we would need to find a way to make the service providers to "inject" actually live inside the InstrumentationModule CL. I see two alternative approaches for doing this:

Instrument the ServiceLoader methods to return instances from the InstrumentationModule CL
Inject a runtime generated proxy into the target-classloader and register it for the ServiceLoader. Upon creation, this proxy creates the actual implementation via invokedynamic in the InstrumentationModule CL and delegates all calls to that instance.

Implementation Plan

Based on the outlined architecture above, I would suggest the following implementation order:

Implement the InstrumentationModuleClassloader
Adapt muzzle to include the Advices as helper classes for invokedynamic-modules
Add the invokedynamic bootstraping and instrumentation logic
Migrate a few, simple instrumentation modules to verify the mechanism
Add the module-opening mechanism and contribute the LDAP instrumentation module to showcase it
Implement the MemberAccess utility explained above and migrate at least one instrumentation modules which requires it
Decide on a way to support SPI injection and implement it. Migrate at least one instrumentation module using it.

open-telemetry / opentelemetry-java-instrumentation

Proposal to contribute invokedynamic based instrumentation mechanism #8999