open-telemetry / opentelemetry-java-instrumentation

OpenTelemetry auto-instrumentation and instrumentation libraries for Java
https://opentelemetry.io
Apache License 2.0
1.84k stars 811 forks source link

Support for virtual threads (Java 21) #9534

Open trask opened 10 months ago

trask commented 10 months ago

Virtual threads are out of preview and included in new Java 21 (LTS) release (https://openjdk.org/jeps/444).

Notes/references from weekly Java meeting:

JonasKunz commented 10 months ago

Here's my understanding on this topic, I've implemented the virtual thread support in the elastic apm agent.

Thread Locals

The usage of thread locals prior to virtual threads can be roughly divided into two categories:

The first use case is still fine with virtual threads: It is just a necessary tool and the lifetime of the thread local is usually limited (e.g. via the closeable Scope).

The second use case is the one which can be problematic. A good example is the TemporaryBuffers class from the SDK. The buffer is allocated on first usage in a Thread and from then on lives until the thread dies.

With non-virtual threads this was super nice, especially because your typical web application would use a thread-pool of worker threads. After every thread in the pool has been used, TemporaryBuffers would be allocation free because the ThreadLocal has been initialized for every pooled thread.

With virtual threads the intended usage would be to not pool threads, but create a new virtual thread for each request. This means that for each request a new TemporaryBuffer is allocated and first usage. It could only be reused on further usage within the same request (=the same virtual thread). If your requests are short lived, this means that the pooling is somewhat ineffective but the memory consumption would still be okay.

Where things get really bad is if the ThreadLocal unnecessarily prolongs the lifetime of the pooled object. The best example here imo is a Java service uses as HTTP Gateway/Proxy. Previously you would typically use reactive programming here, whereas with virtual threads you can now easily go back to the thread-per-request model.

Imagine an HTTP Gateway which just forwards the HTTP requests to other services which take 10s on avergage. With 10k requests per second, this means that at any given time 100k requests are served concurrently, equating to 100k virtual threads mostly waiting on the response of the downstream HTTP service. If every virtual thread holds on to a 4kb TemporaryBuffer, this gives you about 400mb wasted heap memory.

Alternatives

Suggestion for next steps

In my opinion it would make sense to go through both the Otel Agent and the SDK and to look at all usages of ThreadLocals.

Executors

The newly added Executors are [Executors.newVirtualThreadPerTaskExecutor()](https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/concurrent/Executors.html#newVirtualThreadPerTaskExecutor()) and Executors.newThreadPerTaskExecutor(ThreadFactory). I think the former builds on the latter. Chances are high that our instrumentation already covers these, so maybe just adding tests for those should be good enough. This is at least how it was for the elastic agent.

JonasKunz commented 10 months ago

I looked for all usages of ThreadLocal in the API, SDK and Agent and tried to classify for which usages action might be required for handling virtual threads nicely.

Usages might needing work

API: TemporaryBuffers

SDK: JdkHttpSender

SDK: ProtoSerializer

For the agent I think no changes are needed, all usages seem fine to me!

Usages looking fine to me

I checked the following usages across API, SDK and Agent but classified them as non-problematic for virtual threads:

API:

SDK:

Agent:

TL;DR

Some Feedback on whether you agree / disagree would be great!