trask opened this issue 10 months ago
Here's my understanding on this topic; I've implemented the virtual thread support in the elastic APM agent.
The usage of thread locals prior to virtual threads can be roughly divided into two categories:

1. `ThreadLocal` is used as a way of implicitly passing values up or down the callstack. The OpenTelemetry `Context` is a prime example for this use case.
2. `ThreadLocal` is used to pool an expensive, non-thread-safe resource.

The first use case is still fine with virtual threads: it is just a necessary tool, and the lifetime of the thread local is usually limited (e.g. via the closeable `Scope`).
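The first pattern can be sketched roughly like this (a simplified illustration only; the real OpenTelemetry `Context`/`Scope` classes are more involved, and the names below are made up for the sketch):

```java
// Minimal sketch of use case 1: a value is passed implicitly down the
// callstack via a ThreadLocal, and its lifetime is bounded by a closeable
// scope, so nothing outlives the request even on a virtual thread.
public class ContextDemo {

    static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    // The returned AutoCloseable restores the previous value on close().
    static AutoCloseable makeCurrent(String value) {
        String previous = CURRENT.get();
        CURRENT.set(value);
        return () -> {
            if (previous == null) {
                CURRENT.remove(); // fully detach so the thread holds no reference
            } else {
                CURRENT.set(previous);
            }
        };
    }

    static void deepInTheCallstack() {
        // No parameter needed: the context is picked up from the ThreadLocal.
        System.out.println("current context: " + CURRENT.get());
    }

    public static void main(String[] args) throws Exception {
        try (AutoCloseable scope = makeCurrent("request-42")) {
            deepInTheCallstack();
        }
        System.out.println("after scope: " + CURRENT.get());
    }
}
```

Because the scope removes the value when it closes, the thread local never outlives the request regardless of whether the carrier is a pooled platform thread or a short-lived virtual thread.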
The second use case is the one which can be problematic. A good example is the `TemporaryBuffers` class from the API. The buffer is allocated on first usage in a thread and from then on lives until the thread dies.
With non-virtual threads this was super nice, especially because your typical web application would use a thread pool of worker threads. After every thread in the pool has been used once, `TemporaryBuffers` would be allocation-free, because the `ThreadLocal` has been initialized for every pooled thread.
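The pooling pattern looks roughly like this (a simplified sketch; the real `TemporaryBuffers` implementation may differ in detail):

```java
// Simplified sketch of ThreadLocal-based char[] pooling: each thread lazily
// allocates one buffer and then reuses it for the rest of the thread's life.
public class TemporaryBuffersSketch {

    private static final ThreadLocal<char[]> BUFFER = new ThreadLocal<>();

    static char[] chars(int minLength) {
        char[] buffer = BUFFER.get();
        if (buffer == null || buffer.length < minLength) {
            buffer = new char[minLength];
            BUFFER.set(buffer); // lives until the owning thread dies
        }
        return buffer;
    }

    public static void main(String[] args) {
        char[] first = chars(32);
        char[] second = chars(16); // a smaller request reuses the same array
        System.out.println("reused: " + (first == second));
    }
}
```

With a fixed worker pool this amortizes to zero allocations; with one virtual thread per request, the buffer is allocated anew for every request.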
With virtual threads the intended usage is to not pool threads, but to create a new virtual thread for each request.
This means that for each request a new `TemporaryBuffers` buffer is allocated on first usage. It can only be reused on further usage within the same request (= the same virtual thread). If your requests are short-lived, this means that the pooling is somewhat ineffective, but the memory consumption would still be okay.
Where things get really bad is if the `ThreadLocal` unnecessarily prolongs the lifetime of the pooled object.
The best example here imo is a Java service used as an HTTP gateway/proxy. Previously you would typically use reactive programming here, whereas with virtual threads you can now easily go back to the thread-per-request model.
Imagine an HTTP gateway which just forwards HTTP requests to other services which take 10 s on average to respond.
With 10k requests per second, this means that at any given time 100k requests are served concurrently, equating to 100k virtual threads mostly waiting on the response of the downstream HTTP service.
If every virtual thread holds on to a 4 KB `TemporaryBuffers` buffer, this gives you about 400 MB of wasted heap memory.
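The back-of-the-envelope math (with the hypothetical request rate and latency from the example above):

```java
// Back-of-the-envelope math for the gateway example: concurrency follows from
// Little's law (arrival rate x average residence time).
public class WasteEstimate {
    public static void main(String[] args) {
        int requestsPerSecond = 10_000;
        int avgLatencySeconds = 10;
        long concurrent = (long) requestsPerSecond * avgLatencySeconds; // 100k in flight
        long wastedKb = concurrent * 4; // one 4 KB buffer per virtual thread
        System.out.println(concurrent + " threads x 4 KB = " + (wastedKb / 1000) + " MB wasted");
    }
}
```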
In the elastic agent we had `char[]` pools which were only used once per handled HTTP request; IIRC we used `ThreadLocal`s to do this. The alternative is a proper thread-safe object pool, e.g. one backed by an `MpmcAtomicArrayQueue` from JCTools. This is not free though, as the thread-safe management usually requires atomic operations when acquiring / releasing objects.

It can still make sense to keep `ThreadLocal` pooling if the pooled resource is very lightweight AND used very frequently, even during a single request.
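A minimal sketch of such a shared pool (using the JDK's `ArrayBlockingQueue` to keep the example self-contained; a lock-free queue such as JCTools' `MpmcAtomicArrayQueue` would be a drop-in replacement with lower overhead, and this is not the actual elastic implementation):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.function.Supplier;

// Sketch of a bounded, thread-safe object pool shared by all threads: the
// pool size is capped by the queue capacity, not by the number of threads.
public class ObjectPool<T> {

    private final ArrayBlockingQueue<T> freeObjects;
    private final Supplier<T> factory;

    ObjectPool(int capacity, Supplier<T> factory) {
        this.freeObjects = new ArrayBlockingQueue<>(capacity);
        this.factory = factory;
    }

    // One thread-safe dequeue on acquire; allocate only when the pool is empty.
    T acquire() {
        T pooled = freeObjects.poll();
        return pooled != null ? pooled : factory.get();
    }

    // One thread-safe enqueue on release; excess objects are simply dropped,
    // so the pool stays bounded even with 100k virtual threads.
    void release(T object) {
        freeObjects.offer(object);
    }

    public static void main(String[] args) {
        ObjectPool<char[]> pool = new ObjectPool<>(64, () -> new char[4096]);
        char[] buffer = pool.acquire();
        pool.release(buffer);
        System.out.println("reused: " + (pool.acquire() == buffer));
    }
}
```

Unlike the `ThreadLocal` variant, a buffer released here is returned to a global pool, so idle virtual threads do not pin any buffers.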
A good example of such a lightweight, frequently used resource is `WeakConcurrentMap` (GH). That map requires a special key object with overridden `hashCode()` and `equals()` for lookups. In order to not always have to allocate those, this special key object is stored in a `ThreadLocal` and reused for every `get()` on the map. Because that object is both very small and used very heavily, we stayed with a `ThreadLocal` in this case.

In my opinion it would make sense to go through both the Otel agent and the SDK and look at all usages of `ThreadLocal`s. If they are used for pooling, decide which of the proposed alternatives would be the best fit.
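The lookup-key trick can be sketched like this (illustrative only; the real `WeakConcurrentMap` additionally holds its stored keys weakly, which is omitted here, and all class names below are made up):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a reusable ThreadLocal lookup key: get() never allocates because
// each thread reuses one mutable LookupKey for all of its map lookups.
public class LookupKeyDemo {

    // Stored keys and lookup keys compare by the identity of the wrapped target.
    static final class StoredKey {
        final Object target;
        StoredKey(Object target) { this.target = target; }
        @Override public int hashCode() { return System.identityHashCode(target); }
        @Override public boolean equals(Object other) {
            if (other instanceof LookupKey lookup) return lookup.target == target;
            return other instanceof StoredKey stored && stored.target == target;
        }
    }

    static final class LookupKey {
        Object target; // mutable: set before and cleared after every get()
        @Override public int hashCode() { return System.identityHashCode(target); }
        @Override public boolean equals(Object other) {
            return other instanceof StoredKey stored && stored.target == target;
        }
    }

    static final ThreadLocal<LookupKey> LOOKUP_KEY = ThreadLocal.withInitial(LookupKey::new);
    static final Map<Object, String> MAP = new ConcurrentHashMap<>();

    static String get(Object target) {
        LookupKey key = LOOKUP_KEY.get();
        key.target = target;  // reuse the per-thread key, no allocation
        String value = MAP.get(key);
        key.target = null;    // don't prolong the target's lifetime
        return value;
    }

    public static void main(String[] args) {
        Object target = new Object();
        MAP.put(new StoredKey(target), "value");
        System.out.println(get(target));
    }
}
```

Here the pooled object is a single tiny key instance per thread, so even with many virtual threads the retained memory is negligible, which is why keeping the `ThreadLocal` is reasonable in this case.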
We (elastic) can definitely contribute the object pooling implementation and its JMH benchmark suite to Otel if desired. We've been using this pooling implementation for a few years now to minimize allocation rates prior to virtual threads, so it is battle-tested.
The newly added Executors are [`Executors.newVirtualThreadPerTaskExecutor()`](https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/concurrent/Executors.html#newVirtualThreadPerTaskExecutor()) and `Executors.newThreadPerTaskExecutor(ThreadFactory)`. I think the former builds on the latter. Chances are high that our instrumentation already covers these, so maybe just adding tests for those should be good enough. This is at least how it was for the elastic agent.
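For reference, a minimal usage of the new executor (requires Java 21; this only demonstrates the thread-per-task behavior, not any instrumentation):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class VirtualExecutorDemo {
    public static void main(String[] args) throws Exception {
        // Creates a brand-new virtual thread for every submitted task -- no
        // thread pooling, so per-thread ThreadLocal caches are never reused
        // across tasks.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            Future<Boolean> isVirtual =
                executor.submit(() -> Thread.currentThread().isVirtual());
            System.out.println("task ran on virtual thread: " + isVirtual.get());
        }
        // Roughly equivalent:
        // Executors.newThreadPerTaskExecutor(Thread.ofVirtual().factory())
    }
}
```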
I looked for all usages of `ThreadLocal` in the API, SDK and Agent and tried to classify for which usages action might be required for handling virtual threads nicely.

Usages where action might be required:

- API: `TemporaryBuffers`, used by `PercentEscaper`. The two atomic operations for pool management shouldn't have too much overhead in relation to the percent-escaping logic; it is only on a slow path anyway.
- SDK: `JdkHttpSender`, whose `ThreadLocal<ByteBufferPool>` is very heavy-weight.
- SDK: `ProtoSerializer`, whose `ThreadLocal<OutputStreamEncoder>` comes with a 50 KB buffer. Its `ThreadLocal<Map<String, byte[]>> THREAD_LOCAL_ID_CACHE` is fine because it is cleared on `ProtoSerializer.close()`, and the remaining `ThreadLocal`s are fine.

For the agent I think no changes are needed, all usages seem fine to me!
I checked the following usages across API, SDK and Agent but classified them as non-problematic for virtual threads:

- API: `TemporaryBuffers` usage by `PercentEscaper`.
- SDK: `ThreadLocal<LookupKey<?>>`
- Agent:
  - `ThreadLocal`s with a short lifespan (e.g. the `ThreadLocal` is cleared after an instrumented method ends): `ThreadLocal<Context>`
  - `ThreadLocal`s which are not cleared before thread death, but where the objects are very small: `ThreadLocal<Map<MessageChannel, ContextAndScope>>`, `ThreadLocal<SqlConnectOptions>`, `ThreadLocal<boolean[]>`, `ThreadLocal<DefineClassContextImpl>`

All `ThreadLocal` usages in the agent are fine.
Some feedback on whether you agree / disagree would be great!
Virtual threads are out of preview and included in the Java 21 (LTS) release (https://openjdk.org/jeps/444).
Notes/references from weekly Java meeting: