micrometer-metrics / micrometer

An application observability facade for the most popular observability tools. Think SLF4J, but for observability.
https://micrometer.io
Apache License 2.0
4.49k stars 993 forks source link

Virtual thread metrics #3956

Closed shakuzen closed 2 months ago

shakuzen commented 1 year ago

Follow-up from #3650. See https://openjdk.org/jeps/444#JDK-Flight-Recorder-JFR

Existing thread metrics do not cover virtual threads - ThreadMXBean only provides statistics on platform threads. New potential instrumentation of virtual threads is only available via JFR. We can count started/finished virtual threads. The other events available are:

  • jdk.VirtualThreadPinned indicates that a virtual thread was pinned (and its carrier thread wasn’t freed). This event is enabled by default with a threshold of 20 ms.
  • jdk.VirtualThreadSubmitFailed indicates that starting or unparking a virtual thread failed, probably due to a resource issue. Parking a virtual thread releases the underlying carrier thread to do other work, and unparking a virtual thread schedules it to continue. This event is enabled by default.
juliojgd commented 10 months ago

Any developments here?

jonatan-ivanov commented 9 months ago

Not much I guess. Since this is not available through JMX (like metrics for platform threads are), we need to utilize JFR and jdk.VirtualThreadStart/jdk.VirtualThreadEnd events (disabled by default) which might be a bit more effort. Are you interested in submitting a PR?

juliojgd commented 9 months ago

I am not very familiar with JFR but, in order to retrieve the events count, shouldn't we start and stop a JFR recording? Or even keeping an active recording for the whole life of the JVM?

My knowledge about JFR is not enough to contribute this PR. Anyway I will try to find if in Java 22 or beyond there is any plan or JEP to enhance ThreadMXBean (or create a specific one for VT's) that could ease the retrieval of this measures.

jonatan-ivanov commented 9 months ago

I am not very familiar with JFR but, in order to retrieve the events count, shouldn't we start and stop a JFR recording?

I need to look into this but not necessarily in the classic sense where JFR is writing the data into a file. What we need is receiving the JFR events, we need to investigate how to do it so that it does not disturb the app.

Anyway I will try to find if in Java 22 or beyond there is any plan or JEP to enhance ThreadMXBean (or create a specific one for VT's) that could ease the retrieval of this measures.

Having this info coming from ThreadMXBean would be the best since we are already interacting with it for platform threads.

ArtyomGabeev commented 6 months ago

Hello here,

Let me know if you want me try and contribute support of VT metrics based on JFR continuous event listener.

Or are we waiting this data to be exposed via jmx?

shakuzen commented 6 months ago

We would consider an implementation based on the JFR events. I am not aware of any plans for anything to be added to JMX, so I don't think just waiting is necessarily likely to yield results. We could ask on the corresponding JDK mailing list about adding support for virtual thread metrics to JMX if there's a reason that's better over the JFR event route.

It would also be good for users to share what virtual thread metrics in particular they are interested in and why so we can understand what will be most helpful and why.

ArtyomGabeev commented 6 months ago

I think we may be interested in two JFR events:

  1. VirtualThreadPinnedEvent - expose as a time, with pinning duration
  2. VirtualThreadSubmitFailedEvent - expose as a counter

For the first one, things to consider:

  1. Add a tag with stackframe where pinning happened? e.g.: com.test.ThreadPinningExample.sleep(Duration)
  2. Additionally, I would like to be able to perform log once with extended strack trace, but I'm not sure this is a responsibility of micrometer library.

Here is an example of VirtualThreadPinnedEvent:

jdk.VirtualThreadPinned {
  startTime = 16:58:30.051 (2024-04-14)
  duration = 251 ms
  eventThread = "web-vt-729a03c1-5456-4686-90b2-b2659291752f" (javaThreadId = 69, virtual)
  stackTrace = [
    java.lang.VirtualThread.parkOnCarrierThread(boolean, long) line: 677
    java.lang.VirtualThread.parkNanos(long) line: 636
    java.lang.VirtualThread.sleepNanos(long) line: 793
    java.lang.Thread.sleep(long) line: 507
    com.test.ThreadPinningExample.sleep(Duration) line: 36
    ...
  ]
}
ArtyomGabeev commented 6 months ago

I've opened draft PR to demonstrate intentions and come up with the question:

I think this instrumentation should be specific to java-21. Should I go and create a java-21 source set or separate gradle module?

ArtyomGabeev commented 6 months ago

I'm not sure we need to process jdk.VirtualThreadStart and jdk.VirtualThreadEnd events. E.g. in virtual thread per model request, amount created events equals to rps, which can be achieved via other ways. If virtual threads used as part of executor, there is already exist intsrumentation - ExecutorServiceMetrics

celtric commented 3 months ago

In a recent investigation I used this hacky approach:

final Class<?> virtualThreadClass = Class.forName("java.lang.VirtualThread");
final Field defaultSchedulerField = virtualThreadClass.getDeclaredField("DEFAULT_SCHEDULER");
defaultSchedulerField.setAccessible(true);
final ExecutorService defaultScheduler = (ExecutorService) defaultSchedulerField.get(null);
new ExecutorServiceMetrics(defaultScheduler, "virtualThreads").bindTo(metricsRegistry);

It did help me observed thread pinning build-up, and as such I'm debating whether to keep it or not even though it's "illegal", but I understand that there's still no official support for virtual thread metrics.

shakuzen commented 3 months ago

@celtric Thanks for sharing. I guess you only need to do that because virtual threads are not being created from an executor you can configure in your application. Does that mean virtual threads are only being created like Thread.ofVirtual() and not with a Executors.newVirtualThreadPerTaskExecutor() executor in your application? I'm wondering how common this will be in practice and the need to monitor this usage for pinning with metrics.

celtric commented 2 months ago

@celtric Thanks for sharing. I guess you only need to do that because virtual threads are not being created from an executor you can configure in your application. Does that mean virtual threads are only being created like Thread.ofVirtual() and not with a Executors.newVirtualThreadPerTaskExecutor() executor in your application? I'm wondering how common this will be in practice and the need to monitor this usage for pinning with metrics.

Actually I'm creating the executor myself with Executors.newVirtualThreadPerTaskExecutor(), but if I register that directly it doesn't seem to work for some reason. As a note, I'm passing it to to Jetty 12's QueuedThreadPool, which may have something to do with it, but I haven't had time to dig into it.

juliojgd commented 2 months ago

@shakuzen maybe here is a misunderstanding, what @celtric is getting via reflection and instrumenting is the underlying ForkJoinPool executor that internally the JVM is using to schedule and run the virtual threads. It's different from the "Executors.newVirtualThreadPerTaskExecutor()" that returns a user-facing executor that submits every task inside a new virtual thread.

celtric commented 2 months ago

As a note, if I try to register Executors.newVirtualThreadPerTaskExecutor() I get:

...ExecutorServiceMetrics] Failed to bind as java.util.concurrent.ThreadPerTaskExecutor is unsupported.

As @juliojgd mentions, I'm registering the ForkJoinPool used by the VTs. That registration succeeds in the second if branch in:

image

shakuzen commented 2 months ago

Thank you all for the additional details. It seems we can't rely on wrapping the Executors. newVirtualThreadPerTaskExecutor with ExecutorServiceMetrics as I hoped we could (at least not with the existing implementation). We'll have to figure out what is best to do for this. One option is timing task execution with the JFR start and stop events. Is the timing of task execution what people are most interested in or some other metrics for VirtualThreadPerTaskExecutor?

As a note, I'm passing it to to Jetty 12's QueuedThreadPool

In case you aren't aware, we have JettyServerThreadPoolMetrics for instrumenting QueuedThreadPool.

shakuzen commented 2 months ago

@celtric I've opened a separate issue - see #5488