micrometer-metrics / micrometer

An application observability facade for the most popular observability tools. Think SLF4J, but for observability.
https://micrometer.io
Apache License 2.0

JvmThreadMetrics has performance issues with many threads #1805

Open lpatouchas opened 4 years ago

lpatouchas commented 4 years ago

Hello all, we have an application that by design has a large number of threads. We use the Micrometer library on its own (outside any framework, i.e. without Spring). We define a Prometheus registry, and our Prometheus configuration polls every 15 seconds. When our app exceeds 10-12k threads, the application, and even the Tomcat instance that runs it, freezes after startup. If we disable the Prometheus polling or remove JvmThreadMetrics, the problem does not occur.

The issue is quite easy to reproduce. A simple WAR that generates 15k running (or even TIMED_WAITING) threads, plus a curl to /metrics every 5 seconds, will freeze the app and Tomcat immediately after startup (sometimes we get a response after a while). Neither /metrics nor /health responds, and not even http://hostname:8080, which should be served by Tomcat itself.

Is this by design? Is there a limit on the number of threads that this registry supports?

Regards, Leonidas

checketts commented 4 years ago

That is not by design. Could you provide a sample project that reproduces it?

I suspect it is the jvm.threads.states metric since it iterates through all the threads serially (six times, one for each potential state).
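
To make that suspicion concrete, here is a minimal sketch using only the JDK's `ThreadMXBean`. It is not Micrometer's actual implementation; the class and method names (`ThreadStateCounts`, `snapshotPerState`, `singleSnapshot`) are made up for illustration. It contrasts taking one full thread snapshot per state (six snapshots per scrape) with taking a single snapshot and counting all states from it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
public class ThreadStateCounts {

    // Pattern described above: one full snapshot of all threads per state,
    // so a single scrape walks every thread six times.
    public static Map<Thread.State, Long> snapshotPerState(ThreadMXBean bean) {
        Map<Thread.State, Long> counts = new EnumMap<>(Thread.State.class);
        for (Thread.State state : Thread.State.values()) {
            long n = 0;
            for (ThreadInfo info : bean.getThreadInfo(bean.getAllThreadIds())) {
                // Entries are null for threads that died after their id was listed.
                if (info != null && info.getThreadState() == state) {
                    n++;
                }
            }
            counts.put(state, n);
        }
        return counts;
    }

    // Alternative: one snapshot, all six counters filled in a single pass.
    public static Map<Thread.State, Long> singleSnapshot(ThreadMXBean bean) {
        Map<Thread.State, Long> counts = new EnumMap<>(Thread.State.class);
        for (Thread.State state : Thread.State.values()) {
            counts.put(state, 0L);
        }
        for (ThreadInfo info : bean.getThreadInfo(bean.getAllThreadIds())) {
            if (info != null) {
                counts.merge(info.getThreadState(), 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();

        long t0 = System.nanoTime();
        Map<Thread.State, Long> six = snapshotPerState(bean);
        long t1 = System.nanoTime();
        Map<Thread.State, Long> one = singleSnapshot(bean);
        long t2 = System.nanoTime();

        System.out.println("per-state snapshots: " + six + " in " + (t1 - t0) / 1_000 + " us");
        System.out.println("single snapshot:     " + one + " in " + (t2 - t1) / 1_000 + " us");
    }
}
```

Running `main` inside a process with many threads should give a rough idea of how much of the scrape time goes into the repeated snapshots. The two maps can differ slightly because threads change state between snapshots.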

If you could add a MeterFilter that disables the jvm.threads.states gauges, let us know if the hang goes away.
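
A minimal sketch of such a filter, assuming `micrometer-core` is on the classpath. It uses a `SimpleMeterRegistry` to stay self-contained (the reporter uses a Prometheus registry, but the filter is configured the same way on any `MeterRegistry`); the class name `DenyThreadStates` is made up for illustration.

```java
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.binder.jvm.JvmThreadMetrics;
import io.micrometer.core.instrument.config.MeterFilter;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

// Hypothetical wiring class, for illustration only.
public class DenyThreadStates {

    public static MeterRegistry buildRegistry() {
        MeterRegistry registry = new SimpleMeterRegistry();

        // Register the filter before binding, so it applies when the
        // jvm.threads.states gauges are about to be registered.
        registry.config().meterFilter(MeterFilter.denyNameStartsWith("jvm.threads.states"));

        // The remaining thread meters (for example jvm.threads.live) are still bound.
        new JvmThreadMetrics().bindTo(registry);
        return registry;
    }

    public static void main(String[] args) {
        MeterRegistry registry = buildRegistry();
        // List the meters that survived the filter.
        registry.getMeters().forEach(meter -> System.out.println(meter.getId().getName()));
    }
}
```

The other option mentioned in this thread is not binding `JvmThreadMetrics` at all, which also drops the remaining thread gauges.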

dedousis commented 4 years ago

Hello, in our test case Tomcat just freezes for a few seconds when we curl it every few seconds. The sample project we used for the tests is here: https://github.com/dedousis/JvmThreadMetrics

Regards Andreas

mauhiz commented 4 years ago

+1. We observed heavy impact at 4K threads, with the metrics pull thread being busy at:

[Screenshot: Screen Shot 2020-02-27 at 18 19 18]

mauhiz commented 4 years ago

This JDK issue sounds related: https://bugs.openjdk.java.net/browse/JDK-8185005

shakuzen commented 4 years ago

Looks like the fix for that JDK issue has been backported to Java 11.0.7. Once that is released, could you see if it makes any difference? With so many threads, it may be best not to use JvmThreadMetrics. Do you need the thread metrics? If not, you can either not bind them or filter them out with a MeterFilter.

shakuzen commented 2 years ago

Has anyone checked with a recent version of the JDK that includes the bug fix mentioned previously? Is the performance okay with that, or is it still an issue with enough threads?

SoMuchForSubtlety commented 2 years ago

> Has anyone checked with a recent version of the JDK that includes the bug fix mentioned previously? Is the performance okay with that, or is it still an issue with enough threads?

I'm still seeing this issue with OpenJDK 11.0.15. jvm.threads is the worst offender, but process.files.open, process.cpu.usage and system.cpu.usage also have a fairly large impact.

Profiling results with ~150 active threads: [two profiler screenshots, not preserved]