micrometer-metrics / micrometer

An application observability facade for the most popular observability tools. Think SLF4J, but for observability.
https://micrometer.io
Apache License 2.0

JVM appears dead (hangs) when Micrometer collects thread info in native code #2832

Closed fly7632785 closed 2 years ago

fly7632785 commented 3 years ago

Describe the bug

Hello, we have a problem: our service stops responding. We need help, thank you.

After running with Micrometer for a period of time, the service stops responding. http://xxxxx/actuator/prometheus stops responding as well.

image

As a result, metrics can no longer be collected.

None of our requests get a response either, so the whole system is unusable.

We dumped javacore.txt for analysis but could not find anything useful. We only found many threads that are runnable and related to Micrometer and Prometheus.

image

image

We tried to analyze the process with jstat and jstack, but jstack -l xx does not respond either, so we had to run kill -3 xx and analyze the resulting javacore.txt.

This is the javacore.txt. We hope someone can help, thank you in advance! We are not experienced enough to find the cause ourselves.

javacore.txt

Environment

<!--        <dependency>-->
<!--            <groupId>org.springframework.boot</groupId>-->
<!--            <artifactId>spring-boot-starter-actuator</artifactId>-->
<!--            <version>2.5.5</version>-->
<!--        </dependency>-->

<!--        <dependency>-->
<!--            <groupId>io.micrometer</groupId>-->
<!--            <artifactId>micrometer-core</artifactId>-->
<!--            <version>1.6.9</version>-->
<!--        </dependency>-->

<!--        <dependency>-->
<!--            <groupId>io.micrometer</groupId>-->
<!--            <artifactId>micrometer-registry-prometheus</artifactId>-->
<!--            <version>1.6.9</version>-->
<!--        </dependency>-->
#management:
#  metrics:
#    export:
#      prometheus:
#        enabled: true
#    tags:
#      application: ${spring.application.name}
#  endpoints:
#    web:
#      exposure:
#        include: "*"
#  endpoint:
#    prometheus:
#      enabled: true
#    metrics:
#      enabled: true

PS: Our project may use too many thread pools and threads; we don't know whether that is related.
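One way to check whether the thread count really is unusually high is to group live threads by pool name. The following is a minimal sketch using only the JDK's ThreadMXBean; the class name ThreadPoolCensus and the helper poolPrefix are hypothetical names introduced for illustration, and the prefix heuristic (dropping a trailing number) is an assumption about how the pools name their threads.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

public class ThreadPoolCensus {

    /** Drops a trailing numeric suffix such as "-12" so that pool members group together. */
    public static String poolPrefix(String threadName) {
        return threadName.replaceAll("[-#]?\\d+$", "");
    }

    /** Counts live threads per name prefix, without capturing stack traces. */
    public static Map<String, Integer> countByPrefix() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        Map<String, Integer> counts = new TreeMap<>();
        // Entries can be null for threads that terminated after the IDs were read.
        for (ThreadInfo info : bean.getThreadInfo(bean.getAllThreadIds())) {
            if (info != null) {
                counts.merge(poolPrefix(info.getThreadName()), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Print one line per prefix so that oversized pools stand out.
        countByPrefix().forEach((prefix, n) -> System.out.println(n + "\t" + prefix));
    }
}
```

Run inside the affected service (for example from a scheduled task or a debug endpoint), this would show which pools account for most of the threads.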

checketts commented 3 years ago

To ensure that Micrometer is the culprit, does the JVM stay responsive when disabling the actuator?

I noticed com.ibm in the stack trace. Do you get the same hangs when running on OpenJDK?

I seem to recall there being a performance impact when listing the thread states. You might consider disabling JvmThreadMetrics (by disabling the JvmMetricsAutoConfiguration).

Let me know how that behaves.
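To get a feel for the cost of listing thread states outside of Micrometer, the following is a minimal sketch that takes one per-state snapshot through the JDK's ThreadMXBean and times it. It assumes that thread state metrics are derived from a getThreadInfo call over all live thread IDs; that is an assumption about the implementation, not something confirmed in this thread. The class name ThreadStateProbe is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

public class ThreadStateProbe {

    /** Counts live threads per Thread.State from a single ThreadMXBean snapshot. */
    public static Map<Thread.State, Integer> countByState(ThreadMXBean bean) {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (Thread.State state : Thread.State.values()) {
            counts.put(state, 0);
        }
        // Entries can be null for threads that terminated after the IDs were read.
        for (ThreadInfo info : bean.getThreadInfo(bean.getAllThreadIds())) {
            if (info != null) {
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long start = System.nanoTime();
        Map<Thread.State, Integer> counts = countByState(bean);
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.println("live threads: " + bean.getThreadCount());
        System.out.println("by state: " + counts);
        System.out.println("snapshot took " + micros + " us");
    }
}
```

If the snapshot time grows sharply with the number of threads on the affected JVM, that would be consistent with the thread state metrics being the bottleneck.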

fly7632785 commented 3 years ago

@checketts

1. Everything is normal when we remove Micrometer and let the service run for a period of time, so I think the problem may be related to Micrometer.
2. Yes, we use the OpenJ9 JVM, but we ruled out this factor: with a regular Java 8 HotSpot JVM it still happens.
3. We will try disabling the thread metrics and see how it behaves.

Thank you for your reply.

shakuzen commented 3 years ago

> PS: Our project may use too many thread pools and threads, don't know if it's related.

Yes, I suspect the issue is the thread state metrics. As @checketts suggested, running without JvmThreadMetrics bound should help us narrow down the issue.

fly7632785 commented 2 years ago

Hi, I tried the solution you provided and disabled JvmThreadMetrics. The service has been running since then and works well; the unresponsiveness has not appeared again, so it may be related to JvmThreadMetrics. I will continue to observe. @shakuzen @checketts

shakuzen commented 2 years ago

Duplicate of #1805