micrometer-metrics / micrometer

An application observability facade for the most popular observability tools. Think SLF4J, but for observability.
https://micrometer.io
Apache License 2.0

Suspected memory leak #3191

Closed · shiyanhui closed this issue 2 years ago

shiyanhui commented 2 years ago

Describe the bug

Hi, recently we observed that our service performs major GC collections frequently after running for days, and we found that it may be related to a memory leak in Micrometer.

Here are three screenshots: the first shows the Datadog JVM heap runtime metrics, the second the Datadog thrown-exceptions profile, and the third the heap histogram from jmap -histo:live.

[Screenshots: Datadog JVM heap runtime metrics; Datadog thrown-exceptions profile; jmap -histo:live heap histogram]

All the exceptions shown in the second screenshot point to ResourceLeakDetector. We traced the code and suspect that some buffer objects requested in NettyOutbound.java may not be released.

We are using an old version of Micrometer, 1.5.14. Is this a known issue that has been fixed in later versions? If not, could you please give us some advice? Thanks.

Environment

To Reproduce

The Old Gen size grows slowly, so it takes days to reproduce this on our k8s cluster. We haven't built a minimal app to reproduce it locally.

Expected behavior

Micrometer should not leak memory.

checketts commented 2 years ago

This sort of behavior is typically due to a 'tag cardinality explosion'. See #3038.

Can you verify that you don't have metrics with too many tags?
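
For illustration, here is what a cardinality explosion looks like (the meter name and tags below are made up, not taken from your code): a tag whose value is unbounded creates a distinct Meter.Id for every value, and the registry keeps all of them forever.

```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class CardinalityExplosion {

    public static void main(String[] args) {
        MeterRegistry registry = new SimpleMeterRegistry();

        // Problem: the tag value is unbounded, so every distinct value creates a new
        // Meter.Id (and its Tags), which the registry retains indefinitely.
        for (int userId = 0; userId < 1_000_000; userId++) {
            Counter.builder("http.client.requests")
                    .tag("uri", "/api/users/" + userId) // unbounded cardinality
                    .register(registry)
                    .increment();
        }

        // Better: keep the set of tag values small and bounded, e.g. the URI template.
        Counter.builder("http.client.requests")
                .tag("uri", "/api/users/{id}")
                .register(registry)
                .increment();

        // ~1,000,001 meters vs. 1 -- this is what shows up as millions of Tag/Meter.Id
        // instances in a heap histogram.
        System.out.println("Distinct meters: " + registry.getMeters().size());
    }
}
```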

shiyanhui commented 2 years ago

Yeah, we do have metrics with many tags. It seems we have likely found the root cause. We will fix it and see whether things go back to normal. Thanks!

shiyanhui commented 2 years ago

Another question: it seems this problem only appears when we use Spring WebFlux, not when we use Spring MVC. Is there any way to make Micrometer not use reactor or reactor-netty?

jonatan-ivanov commented 2 years ago

As @checketts mentioned above, this is probably caused by using high-cardinality tags, and the jmap output seems to agree with this: see the Tag and Meter.Id entries, you have almost 6 million tags (Tag is part of the Id). Using multiple tags is not an issue; using a tag that has high cardinality is.

Seems we likely find the root cause.

Can I ask what it was?

it seems that only when we use Spring WebFlux this problem will appear. Not if we use Spring MVC.

This is pretty weird; are you sure you don't attach high-cardinality tags in one case but do in the other?

So is there any way to make micrometer not use reactor nor reactor-netty ?

I'm not sure I understand how this would help. Micrometer only uses them for its StatsD registry, and they are shadowed, so theoretically you should not notice them. Can you try using Datadog through the DatadogMeterRegistry? That one does not use reactor or reactor-netty.
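
A minimal sketch of wiring it up manually (the API key is a placeholder; with Spring Boot, adding the micrometer-registry-datadog dependency and setting the Datadog api-key property would give you the same registry via auto-configuration):

```java
import io.micrometer.core.instrument.Clock;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.datadog.DatadogConfig;
import io.micrometer.datadog.DatadogMeterRegistry;

public class DatadogApiRegistry {

    public static void main(String[] args) {
        DatadogConfig config = new DatadogConfig() {
            @Override
            public String apiKey() {
                return "YOUR_DATADOG_API_KEY"; // placeholder
            }

            @Override
            public String get(String key) {
                return null; // accept the defaults for everything else
            }
        };

        // Publishes directly to the Datadog HTTP API from a scheduled publishing thread;
        // no reactor or reactor-netty involved.
        MeterRegistry registry = new DatadogMeterRegistry(config, Clock.SYSTEM);
        registry.counter("example.counter").increment();
    }
}
```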

shakuzen commented 2 years ago

We are using an old version micrometer 1.5.14, so is it known issue and fixed in later versions?

I can't think of a specific issue, but there have been various changes in the statsd module since that version, so trying with the latest version available to see if the issue remains is a good troubleshooting step.

shiyanhui commented 2 years ago

Problem solved; I just want to share the result here. It was caused by high-cardinality tags in reactor-netty. This issue can be closed now, thanks!

jonatan-ivanov commented 2 years ago

@shiyanhui Were those tags attached by you or by reactor/reactor-netty/netty?

afosorio commented 3 months ago

@jonatan-ivanov Apparently it is reactor-netty itself that adds them, but I don't know how he solved it.

jonatan-ivanov commented 3 months ago

Me neither, but I would report that issue to reactor-netty. Also, you can always remove, modify, or add tags using a MeterFilter/ObservationFilter.
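
For example, something like this (the tag names below are only guesses for illustration; replace them with whatever tag your heap dump shows exploding):

```java
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.config.MeterFilter;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class TagFilters {

    public static void main(String[] args) {
        MeterRegistry registry = new SimpleMeterRegistry();

        registry.config()
                // Drop a tag entirely ("remote.address" is just an example key).
                .meterFilter(MeterFilter.ignoreTags("remote.address"))
                // Collapse unbounded values into a bounded set (here: a URI template).
                .meterFilter(MeterFilter.replaceTagValues("uri",
                        uri -> uri.startsWith("/api/users/") ? "/api/users/{id}" : uri))
                // Safety valve: once "uri" on reactor.netty.* meters exceeds 100 distinct
                // values, deny any further meters.
                .meterFilter(MeterFilter.maximumAllowableTags(
                        "reactor.netty", "uri", 100, MeterFilter.deny()));
    }
}
```

In a Spring Boot app you would register these as MeterFilter beans (or via a MeterRegistryCustomizer) so they are applied before any meters are created.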

afosorio commented 3 months ago

@jonatan-ivanov Is it Netty that generates the high cardinality? Would removing the library from the pom be enough?

afosorio commented 3 months ago

[Screenshot: configuration changes]

@jonatan-ivanov I removed these tags, but the memory consumption is still very high.

jonatan-ivanov commented 3 months ago

I think Netty is not instrumented (so it can't produce any high-cardinality data), and you also said it's reactor-netty that does. :) Did you check it, or did you just read this comment? If it produces data, it means it is used, so if you remove it from your dependencies your app might break, but I don't know how reactor-netty is used in your app.

I'm not sure what you expect from the config changes you made above; they are not tags and have nothing to do with Netty or reactor-netty. I would report the issue to reactor-netty and, in the meantime, use a MeterFilter/ObservationFilter as I suggested above.
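
If you don't need the reactor-netty meters at all, a blunt workaround (a sketch, assuming the meters use reactor-netty's usual reactor.netty name prefix) is to deny them by name:

```java
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.config.MeterFilter;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class DenyReactorNettyMeters {

    public static void main(String[] args) {
        MeterRegistry registry = new SimpleMeterRegistry();

        // Reject every meter whose name starts with the reactor-netty prefix,
        // so none of its tags are ever retained by the registry.
        registry.config().meterFilter(MeterFilter.denyNameStartsWith("reactor.netty"));
    }
}
```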