micrometer-metrics / micrometer

An application observability facade for the most popular observability tools. Think SLF4J, but for observability.
https://micrometer.io
Apache License 2.0
4.45k stars 978 forks source link

Performance regression in `MeterRegistry#remove` with many meters #5466

Open joshb1050 opened 1 week ago

joshb1050 commented 1 week ago

Describe the bug A clear and concise description of what the bug is.

We noticed a performance regression related to #4857. Specifically, with a lot of meters, this preFilterIdToMeterMap map could be hundreds of thousands of elements or more, and we linearly iterate through them.

Environment

Micrometer 1.13.2, able to produce on macOS Sonoma 14.2.

This was discovered after it was bumped in ActiveMQ Artemis.

openjdk 17.0.11 2024-04-16 LTS
OpenJDK Runtime Environment Zulu17.50+19-CA (build 17.0.11+9-LTS)
OpenJDK 64-Bit Server VM Zulu17.50+19-CA (build 17.0.11+9-LTS, mixed mode, sharing)

To Reproduce How to reproduce the bug:

Unfortunately don't have a MRE to share, however this can likely be seen if creating many meters and gauges and then trying to delete them.

Expected behavior A clear and concise description of what you expected to happen.

Additional context Add any other context about the problem here, e.g. related issues.

shakuzen commented 1 week ago

Thank you for the report and sorry about the performance regression. I'm not sure what will be the best solution for this. I think we were somewhat aware this would not be very performant for remove calls but took the assumption those should be far fewer than register calls, and remove calls should generally not be on the hot path for applications. I suspect that's still true, but in cases where remove is called more often, wasting a lot of CPU cycles and locking the meter map still hurts overall performance. /cc @lenin-jaganathan

jonatan-ivanov commented 1 week ago

Could you please share your use case? Especially why you have hundreds of thousands meters that you remove?

jbertram commented 1 week ago

ActiveMQ Artemis is a message broker and we register & remove meters as part of a queue's lifecycle (among other things). Currently each & every queue has 17 corresponding meters. In certain use-cases (e.g. heavy publish/subscribe) queues can come and go a lot and there can be a lot of them. In order to maintain consistency for internal data structures we have some concurrency controls for these processes and registering & removing meters is part of that. This means that slow meter removal is impacting performance (e.g. blocking other threads from removing or adding new queues and meters). We can certainly investigate moving meter removal outside of these concurrency controls, but it would be awesome if performance was improved within Micrometer so that wasn't necessary.

jbertram commented 1 week ago

I just wanted to confirm no further explanation is needed in terms of the use-case.

shakuzen commented 1 week ago

The use case makes sense. Thank you for the explanation. We have a similar thing happening with Kafka metrics as well. We'll have to see what we can do to improve this.