micrometer-metrics / micrometer

An application observability facade for the most popular observability tools. Think SLF4J, but for observability.
https://micrometer.io
Apache License 2.0
4.5k stars 994 forks source link

`executor.queued` metrics of ForkJoinPool does not include queued submissions #5650

Open tommyk-gears opened 3 weeks ago

tommyk-gears commented 3 weeks ago

Please describe the feature request. A clear and concise description of what you would like to be able to do with Micrometer and cannot currently. In io.micrometer.core.instrument.binder.jvm.ExecutorServiceMetrics#monitor(io.micrometer.core.instrument.MeterRegistry, java.util.concurrent.ForkJoinPool) the executor.queued metrics is bound to ForkJoinPool::getQueuedTaskCount.

The javadoc for ForkJoinPool::getQueuedTaskCount states:

Returns an estimate of the total number of tasks currently held in queues by worker threads (but not including tasks submitted to the pool that have not begun executing). [...]

Now, in ForkJoinPool there is aslo getQueuedSubmissionCount() with this javadoc;

Returns an estimate of the number of tasks submitted to this pool that have not yet begun executing. [...]

The current bindings completely ignores the getQueuedSubmissionCount(). One solution might be to just bind executor.queued to the sum of getQueuedTaskCount() and getQueuedSubmissionCount(). Another solution might be to have them both bound to executor.queued but with different additional tags (but that might be considered a breaking change, and align badly with metrics on other types of ExecutorServices?).

Rationale In my opinion it is more interesting to monitor the queued submissions than the queued tasks (in ForkJoinPool terminology) since when the pool threads are not able to keep up with the submission, they will pile up in queued submissions (but the number of queued tasks remain at 0 in my experience).

shakuzen commented 2 days ago

Thank you for the report. I suspect it was an oversight that getQueuedSubmissionCount was not considered. I think the way to do this in a non-breaking way would be to leave the current executor.queued metric as is and introduce two new ones (say executor.queued.submissions and executor.queued.tasks) that correspond to the two different methods. In a future major version, we could remove the executor.queued metric or perhaps go the route of differentiating by a tag. Would these two values be summed typically when monitoring a FJP?

tommyk-gears commented 2 days ago

Would these two values be summed typically when monitoring a FJP?

I think so, yes. At least from my point of view it doesn't make much difference if an item is in a worker thread queue or if it is still just a "submission".

I am not at all sure that there is a need to have separate metrics for tasks vs submissions. I think I'd opt to just summarise them and keep the executor.queued name (I like the fact that the metrics naming is consistent regardless of executor type).

shakuzen commented 1 day ago

If we could go back in time and do it properly, I think it would be best to use executor.queued with a tag to distinguish queued submissions from in-progress tasks that are queued - that way users could see each value and sum them in their metrics backend. But we have to consider the current state of things and what potentially breaking effect any changes we make would have on users. We should consider what queries and dashboards and alerts users may have based on this metric currently, and how those would be affected by any change we consider.

Currently

executor.queued = ForkJoinPool::getQueuedTaskCount

Proposals

Putting some proposals down for discussion.

1

executor.queued = ForkJoinPool::getQueuedTaskCount + ForkJoinPool::getQueuedSubmissionCount

This won't break any queries/dashboards per se, but it does change the meaning of the metric, and previously set alert thresholds may not be appropriate after this change. It can be hard for users to be aware of such a change in meaning of the metric, so it may surprise some users, even though many/most may be unaffected.

2

(We can further discuss the tag name/value if we go with this proposal; I'm just putting something down for now to show the idea) executor.queued[type=task] = ForkJoinPool::getQueuedTaskCount executor.queued[type=submission] = ForkJoinPool::getQueuedSubmissionCount

Depending on the metrics backend and how the query is written, this could have a similar effect to proposal 1, while making it possible to query the same info as before. It would help to get some feedback on what kind of queries on this metric users are currently using to know the impact.

As @tommyk-gears pointed out, this would be a departure from how the metric executor.queued is implemented for ThreadPoolExecutor. This has implications for the Prometheus limitation on metrics with the same name but a different set of tags.

3

executor.queued = ForkJoinPool::getQueuedTaskCount (unchanged; duplicated with executor.queued.tasks for backward compatibility) executor.queued.tasks = ForkJoinPool::getQueuedTaskCount executor.queued.submissions = ForkJoinPool::getQueuedSubmissionCount

This leaves the existing metric unchanged, which will not break any users while introducing two new metrics that can be used separately or summed. The duplication of two metrics with the same meaning is wasteful and unfortunate, but that's the cost of making the change completely backward compatible. An alternative to this proposal may be to make a new single metric name that differentiates the two values by tag.