temporalio / features

Behavior and history compatibility testing for Temporal SDKs

[Feature Request] Add temporal_worker_task_slots_total metric #392

Open ghaskins opened 11 months ago

ghaskins commented 11 months ago

Is your feature request related to a problem? Please describe.

It's harder to build meaningful visualizations on temporal_worker_task_slots_available alone when the total amount is unknown.

Most of the time, the graphs look like this:

[Image: Screenshot 2023-12-30 at 9 00 12 AM]

It's particularly hard to see nodes with smaller task slot allocations, and these are precisely the candidates likely to run into resource exhaustion.

Describe the solution you'd like

Providing a means to retrieve the total possible slots would make it trivial to graph the utilization percent using an 'available/total' approach. This would make the cases that require the most attention easier to spot, through better visual emphasis and clearer alerting targets.
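As a rough sketch of the 'available/total' idea, the utilization percent could be computed as below. The function name and sample numbers are hypothetical and only illustrate the arithmetic; they are not part of any Temporal SDK.

```python
def utilization_percent(available: float, total: float) -> float:
    """Return the percentage of task slots currently in use.

    `available` corresponds to temporal_worker_task_slots_available;
    `total` is the value the requested metric would report.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    used = total - available
    return 100.0 * used / total


# Hypothetical sample values: a small worker close to exhaustion and a
# large worker with plenty of headroom.
small_worker = utilization_percent(available=1, total=5)
large_worker = utilization_percent(available=150, total=200)
print(small_worker, large_worker)
```

On a graph of raw available slots, the small worker sits near the bottom next to any busy large worker; expressed as a percentage, its high utilization stands out.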

cretz commented 10 months ago

> when the total amount is unknown.

But since you set this value, it is known. You can expose this fixed value as a metric if you'd like.
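One way to read this suggestion, as a minimal sketch: application code that already knows its configured slot count renders it as a gauge sample in Prometheus text exposition format. The helper name, the `worker_type` label, and the sample values are assumptions for illustration; the metric name is the one proposed in this issue, and nothing here is an existing SDK API.

```python
def render_total_slots_gauge(total_slots: int, worker_type: str) -> str:
    """Render a fixed slot count as a Prometheus text-format gauge sample.

    Hypothetical helper: the metric name comes from this issue's title,
    and the `worker_type` label is assumed for illustration.
    """
    name = "temporal_worker_task_slots_total"
    return (
        f"# TYPE {name} gauge\n"
        f'{name}{{worker_type="{worker_type}"}} {total_slots}\n'
    )


# Assumed configuration value; in practice this would be whatever the
# worker was configured with.
print(render_total_slots_gauge(100, "WorkflowWorker"), end="")
```

This requires every application to repeat the same wiring, which is the inconvenience raised later in the thread.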

Quinn-With-Two-Ns commented 10 months ago

I think the SDK exposing this as a metric makes sense given these feature requests: https://github.com/temporalio/features/issues/334 and https://github.com/temporalio/features/issues/388

ghaskins commented 10 months ago

> But since you set this value, it is known.

Well, it's known somewhere. But it's not conveniently accessible to metrics/dashboard consumers, which makes things like dashboards more fragile and less generic.