temporalio / features

Behavior and history compatibility testing for Temporal SDKs

[Feature Request] Add temporal_worker_task_slots_total metric #392

Open ghaskins opened 8 months ago

ghaskins commented 8 months ago

Is your feature request related to a problem? Please describe.

It's harder to build meaningful visualizations on temporal_worker_task_slots_available alone when the total amount is unknown.

Most of the time, the graphs look like this:

[Image: Screenshot 2023-12-30 at 9 00 12 AM]

It's particularly hard to see nodes with smaller task slot allocations, and these are precisely the candidates most likely to run into resource exhaustion.

Describe the solution you'd like

Providing a means to retrieve the total possible slots would make it trivial to graph the utilization percentage using an 'available/total' approach. This would make the cases that need the most attention easier to spot, both visually on dashboards and as alerting targets.
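To illustrate the 'available/total' idea, here is a minimal sketch of the calculation a dashboard could do if both values were exported. The function name and the sample worker names and numbers are hypothetical, chosen only for illustration.

```python
def slot_utilization_percent(available: float, total: float) -> float:
    """Return the percentage of task slots currently in use.

    `available` would come from temporal_worker_task_slots_available;
    `total` is the requested (not yet existing) total-slots value.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= available <= total:
        raise ValueError("available must be between 0 and total")
    used = total - available
    return 100.0 * used / total


# Hypothetical samples as (available, total): a small and a large worker.
samples = {"small-worker": (1, 5), "large-worker": (180, 200)}
utilization = {
    name: slot_utilization_percent(available, total)
    for name, (available, total) in samples.items()
}
```

On a raw 'available' graph the small worker's single free slot is nearly invisible next to the large worker's 180, whereas the percentage puts the small worker clearly on top.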

cretz commented 8 months ago

> when the total amount is unknown.

But since you set this value, it is known. You can expose this fixed value as a metric if you'd like.
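As a rough sketch of what exposing the fixed value yourself could look like, the snippet below renders the configured totals as a gauge in the Prometheus text exposition format using only the standard library. The function name is hypothetical, the metric name is the one proposed in this issue, and the `worker_type` label and its values are assumptions; how the output gets served to a scraper is left out.

```python
def render_total_slots_gauge(totals: dict[str, int]) -> str:
    """Render configured slot totals as a Prometheus text-format gauge.

    `totals` maps an assumed worker_type label value to the slot count
    that was configured for that worker.
    """
    name = "temporal_worker_task_slots_total"  # name proposed in this issue
    lines = [f"# TYPE {name} gauge"]
    for worker_type, total in sorted(totals.items()):
        # Doubled braces emit the literal { } that wrap the label set.
        lines.append(f'{name}{{worker_type="{worker_type}"}} {total}')
    return "\n".join(lines) + "\n"


# Hypothetical configuration values, for illustration only.
exposition = render_total_slots_gauge({"WorkflowWorker": 100, "ActivityWorker": 5})
```

The values are static, so this only needs to be regenerated when the worker configuration changes.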

Quinn-With-Two-Ns commented 8 months ago

I think the SDK exposing this as a metric makes sense, given these feature requests: https://github.com/temporalio/features/issues/334 and https://github.com/temporalio/features/issues/388

ghaskins commented 8 months ago

> But since you set this value, it is known.

Well, it's known somewhere. But it's not conveniently accessible to metrics/dashboard consumers, which makes things like dashboards more fragile and less generic.