Closed by robholland 2 years ago
Upon further investigation it seems that this is a timeout being set on the requests by the SDK rather than a metric issue. I've not yet been able to find where this is coming from.
Specifically, I see that GetWorkflowExecutionHistory requests never last longer than 20 seconds.
The 20 second timeout is expected behaviour for get history requests. The stalls in performance I saw were due to low task poller counts, which meant the workers would often time out waiting on empty task queue partitions.
Long poll requests definitely last longer than this under load, so something is capping the metric. My guess is that it is related to Trend/Time metrics in k6 -> Prometheus.
Seems that the SDK is timing out GetWorkflowExecutionHistory requests after 20 seconds. I think it should be 65, so not sure what's going on here.