[Feature Request] poll_workflow_task_queue retried X times needs to be an ERROR log, not WARN

temporalio / sdk-core

Core Temporal SDK that can be used as a base for language specific Temporal SDKs

MIT License


2024-03-13T14:27:20.049568Z WARN temporal_client::retry: gRPC call poll_workflow_task_queue retried 16 times error=Status { code: Cancelled, message: "Timeout expired", source: Some(tonic::transport::Error(Transport, TimeoutExpired(()))) }

We recently observed a case where a client retried nearly 100,000 times and no error was ever reported:

temporal_client::retry: gRPC call poll_activity_task_queue retried 94372 times
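Until the log level changes, a service could escalate on its own by scanning the SDK's log output for the retry count. The following is a minimal sketch, assuming the message format shown in the log lines above; `classify_retry_line` and the `ERROR_AFTER_RETRIES` threshold are hypothetical names chosen for illustration and are not part of sdk-core.

```python
import re

# Matches the retry message format seen in the log lines above.
RETRY_RE = re.compile(r"gRPC call (\w+) retried (\d+) times")

# Hypothetical threshold for illustration; this is not an sdk-core setting.
ERROR_AFTER_RETRIES = 100


def classify_retry_line(line, error_after=ERROR_AFTER_RETRIES):
    """Return (call_name, retries, level), or None if the line is not a retry message."""
    match = RETRY_RE.search(line)
    if match is None:
        return None
    call = match.group(1)
    retries = int(match.group(2))
    # Escalate to ERROR once the retry count reaches the threshold.
    level = "ERROR" if retries >= error_after else "WARN"
    return call, retries, level
```

With this threshold, the 94372-retry line above would be classified as ERROR, while the 16-retry line would stay at WARN.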

Even querying the worker's state didn't indicate any problem, so the service's heartbeat (which checks the worker state) didn't fail either:

{"runState":"RUNNING","numHeartbeatingActivities":0,"workflowPollerState":"SHUTDOWN","activityPollerState":"POLLING","hasOutstandingWorkflowPoll":false,"hasOutstandingActivityPoll":true,"numCachedWorkflows":0,"numInFlightWorkflowActivations":0,"numInFlightActivities":0}

(Note: This service only handles activities, so workflowPollerState SHUTDOWN was expected.)
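To illustrate why a state-based heartbeat stays green here, the sketch below applies a simple health check to the status shown above. The exact fields our heartbeat inspects are not listed in this issue, so `looks_healthy` and its conditions are assumptions made for illustration only.

```python
import json

# Worker status exactly as reported above.
STATUS_JSON = (
    '{"runState":"RUNNING","numHeartbeatingActivities":0,'
    '"workflowPollerState":"SHUTDOWN","activityPollerState":"POLLING",'
    '"hasOutstandingWorkflowPoll":false,"hasOutstandingActivityPoll":true,'
    '"numCachedWorkflows":0,"numInFlightWorkflowActivations":0,'
    '"numInFlightActivities":0}'
)


def looks_healthy(status):
    """Hypothetical heartbeat check for an activity-only worker (an assumption)."""
    return (
        status["runState"] == "RUNNING"
        and status["activityPollerState"] == "POLLING"
    )


status = json.loads(STATUS_JSON)
healthy = looks_healthy(status)
```

Nothing in the reported state reflects the failing polls, so a check of this kind passes even though every poll attempt is timing out.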

[Feature Request] poll_workflow_task_queue retried X times needs to be an ERROR log, not WARN #704