temporalio / sdk-python

Temporal Python SDK
MIT License
474 stars 77 forks source link

[Bug] Worker hangs after polling error #459

Closed ndtretyak closed 4 months ago

ndtretyak commented 10 months ago

Describe the bug

Worker hangs after an error during polling.

Minimal Reproduction

I just start a worker with an incorrect token, so that my server responds with Request unauthorized..

import asyncio

from temporalio import activity, client, workflow, worker

@activity.defn
async def a() -> None:
    pass

@workflow.defn
class Workflow:
    @workflow.run
    async def run(self) -> None:
        pass

async def main():
    c = await client.Client.connect(
        "my_temporal_host:7233",
        rpc_metadata={"authorization": "wrong_token"},
        tls=True,
    )
    w = worker.Worker(
        c,
        task_queue="default",
        activities=[a],
        workflows=[Workflow],
    )
    await w.run()

if __name__ == "__main__":
    asyncio.run(main())

Output:

2024-01-10T14:27:15.726021Z  WARN temporal_sdk_core::worker::activities::activity_task_poller_stream: Error while polling for activity tasks error=Status { code: PermissionDenied, message: "Request unauthorized.", details: b"\x08\x07\x12\x15Request unauthorized.\x1aJ\nHtype.googleapis.com/temporal.api.errordetails.v1.PermissionDeniedFailure", metadata: MetadataMap { headers: {"content-type": "application/grpc"} }, source: None }
2024-01-10T14:27:15.726020Z  WARN temporal_sdk_core::worker::workflow::wft_poller: Error while polling for workflow tasks error=Status { code: PermissionDenied, message: "Request unauthorized.", details: b"\x08\x07\x12\x15Request unauthorized.\x1aJ\nHtype.googleapis.com/temporal.api.errordetails.v1.PermissionDeniedFailure", metadata: MetadataMap { headers: {"content-type": "application/grpc"} }, source: None }
2024-01-10T14:27:15.726316Z  WARN temporal_sdk_core::worker::workflow::wft_poller: Error while polling for workflow tasks error=Status { code: PermissionDenied, message: "Request unauthorized.", details: b"\x08\x07\x12\x15Request unauthorized.\x1aJ\nHtype.googleapis.com/temporal.api.errordetails.v1.PermissionDeniedFailure", metadata: MetadataMap { headers: {"content-type": "application/grpc"} }, source: None }
2024-01-10T14:27:15.726543Z  WARN temporal_sdk_core::worker::workflow::wft_poller: Error while polling for workflow tasks error=Status { code: PermissionDenied, message: "Request unauthorized.", details: b"\x08\x07\x12\x15Request unauthorized.\x1aJ\nHtype.googleapis.com/temporal.api.errordetails.v1.PermissionDeniedFailure", metadata: MetadataMap { headers: {"content-type": "application/grpc"} }, source: None }
2024-01-10T14:27:15.726584Z  WARN temporal_sdk_core::worker::workflow::wft_poller: Error while polling for workflow tasks error=Status { code: PermissionDenied, message: "Request unauthorized.", details: b"\x08\x07\x12\x15Request unauthorized.\x1aJ\nHtype.googleapis.com/temporal.api.errordetails.v1.PermissionDeniedFailure", metadata: MetadataMap { headers: {"content-type": "application/grpc"} }, source: None }
2024-01-10T14:27:15.727412Z  WARN temporal_sdk_core::worker::workflow::wft_poller: Error while polling for workflow tasks error=Status { code: PermissionDenied, message: "Request unauthorized.", details: b"\x08\x07\x12\x15Request unauthorized.\x1aJ\nHtype.googleapis.com/temporal.api.errordetails.v1.PermissionDeniedFailure", metadata: MetadataMap { headers: {"content-type": "application/grpc"} }, source: None }
2024-01-10T14:27:15.727472Z  WARN temporal_sdk_core::worker::workflow::workflow_stream: WFT poller errored, shutting down
2024-01-10T14:27:15.727490Z ERROR temporal_sdk_core::worker::workflow::workflow_stream: Workflow processing encountered fatal error and must shut down TonicError(Status { code: PermissionDenied, message: "Request unauthorized.", details: b"\x08\x07\x12\x15Request unauthorized.\x1aJ\nHtype.googleapis.com/temporal.api.errordetails.v1.PermissionDeniedFailure", metadata: MetadataMap { headers: {"content-type": "application/grpc"} }, source: None })
2024-01-10T14:27:15.728052Z  WARN temporal_sdk_core::worker::workflow::workflow_stream: WFT poller errored, shutting down
2024-01-10T14:27:15.728070Z ERROR temporal_sdk_core::worker::workflow::workflow_stream: Workflow processing encountered fatal error and must shut down TonicError(Status { code: PermissionDenied, message: "Request unauthorized.", details: b"\x08\x07\x12\x15Request unauthorized.\x1aJ\nHtype.googleapis.com/temporal.api.errordetails.v1.PermissionDeniedFailure", metadata: MetadataMap { headers: {"content-type": "application/grpc"} }, source: None })
2024-01-10T14:27:15.728168Z  WARN temporal_sdk_core::worker::workflow::workflow_stream: WFT poller errored, shutting down
2024-01-10T14:27:15.728475Z ERROR temporal_sdk_core::worker::workflow::workflow_stream: Workflow processing encountered fatal error and must shut down TonicError(Status { code: PermissionDenied, message: "Request unauthorized.", details: b"\x08\x07\x12\x15Request unauthorized.\x1aJ\nHtype.googleapis.com/temporal.api.errordetails.v1.PermissionDeniedFailure", metadata: MetadataMap { headers: {"content-type": "application/grpc"} }, source: None })
2024-01-10T14:27:15.728506Z  WARN temporal_sdk_core::worker::workflow::workflow_stream: WFT poller errored, shutting down
2024-01-10T14:27:15.728512Z ERROR temporal_sdk_core::worker::workflow::workflow_stream: Workflow processing encountered fatal error and must shut down TonicError(Status { code: PermissionDenied, message: "Request unauthorized.", details: b"\x08\x07\x12\x15Request unauthorized.\x1aJ\nHtype.googleapis.com/temporal.api.errordetails.v1.PermissionDeniedFailure", metadata: MetadataMap { headers: {"content-type": "application/grpc"} }, source: None })
2024-01-10T14:27:15.728945Z  WARN temporal_sdk_core::worker::workflow::workflow_stream: WFT poller errored, shutting down
2024-01-10T14:27:15.728958Z ERROR temporal_sdk_core::worker::workflow::workflow_stream: Workflow processing encountered fatal error and must shut down TonicError(Status { code: PermissionDenied, message: "Request unauthorized.", details: b"\x08\x07\x12\x15Request unauthorized.\x1aJ\nHtype.googleapis.com/temporal.api.errordetails.v1.PermissionDeniedFailure", metadata: MetadataMap { headers: {"content-type": "application/grpc"} }, source: None })
Worker failed, shutting down
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/temporalio/worker/_activity.py", line 160, in run
    task = await poll_task
           ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/temporalio/bridge/worker.py", line 98, in poll_activity_task
    await self._ref.poll_activity_task()
RuntimeError: Poll failure: Unhandled grpc error when activity polling: Status { code: PermissionDenied, message: "Request unauthorized.", details: b"\x08\x07\x12\x15Request unauthorized.\x1aJ\nHtype.googleapis.com/temporal.api.errordetails.v1.PermissionDeniedFailure", metadata: MetadataMap { headers: {"content-type": "application/grpc"} }, source: None }

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/temporalio/worker/_activity.py", line 180, in run
    raise RuntimeError("Activity worker failed") from err
RuntimeError: Activity worker failed
2024-01-10T14:27:15.729754Z  WARN temporal_sdk_core::worker::activities::activity_task_poller_stream: Error while polling for activity tasks error=Status { code: PermissionDenied, message: "Request unauthorized.", details: b"\x08\x07\x12\x15Request unauthorized.\x1aJ\nHtype.googleapis.com/temporal.api.errordetails.v1.PermissionDeniedFailure", metadata: MetadataMap { headers: {"content-type": "application/grpc"} }, source: None }

Then, the process just hangs forever.

Environment/Versions

cretz commented 10 months ago

Thanks for the report! The worker shutdown should succeed and the exception be thrown out of run() if those are the actual workflow and activities you're running. But if it's a more advanced activity that is not respecting cancellation, it's possible worker shutdown is not completing because it is cancelling activities but they are not respecting cancellation in some way. I will test with your above scenario and confirm.

ndtretyak commented 10 months ago

I don't actually run the activities. The authorization token is incorrect, so the worker cannot retrieve any tasks from the server.

cretz commented 10 months ago

I have replicated this with your code. I think this is actually in our Rust Core layer, so I have opened https://github.com/temporalio/sdk-core/issues/667.

milani commented 4 months ago

@cretz the ticket in rust sdk is resolved. does it mean this ticket is resolved and can be closed?

cretz commented 4 months ago

Yes, closing. This has not yet been made available in a released Python SDK version, but we hope to do another release soon.