temporalio / sdk-python

Temporal Python SDK
MIT License
474 stars 77 forks source link

[Bug] Exception raise from workflow causes indefinite retry #432

Closed Holi0317 closed 1 year ago

Holi0317 commented 1 year ago

What are you really trying to do?

I am trying to raise an exception inside workflow which should cause the workflow to fail.

Describe the bug

In the official concept documentation, the expected default behavior of retry on workflow is:

Workflow Execution: When a Workflow Execution is spawned, it is not associated with a default Retry Policy and thus does not retry by default.

https://docs.temporal.io/retry-policies#default-behavior

I try to raise an RuntimeError, or actually just a ValueError will cause this problem where python sdk raise "Failed activation on workflow" and retries the workflow indefinitely.

Minimal Reproduction

Python ```python import asyncio import logging from temporalio import workflow from temporalio.client import Client, WorkflowFailureError from temporalio.worker import Worker @workflow.defn class GreetingWorkflow: @workflow.run async def run(self, name: str) -> str: raise RuntimeError("Goodbye world") return "" async def main(): # Start client client = await Client.connect("localhost:7233") # Run a worker for the workflow async with Worker( client, task_queue="hello-exception-task-queue", workflows=[GreetingWorkflow], ): # While the worker is running, use the client to run the workflow and # print out its result. Note, in many production setups, the client # would be in a completely separate process from the worker. # # This will raise a WorkflowFailureError with cause ActivityError with # cause ApplicationError with the error message and stack trace. try: await client.execute_workflow( GreetingWorkflow.run, "World", id="hello-exception-workflow-id", task_queue="hello-exception-task-queue", ) except WorkflowFailureError: # Log the exception logging.exception("Got workflow failure") if __name__ == "__main__": asyncio.run(main()) ```

It fails with following stacktrace:

Failed activation on workflow GreetingWorkflow with ID hello-exception-workflow-id and run ID 0d6c8b96-0db1-4652-a19a-d11055526325
Traceback (most recent call last):
  File "/Users/holliswu/Library/Caches/pypoetry/virtualenvs/example-5pZqKdJX-py3.12/lib/python3.12/site-packages/temporalio/worker/_workflow_instance.py", line 315, in activate
    self._run_once(check_conditions=index == 1 or index == 2)
  File "/Users/holliswu/Library/Caches/pypoetry/virtualenvs/example-5pZqKdJX-py3.12/lib/python3.12/site-packages/temporalio/worker/_workflow_instance.py", line 1617, in _run_once
    raise self._current_activation_error
  File "/Users/holliswu/Library/Caches/pypoetry/virtualenvs/example-5pZqKdJX-py3.12/lib/python3.12/site-packages/temporalio/worker/_workflow_instance.py", line 1635, in _run_top_level_workflow_function
    await coro
  File "/Users/holliswu/Library/Caches/pypoetry/virtualenvs/example-5pZqKdJX-py3.12/lib/python3.12/site-packages/temporalio/worker/_workflow_instance.py", line 766, in run_workflow
    result = await self._inbound.execute_workflow(input)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/holliswu/Library/Caches/pypoetry/virtualenvs/example-5pZqKdJX-py3.12/lib/python3.12/site-packages/temporalio/worker/_workflow_instance.py", line 1888, in execute_workflow
    return await input.run_fn(*args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Volumes/Projects/example/main.py", line 13, in run
    raise RuntimeError("Goodbye world")
RuntimeError: Goodbye world
2023-11-25T03:38:06.936582Z  WARN temporal_sdk_core::worker::workflow: Failing workflow task run_id=0d6c8b96-0db1-4652-a19a-d11055526325 failure=Failure { failure: Some(Failure { message: "Goodbye world", source: "", stack_trace: "  File \"/Users/holliswu/Library/Caches/pypoetry/virtualenvs/example-5pZqKdJX-py3.12/lib/python3.12/site-packages/temporalio/worker/_workflow_instance.py\", line 315, in activate\n    self._run_once(check_conditions=index == 1 or index == 2)\n\n  File \"/Users/holliswu/Library/Caches/pypoetry/virtualenvs/example-5pZqKdJX-py3.12/lib/python3.12/site-packages/temporalio/worker/_workflow_instance.py\", line 1617, in _run_once\n    raise self._current_activation_error\n\n  File \"/Users/holliswu/Library/Caches/pypoetry/virtualenvs/example-5pZqKdJX-py3.12/lib/python3.12/site-packages/temporalio/worker/_workflow_instance.py\", line 1635, in _run_top_level_workflow_function\n    await coro\n\n  File \"/Users/holliswu/Library/Caches/pypoetry/virtualenvs/example-5pZqKdJX-py3.12/lib/python3.12/site-packages/temporalio/worker/_workflow_instance.py\", line 766, in run_workflow\n    result = await self._inbound.execute_workflow(input)\n             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n  File \"/Users/holliswu/Library/Caches/pypoetry/virtualenvs/example-5pZqKdJX-py3.12/lib/python3.12/site-packages/temporalio/worker/_workflow_instance.py\", line 1888, in execute_workflow\n    return await input.run_fn(*args)\n           ^^^^^^^^^^^^^^^^^^^^^^^^^\n\n  File \"/Volumes/Projects/example/main.py\", line 13, in run\n    raise RuntimeError(\"Goodbye world\")\n", encoded_attributes: None, cause: None, failure_info: Some(ApplicationFailureInfo(ApplicationFailureInfo { r#type: "RuntimeError", non_retryable: false, details: None })) }), force_cause: Unspecified }
Failed activation on workflow GreetingWorkflow with ID hello-exception-workflow-id and run ID 0d6c8b96-0db1-4652-a19a-d11055526325
Traceback (most recent call last):
  File "/Users/holliswu/Library/Caches/pypoetry/virtualenvs/example-5pZqKdJX-py3.12/lib/python3.12/site-packages/temporalio/worker/_workflow_instance.py", line 315, in activate
    self._run_once(check_conditions=index == 1 or index == 2)
  File "/Users/holliswu/Library/Caches/pypoetry/virtualenvs/example-5pZqKdJX-py3.12/lib/python3.12/site-packages/temporalio/worker/_workflow_instance.py", line 1617, in _run_once
    raise self._current_activation_error
  File "/Users/holliswu/Library/Caches/pypoetry/virtualenvs/example-5pZqKdJX-py3.12/lib/python3.12/site-packages/temporalio/worker/_workflow_instance.py", line 1635, in _run_top_level_workflow_function
    await coro
  File "/Users/holliswu/Library/Caches/pypoetry/virtualenvs/example-5pZqKdJX-py3.12/lib/python3.12/site-packages/temporalio/worker/_workflow_instance.py", line 766, in run_workflow
    result = await self._inbound.execute_workflow(input)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/holliswu/Library/Caches/pypoetry/virtualenvs/example-5pZqKdJX-py3.12/lib/python3.12/site-packages/temporalio/worker/_workflow_instance.py", line 1888, in execute_workflow
    return await input.run_fn(*args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Volumes/Projects/example/main.py", line 13, in run
    raise RuntimeError("Goodbye world")
RuntimeError: Goodbye world

And temporal UI showing the workflow is "running", not "Failed"

Ignore the "No Workers Running" warning. I took this screenshot in last minute, after killing the python worker.

image

I did try to replicate this issue with go and the temporal workflow stopped correctly there:

Go implementation ```go package main import ( "context" "fmt" "log" "time" "go.temporal.io/sdk/client" "go.temporal.io/sdk/worker" "go.temporal.io/sdk/workflow" ) // Workflow is a Hello World workflow definition. func Workflow(ctx workflow.Context, name string) (string, error) { ao := workflow.ActivityOptions{ StartToCloseTimeout: 10 * time.Second, } ctx = workflow.WithActivityOptions(ctx, ao) logger := workflow.GetLogger(ctx) logger.Info("HelloWorld workflow started", "name", name) return "", fmt.Errorf("Goodbye world") } func workerMain() { // The client and worker are heavyweight objects that should be created once per process. c, err := client.Dial(client.Options{}) if err != nil { log.Fatalln("Unable to create client", err) } defer c.Close() w := worker.New(c, "hello-world", worker.Options{}) w.RegisterWorkflow(Workflow) err = w.Run(worker.InterruptCh()) if err != nil { log.Fatalln("Unable to start worker", err) } } func main() { go workerMain() // The client is a heavyweight object that should be created once per process. c, err := client.Dial(client.Options{}) if err != nil { log.Fatalln("Unable to create client", err) } defer c.Close() workflowOptions := client.StartWorkflowOptions{ ID: "hello_world_workflowID", TaskQueue: "hello-world", } we, err := c.ExecuteWorkflow(context.Background(), workflowOptions, Workflow, "Temporal") if err != nil { log.Fatalln("Unable to execute workflow", err) } log.Println("Started workflow", "WorkflowID", we.GetID(), "RunID", we.GetRunID()) // Synchronously wait for the workflow completion. var result string err = we.Get(context.Background(), &result) if err != nil { log.Fatalln("Unable get workflow result", err) } log.Println("Workflow result:", result) } ```

Output of the go version is:

2023/11/24 22:45:19 INFO  No logger configured for temporal client. Created default one.
2023/11/24 22:45:19 INFO  No logger configured for temporal client. Created default one.
2023/11/24 22:45:19 Started workflow WorkflowID hello_world_workflowID RunID ae2b5d3e-ff8a-4183-a717-575efbb240e0
2023/11/24 22:45:19 INFO  Started Worker Namespace default TaskQueue hello-world WorkerID 15651@Holliss-MacBook-Pro.local@
2023/11/24 22:45:19 INFO  HelloWorld workflow started Namespace default TaskQueue hello-world WorkerID 15651@Holliss-MacBook-Pro.local@ WorkflowType Workflow WorkflowID hello_world_workflowID RunID ae2b5d3e-ff8a-4183-a717-575efbb240e0 Attempt 1 name Temporal
2023/11/24 22:45:19 Unable get workflow result workflow execution error (type: Workflow, workflowID: hello_world_workflowID, runID: ae2b5d3e-ff8a-4183-a717-575efbb240e0): Goodbye world
exit status 1

Environment/Versions

Additional context

I did found https://github.com/temporalio/sdk-python/pull/329 which seems related to this issue.

Quinn-With-Two-Ns commented 1 year ago

This is expected, in all exception based SDKs (like python) only Temporal Errors fail a Workflow Execution by default, other exceptions will retry the Workflow Task. You can use ApplicationError in the Python SDK to fail the workflow.

This is also in our documentation https://docs.temporal.io/references/failures#throw-from-workflows

cretz commented 1 year ago

:+1: To what @Quinn-With-Two-Ns said. Closing, but feel free to continue conversation here or on #python-sdk Slack or in the forums.