Closed: assafz-q closed this issue 3 months ago.
> I would expect the workflow to fail because of the exception (and log it in the history) before the timeout occurs, in the same way it does when an exception is thrown in the middle of the workflow execution.
Common exceptions do not fail the workflow by default; they "suspend" it. This is the same behavior as when an exception is thrown in the middle of workflow execution. See https://github.com/temporalio/sdk-python?tab=readme-ov-file#exceptions. You can customize this behavior and/or customize the converter to throw certain exceptions.
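To illustrate what "customize the converter to throw certain exceptions" could look like, here is a stdlib-only sketch. It is not the Temporal SDK API: `ApplicationError` below is a hypothetical stand-in for `temporalio.exceptions.ApplicationError`, and `encode_arg` is a hypothetical hook standing in for a custom payload converter. It wraps `json.dumps` so a serialization `TypeError` surfaces as a non-retryable application error instead of escaping as-is.

```python
import json
from typing import Any


class ApplicationError(Exception):
    """Hypothetical stand-in for temporalio.exceptions.ApplicationError."""

    def __init__(self, message: str, *, non_retryable: bool = False) -> None:
        super().__init__(message)
        self.non_retryable = non_retryable


def encode_arg(value: Any) -> bytes:
    """Hypothetical converter hook: serialize one activity argument to JSON bytes.

    A TypeError from json.dumps (e.g. for a set or an arbitrary object) is
    re-raised as a non-retryable ApplicationError, keeping the original cause.
    """
    try:
        return json.dumps(value).encode("utf-8")
    except TypeError as err:
        raise ApplicationError(
            f"cannot serialize {type(value).__name__}: {err}", non_retryable=True
        ) from err


if __name__ == "__main__":
    print(encode_arg({"id": 1}))  # b'{"id": 1}'
    try:
        encode_arg({1, 2, 3})  # sets are not JSON-serializable
    except ApplicationError as err:
        print("non_retryable:", err.non_retryable)
```

In a real worker, this logic would live in a custom payload converter; whether an error raised there fails the workflow depends on the SDK path discussed in this thread.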
I know; that is why I catch the exception and re-raise it as a non-retryable `ApplicationError`.
Right, so the exception raised by the data converter is similar to raising the exception in the workflow, but it occurs on a different path (in the payload converter, not in the workflow code). It's a bit advanced, but you can customize the converter the same way you customize the workflow, catching and re-raising in a certain way. As an alternative to both, you can simply set `failure_exception_types=[Exception]` (or similar) in `@workflow.defn`, and it should fail the workflow on any exception instead of failing the task (i.e. suspending the workflow pending a code fix).
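A simplified, stdlib-only model of the rule described above, not SDK code: exceptions derived from the SDK's failure error type (here a hypothetical stand-in `FailureError`, with `ApplicationError` as a subclass) fail the workflow; any other exception fails only the workflow task, which is retried, so the workflow looks suspended. Passing types via `failure_exception_types` widens the first branch. The `classify` function is hypothetical.

```python
from typing import Sequence, Type


class FailureError(Exception):
    """Hypothetical stand-in for temporalio.exceptions.FailureError."""


class ApplicationError(FailureError):
    """Hypothetical stand-in for temporalio.exceptions.ApplicationError."""


def classify(
    err: BaseException,
    failure_exception_types: Sequence[Type[BaseException]] = (),
) -> str:
    """Return 'fail_workflow' or 'fail_task' for an exception from workflow code.

    Simplified model: FailureError subclasses always fail the workflow; other
    exceptions fail it only if they match failure_exception_types; everything
    else fails the workflow task, which is retried (the workflow "suspends").
    """
    if isinstance(err, FailureError):
        return "fail_workflow"
    if isinstance(err, tuple(failure_exception_types)):
        return "fail_workflow"
    return "fail_task"


if __name__ == "__main__":
    print(classify(TypeError("bad arg")))  # fail_task
    print(classify(TypeError("bad arg"), [Exception]))  # fail_workflow
```

Setting `[Exception]` makes every ordinary exception take the "fail workflow" branch, which is the trade-off suggested above: no suspension, but also no chance to fix the code and let the workflow resume.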
Thanks! But I am able to catch the exception in the workflow's try/except, so I think everything behaves as I expected. I think the problem is related to the warning log I attached to the issue:
WARN temporal_sdk_core::worker::workflow: Error while completing workflow activation error=status: InvalidArgument, message: "invalid TaskQueue on ScheduleActivityTaskCommand: missing task queue name. ActivityId=1 ActivityType=some_activity", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc"} }
I was able to (maybe) fix that locally. In `temporalio.worker._workflow_instance._ActivityHandle._apply_schedule_command`
https://github.com/temporalio/sdk-python/blob/365ceadb0732f36b66bf1ae2970131fc7944b565/temporalio/worker/_workflow_instance.py#L2145-L2215
I moved the part that handles the payloads:
https://github.com/temporalio/sdk-python/blob/365ceadb0732f36b66bf1ae2970131fc7944b565/temporalio/worker/_workflow_instance.py#L2167-L2170
to the end of the function.
I am not too familiar with the code, but it seems that in that context the task queue name is only set after the payload conversion, so an exception during payload conversion left the command without a task queue, and it could not be sent to the Temporal server.
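The suspected failure mode can be reproduced in miniature. This stdlib-only sketch is not the SDK's `_apply_schedule_command`; `ScheduleCommand`, `schedule_buggy`, and `schedule_reordered` are hypothetical names, and JSON stands in for payload conversion. The command is queued before its fields are filled, so if argument serialization raises midway, the queued command has no task queue, mirroring the "missing task queue name" warning above.

```python
import json
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class ScheduleCommand:
    """Hypothetical stand-in for a schedule-activity command."""
    activity_type: str = ""
    task_queue: str = ""
    arguments: List[bytes] = field(default_factory=list)


def schedule_buggy(outgoing: List[ScheduleCommand], args: List[Any], task_queue: str) -> None:
    cmd = ScheduleCommand(activity_type="some_activity")
    outgoing.append(cmd)  # queued before it is complete
    cmd.arguments = [json.dumps(a).encode("utf-8") for a in args]  # may raise TypeError
    cmd.task_queue = task_queue  # never reached if serialization fails


def schedule_reordered(outgoing: List[ScheduleCommand], args: List[Any], task_queue: str) -> None:
    cmd = ScheduleCommand(activity_type="some_activity")
    outgoing.append(cmd)
    cmd.task_queue = task_queue  # required fields first
    cmd.arguments = [json.dumps(a).encode("utf-8") for a in args]  # payloads last


if __name__ == "__main__":
    out: List[ScheduleCommand] = []
    try:
        schedule_buggy(out, [object()], "my-queue")  # object() is not JSON-serializable
    except TypeError:
        pass
    print(repr(out[0].task_queue))  # ''
```

Note that the reordered version still queues a command, just one with no arguments; it only avoids the missing task queue rather than preventing a half-built command.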
I see the issue. We create the command before we try to serialize its contents, so if the serialization exception does not fail the workflow task, the incomplete command is sent off anyway (because building it threw partway through). I have opened https://github.com/temporalio/sdk-python/issues/564 to track this (it is easier to state the problem clearly in a separate issue), but will keep this issue open.
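A minimal sketch of the ordering implied by that diagnosis: serialize first, and build and queue the command only once every payload has converted, so a serialization error leaves nothing half-built. It is stdlib-only, the names are hypothetical, JSON stands in for payload conversion, and it is not the SDK's actual fix for issue 564.

```python
import json
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class ScheduleCommand:
    """Hypothetical stand-in for a schedule-activity command."""
    activity_type: str
    task_queue: str
    arguments: List[bytes] = field(default_factory=list)


def schedule_activity(
    outgoing: List[ScheduleCommand], activity_type: str, args: List[Any], task_queue: str
) -> None:
    # Convert every argument first; a TypeError propagates to the caller
    # before any command exists, so nothing incomplete is queued.
    payloads = [json.dumps(a).encode("utf-8") for a in args]
    # Only a fully populated command is ever appended.
    outgoing.append(ScheduleCommand(activity_type, task_queue, payloads))
```

With this ordering, the exception reaches workflow code (or the task-failure path) while the outgoing command list is unchanged.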
What are you really trying to do?
I am writing a Temporal workflow and want to handle the exceptions that occur during execution: the workflow should catch any unexpected error and fail the entire workflow in a non-retryable way.
Describe the bug
I am catching all exceptions during the workflow run and re-raising them as non-retryable ApplicationErrors. Everything works as expected except when the exception is thrown while encoding the data passed to an activity.
Running a workflow that sends an unserializable object to an activity results in the following history (which contains no failure other than the 2s timeout):
and the following warning in the logs:
I would expect the workflow to fail because of the exception (and log it in the history) before the timeout occurs, in the same way it does when an exception is thrown in the middle of the workflow execution.
Minimal Reproduction
Environment/Versions
temporal --version
temporal version 0.12.0 (server 1.23.0) (ui 2.26.2)