temporalio / sdk-core

Core Temporal SDK that can be used as a base for language specific Temporal SDKs
MIT License
266 stars 70 forks source link

rust-sdk: prevent panic on non-existing workflow #784

Closed valkum closed 1 month ago

valkum commented 1 month ago

What was changed

This changes the check to warn about the workflow and allows the worker to continue polling from the queue similar to the other SDKs.

Currently, there is no good way of writing a test to make sure this isn't introduced by a refactor in the future. Maybe with a custom tracing subscriber to check for the warn output, but that would be something for it's own PR.

Why?

Previously, the worker would panic due to the error return value of the activation handler. Other SDKs do not panic/exit upon an unknown workflow.

Checklist

  1. Closes #558

  2. How was this tested: Start an empty worker with a local dev instance and start a workflow with any name. The worker should not exit.

  3. Any docs updates needed? Not really.

Sushisource commented 1 month ago

@valkum As far as testing goes, you could test this by making sure that the history in an integration test contains a workflow task failure. But, we don't have to block the PR on that so I've set it to merge. You're welcome to add a test in another PR if you like.

valkum commented 1 month ago

@valkum As far as testing goes, you could test this by making sure that the history in an integration test contains a workflow task failure. But, we don't have to block the PR on that so I've set it to merge. You're welcome to add a test in another PR if you like.

Hmm. Are you sure that a workflow failure is recorded in this case? I had a look through the go-sdk but couldn't find anything in that regard.

Sushisource commented 1 month ago

@valkum As far as testing goes, you could test this by making sure that the history in an integration test contains a workflow task failure. But, we don't have to block the PR on that so I've set it to merge. You're welcome to add a test in another PR if you like.

Hmm. Are you sure that a workflow failure is recorded in this case? I had a look through the go-sdk but couldn't find anything in that regard.

Yep, that's definitely what should happen. In Go that error eventually bubbles up from here https://github.com/temporalio/sdk-go/blob/2baa60eb75c4e35cdc027d17552846807f9b9f50/internal/internal_event_handlers.go#L1426.

valkum commented 1 month ago

Following that code path up, it should be this one, right? https://github.com/temporalio/sdk-go/blob/2baa60eb75c4e35cdc027d17552846807f9b9f50/internal/internal_task_pollers.go#L402

Not sure, how much of that machinery is already implemented in the Rust SDK. Will have a look and maybe come back with a PR.

Sushisource commented 1 month ago

Following that code path up, it should be this one, right? https://github.com/temporalio/sdk-go/blob/2baa60eb75c4e35cdc027d17552846807f9b9f50/internal/internal_task_pollers.go#L402

Not sure, how much of that machinery is already implemented in the Rust SDK. Will have a look and maybe come back with a PR.

The machinery for failing workflow tasks is there, it's just not quite hooked up to this particular spot. Probably it would mostly be a matter of just sending a failure down the completions_tx channel in the same spot where you made the change with a detail that the workflow function wasn't registered.