uber-go / cadence-client

Framework for authoring workflows and activities running on top of the Cadence orchestration engine.
https://cadenceworkflow.io
MIT License

Honor non-determinism fail workflow policy #1287

Closed: taylanisikdemir closed this 7 months ago

taylanisikdemir commented 8 months ago

What changed? Users can specify NonDeterministicWorkflowPolicy in worker options. If the FailWorkflow policy is chosen, the workflow is expected to terminate as soon as it reaches a nondeterministic state (e.g. the activity order changed). However, this wasn't honored for one category of nondeterminism cases. This PR fixes that, so workflows now fail as soon as any nondeterminism scenario is encountered.

There are two categories of nondeterminism cases, distinguished by how the client library detects them:

  1. The issue bubbles up to the task handler as an illegal state panic. This covers most actual production cases.
  2. The issue is caught when comparing replay decisions with history. This covers replay test scenarios and a subset of production cases.

The FailWorkflow policy was honored for category 2 but not for category 1.
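To make the two detection paths concrete, here is a minimal, self-contained sketch of a task handler that applies the policy to both of them. It is an illustration only, not the client library's code: `Policy`, `Outcome`, `detect`, and `processTask` are hypothetical names, category 1 is modelled as a plain Go panic raised during replay, and category 2 as a mismatch between the replayed decisions and the recorded history.

```go
package main

import (
	"errors"
	"fmt"
)

// Policy is a hypothetical stand-in for the worker's nondeterminism policy.
type Policy int

const (
	BlockWorkflow Policy = iota // leave the workflow blocked until the code is fixed
	FailWorkflow                // fail the workflow once nondeterminism is detected
)

// Outcome is what the simulated task handler decides to do with the workflow.
type Outcome string

const (
	Completed Outcome = "completed"
	Blocked   Outcome = "blocked"
	Failed    Outcome = "failed"
)

var errNonDeterministic = errors.New("nondeterministic workflow")

// detect runs the replay and reports nondeterminism from either category.
func detect(history []string, replay func() []string) (err error) {
	defer func() {
		if r := recover(); r != nil {
			// Category 1: an illegal state panic bubbles up to the handler.
			err = fmt.Errorf("%w: illegal state panic: %v", errNonDeterministic, r)
		}
	}()

	decisions := replay()

	// Category 2: replayed decisions are compared with recorded history.
	if len(decisions) != len(history) {
		return fmt.Errorf("%w: %d decisions, %d in history",
			errNonDeterministic, len(decisions), len(history))
	}
	for i := range decisions {
		if decisions[i] != history[i] {
			return fmt.Errorf("%w: decision %d is %q, history has %q",
				errNonDeterministic, i, decisions[i], history[i])
		}
	}
	return nil
}

// processTask applies the policy uniformly, whichever way the issue surfaced.
func processTask(policy Policy, history []string, replay func() []string) Outcome {
	err := detect(history, replay)
	if err == nil {
		return Completed
	}
	if errors.Is(err, errNonDeterministic) && policy == FailWorkflow {
		return Failed
	}
	return Blocked
}

func main() {
	history := []string{"ScheduleActivity:A", "ScheduleActivity:B"}
	panicking := func() []string { panic("unexpected event for state machine") }
	reordered := func() []string {
		return []string{"ScheduleActivity:B", "ScheduleActivity:A"}
	}

	fmt.Println(processTask(FailWorkflow, history, panicking))  // category 1
	fmt.Println(processTask(FailWorkflow, history, reordered))  // category 2
	fmt.Println(processTask(BlockWorkflow, history, panicking)) // policy not set to fail
}
```

In this sketch both categories are converted into the same error before the policy is consulted, so the handler cannot treat them differently. That is the behavior the PR describes as the goal. How the real task handler is structured may well differ.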

Why? To make NonDeterministicWorkflowPolicy feature correct/complete.

How did you test it? Added an integration test to simulate this scenario.

Potential risks: Users depending on the existing buggy behavior can be impacted. This would happen if and only if all of the below hold true:

This is not a very realistic expectation, because users don't know about these subcategories of nondeterminism detection mechanisms. So the risk of this fix should be minimal.