Closed tminusplus closed 1 year ago
The title says:
Specifying StartToCloseTimeout but no ScheduleToStartTimeout or ScheduleToCloseTimeout does not return an error
It is possible that this behavior is expected, and just not clearly defined within the docs.
Definitely expected behavior (and is expected across all SDKs, as it is enforced server side). Simply: At least one of ScheduleToCloseTimeout
or StartToCloseTimeout
is required. From the Godoc on ActivityOptions
you linked (other stuff left off for clarity):
// Either this option or StartToClose is required: Defaults to unlimited.
ScheduleToCloseTimeout time.Duration
// If ScheduleToClose is not provided then this timeout is required: Defaults to the ScheduleToCloseTimeout value.
StartToCloseTimeout time.Duration
This is also documented at https://docs.temporal.io/docs/go/how-to-set-activityoptions-in-go. We would be happy to make this clearer in any way you can think of. Suggestions?
That makes sense, thank you for the clarification. The part that was confusing was these three comments from the godoc:
// ...
// Either this option or StartToClose is required: Defaults to unlimited.
ScheduleToCloseTimeout time.Duration
// ...
// If ScheduleToClose is not provided then this timeout is required.
// Optional: Defaults to unlimited.
ScheduleToStartTimeout time.Duration
// ...
// If ScheduleToClose is not provided then this timeout is required: Defaults to the ScheduleToCloseTimeout value.
StartToCloseTimeout time.Duration
It gave me the impression that if ScheduleToClose is not provided then both StartToClose and ScheduleToStart are required. Perhaps a potential improvement might be:
// ...
// If StartToClose is not provided then this timeout or ScheduleToStart are required.
// Defaults to unlimited.
ScheduleToCloseTimeout time.Duration
// ...
// If StartToClose is not provided then this timeout or ScheduleToClose are required.
// Optional: Defaults to unlimited.
ScheduleToStartTimeout time.Duration
// ...
// If ScheduleToClose or ScheduleToStart are not provided then this timeout is required.
// Defaults to the ScheduleToCloseTimeout value.
StartToCloseTimeout time.Duration
// If StartToClose is not provided then this timeout or ScheduleToStart are required.
// Defaults to unlimited.
ScheduleToCloseTimeout time.Duration
I am not sure this is accurate. You must have at least one of ScheduleToCloseTimeout
or StartToCloseTimeout
timeout set. That requirement is unrelated to the ScheduleToStartTimeout
requirement (at least from my reading, I may have to go confirm behavior).
Expected Behavior
Executing an activity with a defined StartToCloseTimeout, an undefined ScheduleToStartTimeout, and an undefined ScheduleToCloseTimeout raises an error.
Originally, we expected the workflow to continue retrying forever - but closer reading of the docs revealed that you must set either ScheduleToStartTimeout or ScheduleToCloseTimeout here. The doc can be confusing, because you have to read all three timeouts to know what must be set, if I understand it correctly.
It is possible that this behavior is expected, and just not clearly defined within the docs. I tried looking in the Go SDK docs and on temporal.io. Please let me know if that is the case! The fix is easy on our side but I wanted to report a bug as the behaviour surprised us.
Actual Behavior
Executing an activity with a defined StartToCloseTimeout, an undefined ScheduleToStartTimeout, and an undefined ScheduleToCloseTimeout does not raise an error and runs normally. Then if a WorkflowTaskTimedOut occurs, we observed this with a ScheduleToStart timeout type, the workflow task will fail and retryState will be set to
RetryPolicyNotSet
.Steps to Reproduce the Problem
Specifications
Notes
I wanted to provide some more info on this bug, so it is easier to reproduce. I am unable to share the full workflow logs, but can share some of the non-sensitive parts.
Workflow task started with:
Activity task started with:
Activity task failed with this, which is the error I was throwing to test live failures:
Workflow task scheduled as sticky:
Then it timed out:
And finally, it failed:
I tried root causing this, and I think it might be caused around here https://github.com/temporalio/temporal/blob/3de5687db2eda2ad8367beaff4c26e439c84f62c/service/history/commandChecker.go#L306