Closed kaibocai closed 1 year ago
shouldn't we also add a test for cases where at least one function call fails and is retried
@cgillum, it turns out the current approach failed on this case. The thing is when customers using anyOf/allOF
for example
tasks.add(ctx.callActivity("Append", "InputSad1", taskOptions, String.class));
tasks.add(ctx.callActivity("Append", "InputSad1", taskOptions, String.class));
tasks.add(ctx.callActivity("Append", "InputSad1", taskOptions, String.class));
return ctx.allOf(tasks).await();
the allOf
here returns a CompletableTask
https://github.com/microsoft/durabletask-java/blob/b919929a0503f24ea5ac7a7e9d593a43a9dbcbdd/client/src/main/java/com/microsoft/durabletask/TaskOrchestrationExecutor.java#L188, which is not retriable and we are awaiting on it.
So all the RetriableTask
tasks we put in the list won't be retried as the task we await
on is a CompletableTask
.
So I am thinking let allOf
return CompletableTask
or RetriableTask
depending on the tasks it has in the collection. However, there is another issue comes, as the collection of tasks can be mixed with CompletableTask
and RetriableTask
. In that case, I cannot run retry logic only on those RetriableTask
in the collection. I can only run (or not run) retry logic on all of the tasks altogether.
But maybe this is a corner case and it doesn't really matter to customers that few of their non-retriable tasks get retried?
To really solve this issue, I think we should have a new GRPC type dedicate for RetriableTask
, so the SDK can decide whether it should retry based on the message sent from the sidecar.
cc: @davidmrdavid
But maybe this is a corner case and it doesn't really matter to customers that few of their non-retriable tasks get retried?
I think this will be problematic in large fan-out cases. If 1/100 activities fails, the customer will expect that we only retry the 1 that failed and not the other 99. I don't see this as a corner case. I think we should take the time necessary to figure it out, though I'm okay with doing a partial fix since the current problem of getting stuck is worse than the problem of too much retrying.
To really solve this issue, I think we should have a new GRPC type dedicate for RetriableTask, so the SDK can decide whether it should retry based on the message sent from the sidecar.
The gRPC layer has no notion of tasks, and this is by design. Tasks are purely an SDK concept, so there isn't any way to model it in the gRPC layer even if we wanted to. I can make time to try and help identify the correct design, but it will probably have to wait until after I get back from my vacation.
Close this PR as a new one is created https://github.com/microsoft/durabletask-java/pull/157
Issue describing the changes in this PR
resolves https://github.com/microsoft/durabletask-java/issues/149
Pull request checklist
CHANGELOG.md
Additional information
Additional PR information