Closed kanupriya15025 closed 10 months ago
I have a feeling that this is related to this change : https://github.com/kamperiadis/durabletask-java/blob/5b0c54de018c62f1fbb6746ef10c6731073c4148/client/src/main/java/com/microsoft/durabletask/DurableTaskGrpcWorker.java#L26
Adding @kamperiadis for awareness.
Thank you @kanupriya15025 for reporting this issue. Can you please clarify for me what you meant by "the event is read and then discarded." Just making sure that I understand the issue at hand - so you expected the "Patch received before start of schedule." to be shown since the patchEvent completed before the timer but you only saw the log when the timer was less than 3 days. Am I understanding it correctly?
From my own testing, this seems to be happening because the orchestration does not finish starting before the external event is triggered. I was able to reproduce this issue when the timer had relatively large values since that's when I notice the orchestration taking longer to start. However, when I tested with a timer of 7 days, I still got the expected behavior, so I do not think this is related to the changes in #122 (i.e. 3 days did not consistently become the constraint for me). When testing with larger numbers (e.g. 10, 15), I was able to reproduce the issue more consistently.
Hi @kanupriya15025! I was unable to reproduce this issue by following these steps:
This is the sample code I am running
Endpoint that will trigger the orchestration:
@FunctionName("StartOrchestration")
public HttpResponseMessage startOrchestration(
@HttpTrigger(name = "req",
methods = {HttpMethod.GET, HttpMethod.POST},
authLevel = AuthorizationLevel.ANONYMOUS) final HttpRequestMessage<Optional<String>> request,
@DurableClientInput(name = "durableContext") final DurableClientContext durableContext,
final ExecutionContext context) {
context.getLogger().info("Java HTTP trigger processed a request");
final DurableTaskClient client = durableContext.getClient();
final String instanceId = client.scheduleNewOrchestrationInstance("WaitForExternalEvent");
return durableContext.createCheckStatusResponse(request, instanceId);
}
Orchestration logic similar to what you have described:
Task<Something> event = ctx.waitForExternalEvent("Approval", Something.class);
Task<?> timer = ctx.createTimer(Duration.ofDays(7));
Task<?> winner = ctx.anyOf(event, timer).await();
if (winner == event) {
response = event.await();
if (!ctx.getIsReplaying()) {
context.getLogger().info("winner is the external event");
ctx.continueAsNew(response);
return;
}
}
else {
context.getLogger().info("winner is the timer");
}
External event logic:
durableContext.getClient().raiseEvent(instanceId, "Approval", object);
Can you please let me know if these are not the right steps to reproduce the issue? Thank you.
I think the issue comes from here https://github.com/microsoft/durabletask-java/blob/293489f269972678ba0e1bc1ee4cfabd7b8ec5b5/client/src/main/java/com/microsoft/durabletask/TaskOrchestrationExecutor.java#L571
We explicitly await
the internal timer to complete. So when customer have a timer larger than 3 days they will have an internal timer we set for them, and the code is waiting on this timer to complete first. In this case, all the external events are buffered in queue because the internal timer is not completed so no further action is taken on those events. This also explain why when timer smaller than 3 days we don't see this issue.
We may need to find another way to resolve issue rasied at https://github.com/microsoft/durabletask-java/issues/114
cc @cgillum @lilyjma
In this case, I don't think there is a workaround on SDK side, seems we need to update the DF extension accordingly. @cgillum , @davidmrdavid do you have some better ideas? Thanks.
@kaibocai I'm just now looking at the code you linked to, and I think you're correct about what the problem is. However, I disagree about the solution. There's nothing the extension can do to fix this - it merely persists the actions provided to it by the SDK. Rather, the problem in my opinion is that the SDK implementation is incorrect - we should not be blocking the orchestration like we are in the line you pointed to. The createTimer
method needs to be reimplemented in a way that is non-blocking.
@kanupriya15025 We just released SDK v1.4.0, which has the fix for this issue. You can find it here Central Repository: com/microsoft/durabletask-azure-functions/1.4.0 (apache.org) Thank you
@kamperiadis Now the event is read properly but in a code similar to this :
Task<Something> event = ctx.waitForExternalEvent("Approval", Something.class);
Task<?> timer = ctx.createTimer(Duration.ofDays(7));
Task<?> winner = ctx.anyOf(event, timer).await();
if (winner == event) {
response = event.await();
if (!ctx.getIsReplaying()) {
context.getLogger().info("winner is the external event");
ctx.continueAsNew(response);
return;
}
}
else {
context.getLogger().info("winner is the timer");
}
The log line "winner is the external event" keeps on printing infinitely. (Note that it's under a condition, so the log line is not replaying. The event is somehow being replayed again and again with each execution.
I am trying to send an event to my eternal orchestrator function while my function is sleeping with a timer. If the timer is more than 3days, the orchestrator function doesn't read the message and ignores it. Here's my code :
If the duration of the sleep task is less than 3days, it reads the patchEvent and executes the first "if" block. However, if the timer is more than 3days, the event is read and then discarded.
Here are the logs:
[2023-05-23T08:51:09.400Z] 14:21:09.399 [pool-2-thread-8] INFO com.adobe.orchestration.EternalOrchestrator - kanu_java_test_1: startDate is: 2023-05-26T00:00:00.000-05:00 [2023-05-23T08:51:09.400Z] 14:21:09.399 [pool-2-thread-8] INFO com.adobe.orchestration.EternalOrchestrator - kanu_java_test_1: Workflow startDate is in future, Starting a timer for duration: PT68H9M2S seconds [2023-05-23T08:51:09.400Z] 14:21:09.399 [pool-2-thread-8] INFO com.adobe.orchestration.EternalOrchestrator - Patch received before start of schedule. JSON {JSON Body}
[2023-05-23T08:51:18.054Z] Executing 'Functions.EternalOrchestrator' (Reason='(null)', Id=a429e287-8e6b-43a5-8b4e-53189c221bd6) [2023-05-23T08:51:18.060Z] 14:21:18.060 [pool-2-thread-8] INFO com.adobe.orchestration.EternalOrchestrator - kanu_java_test_1: startDate is: 2023-05-27T00:00:00.000-05:00 [2023-05-23T08:51:18.061Z] 14:21:18.060 [pool-2-thread-8] INFO com.adobe.orchestration.EternalOrchestrator - kanu_java_test_1: Workflow startDate is in future, Starting a timer for duration: PT92H8M50S seconds [2023-05-23T08:51:18.061Z] Function "EternalOrchestrator" (Id: a429e287-8e6b-43a5-8b4e-53189c221bd6) invoked by Java Worker [2023-05-23T08:51:18.064Z] Executed 'Functions.EternalOrchestrator' (Succeeded, Id=a429e287-8e6b-43a5-8b4e-53189c221bd6, Duration=12ms)