temporalio / sdk-java

Temporal Java SDK
https://temporal.io
Apache License 2.0
200 stars 134 forks source link

Scheduled workflow retries do not work #2100

Closed mikita-charnukhin closed 6 days ago

mikita-charnukhin commented 1 month ago

Issue description

I am trying to configure a retry policy for a scheduled workflow. By scheduled workflow I mean this feature: Features - Java SDK feature guide | Temporal Documentation 1 . Unfortunately it seems not working for me. I am expecting that when I add a RetryOptions object to the ScheduleScheduleActionStartWorkflowWorkflowOptions then retries will be working according to the provided configuration. I haven’t seen any notes about additional actions which might be required to enable retries for scheduled workflow executions. But maybe I am missing something..

Notes

  1. I am using Temporal v1.23.1.
  2. Activity retries are working as expected for scheduled workflows.
  3. When I fail the activity by throwing a RuntimeException then after activity retries exhausted the scheduled workflow execution fails without expected retries (following any configured retry policy).
  4. I noticed that a retry policy which I set for a scheduled workflow is not visible for a workflow in the Temporal UI as usual (in Workflows → choose Workflow → Event History → WorkflowExecutionStarted event → Retry Policy).
  5. The same RetryOptions works as expected if I use them to trigger a single workflow execution (without a schedule). Example of a single workflow execution which I meant: workflowClient.newWorkflowStub(SomeWorkflow.class, workflowOptions).runSomeWorkflow(someInput);
  6. If I throw a RuntimeException not on the activity level but in the workflow implementation code then a scheduled workflow retries with a some weird retry policy (not following the rules which I set). For example: backoffCoefficient = 1.9; initialInterval = 5m; maxAttempts = 0; maxInterval = 3h. So I am expecting the following retry sequence: 5m → 9.5m → 18.05m → 34.295m → 65.1605 → … → 180m . But the actual retry sequence is: 0.037s → 0.058s → 10s → 10s → 14s → 20s → 27s → 47s → 78s.

Code example

    final var retryOptions = RetryOptions.newBuilder()
        .setMaximumAttempts(0)
        .setInitialInterval(Duration.ofMinutes(5))
        .setBackoffCoefficient(1.9)
        .setMaximumInterval(Duration.ofHours(3))
        .build();
    final var workflowOptions = WorkflowOptions.newBuilder()
        .setRetryOptions(retryOptions)
        .setWorkflowId(workflowIdGenerator.getScheduledWorkflowId(scheduleId))
        .setTaskQueue(TASK_QUEUE_NAME)
        .build();
    final var scheduleAction = ScheduleActionStartWorkflow.newBuilder()
            .setWorkflowType(SomeWorkflow.class)
            .setArguments(input)
            .setOptions(workflowOptions)
            .build();
    final var schedulePolicy = SchedulePolicy.newBuilder().build();
    final var scheduleSpec = ScheduleSpec.newBuilder()
        .setTimeZoneName(timezone)
        .setCronExpressions(cronSchedule)
        .build();
    final var schedule = Schedule.newBuilder()
        .setAction(scheduleAction)
        .setPolicy(schedulePolicy)
        .setSpec(scheduleSpec)
        .build();
    final var scheduleOptions = ScheduleOptions.newBuilder().build();
    scheduleClient.createSchedule(scheduleId, schedule, scheduleOptions);

Is there any issue with scheduled workflows which breaks the retry mechanism? Or maybe you can see an obvious mistake in my configuration..

Thank you in advance!

glebremniov commented 1 month ago

Yeah, upvoting this 👍🏻

aminmansour commented 1 month ago

Seems like a basic enough feature to have for schedules

alexpetroaica commented 1 month ago

Also facing this, even the most basic example seems to completely ignore the retryOptions set on a workflow when it is part of a schedule

dnr commented 3 weeks ago

I tried reproducing this with a schedule client in the Go SDK and retries worked correctly. There might be an issue with Java SDK.

mikita-charnukhin commented 2 weeks ago

Hi @dnr Yes, as I mentioned in the issue description above the retries issue exists in Java SDK. If you go to ScheduleClientImpl.createSchedule(...) -> ScheduleClientCallsInterceptor.createSchedule(...) -> RootScheduleClientInvoker.createSchedule(...) -> ScheduleProtoUtil.scheduleToProto(...) -> ScheduleProtoUtil.actionToProto(...) then you’ll see that RetryOptions object is ignored when doing mapping of ScheduleAction object from the “temporal-sdk-java” module to the ScheduleAction object from the “temporal-serviceclient” module. During that mapping ScheduleAction.action.retryPolicy stays null (action_ is of type NewWorkflowExecutionInfo). RetryOptions object which we pass to the Temporal client is being ignored.

Quinn-With-Two-Ns commented 6 days ago

Resolved with https://github.com/temporalio/sdk-java/pull/2082 which is included in Java SDK v1.24.0