Closed szaluzhskiy closed 5 years ago
The first trace shows that the workflow is waiting on a child workflow. I'm not sure how that is related to an activity invocation. The failure trace shows that the activity execution is failing because of an exception thrown by the data converter. Do you have a retry policy specified for the activity? Then this activity would be retried until the retry policy expires. Have you looked at the trace/history of the child workflow that supposedly invoked the failing activity?
Hi @mfateev, we have a retry policy and an execution timeout set. But the problem is that the workflow is put in the 'blocked' state when it has clearly failed. So we are basically asking for a fail-fast option for exceptions in the data converter. Here's a minimal reproducible project: https://github.com/kalibek/cadence-deserialization-bug
Specify DataConverterExceptions as non-retryable in your activity retry options. Otherwise the activity is going to be retried until the retry policy expiration interval elapses.
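To make the effect of a non-retryable exception type concrete, here is a minimal, self-contained sketch of a retry loop. It is not the Cadence client's implementation: `RetrySketch`, `attemptsBeforeGivingUp`, and the stand-in `ConverterFailure` exception are hypothetical names used only for illustration, and the attempt limit stands in for the retry policy expiration.

```java
import java.util.concurrent.Callable;

// Hypothetical stand-in for the converter exception, so the sketch compiles on its own.
class ConverterFailure extends RuntimeException {
  ConverterFailure(String message) {
    super(message);
  }
}

class RetrySketch {
  // Runs body until it succeeds, a non-retryable exception is thrown,
  // or maxAttempts is reached. Returns the number of attempts made.
  static int attemptsBeforeGivingUp(Callable<?> body, int maxAttempts, Class<?>... doNotRetry) {
    int attempts = 0;
    while (attempts < maxAttempts) {
      attempts++;
      try {
        body.call();
        return attempts; // succeeded
      } catch (Exception e) {
        for (Class<?> type : doNotRetry) {
          if (type.isInstance(e)) {
            return attempts; // fail fast: this type is marked non-retryable
          }
        }
        // Any other exception: fall through and retry.
      }
    }
    return attempts; // retries exhausted
  }

  public static void main(String[] args) {
    Callable<Object> alwaysFails = () -> {
      throw new ConverterFailure("bad payload");
    };
    System.out.println(attemptsBeforeGivingUp(alwaysFails, 5));
    System.out.println(attemptsBeforeGivingUp(alwaysFails, 5, ConverterFailure.class));
  }
}
```

Without the non-retryable entry the failing call consumes every attempt; with it, the loop stops after the first failure.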
I tried your suggestion, but that is not the issue. The issue is that in this case the child workflow doesn't respond to the parent. And while the child workflow is ready to accept new data, the parent workflow hangs waiting for a response.
The thrown exception is handled in PollTaskExecutor,
in this code snippet:
@Override
public void process(T task) {
  taskExecutor.execute(
      () -> {
        MDC.put(LoggerTag.DOMAIN, domain);
        MDC.put(LoggerTag.TASK_LIST, taskList);
        try {
          handler.handle(task);
        } catch (Throwable ee) {
          options
              .getPollerOptions()
              .getUncaughtExceptionHandler()
              .uncaughtException(Thread.currentThread(), handler.wrapFailure(task, ee)); // <----
        } finally {
          MDC.remove(LoggerTag.DOMAIN);
          MDC.remove(LoggerTag.TASK_LIST);
        }
      });
}
And that's it. No answer goes to the parent workflow, and it waits for the timeout to expire.
Even if I set a custom exception handler, there is little I can do about it.
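The behaviour described above can be reproduced in isolation. The following is a simplified sketch, not the Cadence code: `SwallowedFailureSketch` and its parameters are hypothetical, and a `CompletableFuture` stands in for the response the parent is waiting for. It shows that when the handler throws, the exception reaches only the uncaught-exception handler and the reply is never completed.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Function;

class SwallowedFailureSketch {
  // Mimics the shape of process(): a handler failure is passed to the
  // uncaught-exception handler and nothing else happens.
  static CompletableFuture<String> process(
      Executor executor,
      Function<String, String> handler,
      Thread.UncaughtExceptionHandler onUncaught,
      String task) {
    CompletableFuture<String> reply = new CompletableFuture<>();
    executor.execute(
        () -> {
          try {
            reply.complete(handler.apply(task));
          } catch (Throwable t) {
            // Reported locally only; reply stays incomplete, so the caller keeps waiting.
            onUncaught.uncaughtException(Thread.currentThread(), t);
          }
        });
    return reply;
  }

  public static void main(String[] args) {
    CompletableFuture<String> reply =
        process(
            Runnable::run, // run inline to keep the demo deterministic
            task -> {
              throw new IllegalStateException("cannot deserialize " + task);
            },
            (thread, error) -> System.out.println("uncaught: " + error.getMessage()),
            "task-1");
    System.out.println("reply completed: " + reply.isDone());
  }
}
```

A custom uncaught-exception handler sees the error, but it has no way to complete the reply, which matches the observation that it can do little about the hang.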
I think this is because in com.uber.cadence.internal.sync.POJOActivityTaskHandler.POJOActivityImplementation#execute
the dataConverter call is outside the try-catch block:
public ActivityTaskHandler.Result execute(
    IWorkflowService service, String domain, ActivityTaskImpl task, Scope metricsScope) {
  ActivityExecutionContext context =
      new ActivityExecutionContextImpl(service, domain, task, dataConverter, heartbeatExecutor);
  byte[] input = task.getInput();
  Object[] args = dataConverter.fromDataArray(input, method.getGenericParameterTypes()); // <--
  CurrentActivityExecutionContext.set(context);
  try { // <--
    Object result = method.invoke(activity, args);
    RespondActivityTaskCompletedRequest request = new RespondActivityTaskCompletedRequest();
    if (context.isDoNotCompleteOnReturn()) {
      return new ActivityTaskHandler.Result(null, null, null, null);
    }
    if (method.getReturnType() != Void.TYPE) {
      request.setResult(dataConverter.toData(result));
    }
    return new ActivityTaskHandler.Result(request, null, null, null);
  } catch (RuntimeException | IllegalAccessException e) {
    return mapToActivityFailure(task.getActivityType(), e, metricsScope);
  } catch (InvocationTargetException e) {
    return mapToActivityFailure(task.getActivityType(), e.getTargetException(), metricsScope);
  } finally {
    CurrentActivityExecutionContext.unset();
  }
}
The issue is the same as #374.
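One possible shape of a fix, sketched here with hypothetical types rather than the real Cadence classes, is to perform the deserialization inside the try block so that a converter failure is mapped to a failure result like any other activity error. `ActivityOutcome` and `DeserializeInsideTrySketch` are illustrative names, and this is not necessarily how #374 was or will be resolved.

```java
import java.util.function.Function;

// Hypothetical result type: exactly one of result/failure is set.
final class ActivityOutcome {
  final Object result;
  final Throwable failure;

  private ActivityOutcome(Object result, Throwable failure) {
    this.result = result;
    this.failure = failure;
  }

  static ActivityOutcome completed(Object result) {
    return new ActivityOutcome(result, null);
  }

  static ActivityOutcome failed(Throwable failure) {
    return new ActivityOutcome(null, failure);
  }
}

class DeserializeInsideTrySketch {
  static ActivityOutcome execute(
      byte[] input, Function<byte[], String> fromData, Function<String, String> activity) {
    try {
      // Deserialization now happens inside the try block,
      // so a converter exception is caught below.
      String arg = fromData.apply(input);
      return ActivityOutcome.completed(activity.apply(arg));
    } catch (RuntimeException e) {
      // Converter and activity failures both become an explicit failure result.
      return ActivityOutcome.failed(e);
    }
  }

  public static void main(String[] args) {
    ActivityOutcome outcome =
        execute(
            new byte[0],
            bytes -> {
              throw new IllegalStateException("cannot deserialize");
            },
            s -> s);
    System.out.println("failure reported: " + (outcome.failure != null));
  }
}
```

With this arrangement the caller always receives an outcome, so a bad payload produces a reported failure instead of an exception that escapes to the poller.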
I see. I was confused about the relationship between the activity failure and the child workflow. #374 makes sense to me. Let me fix the issue.
I've run into strange behaviour. A custom data converter is used, and this converter throws an exception when deserializing data passed between the workflow and the activity. The workflow then gets stuck in the blocked state, waiting for "nothing".
Exception I see in cadence-web UI
Exception in worker