EronWright closed this issue 5 years ago.
We need a workaround for Flink 1.4 and a true fix for Flink 1.5. I'll file a Flink ticket for the latter with some concrete ideas.
One idea is to add an `initializeState` method to `MasterTriggerRestoreHook`, which would be called unconditionally (on initial execution and on recovery, with or without checkpoint state). The Pravega connector would initialize, or forcibly reinitialize, the reader group state there. A subsequent call to `restoreCheckpoint` would restore the reader group checkpoint state (as it does now). A nice side benefit of this approach is that reader group initialization moves from the client to the JM.
Update: opened FLINK-8533.
Update 2: patch submitted.
The patch was merged for Flink 1.5. Once 1.5 is out, we will update the connector to use the new functionality. The tricky part is whether to continue to support 1.4.
Basically, the `ReaderCheckpointHook` class will have the following new code:
```java
// lifecycle callbacks on MasterTriggerRestoreHook (Flink 1.5)
@Override
public void reset() {
    // reset the reader group to its initial condition
    log.debug("Resetting the state of reader group {} to its initial state.", readerGroup.getGroupName());
    readerGroup.resetReaderGroup(readerGroupConfig);
}

@Override
public void close() {
    // release the reader group's resources when the hook is disposed
    readerGroup.close();
}
```
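The effect of `reset()` when a task fails before the first checkpoint can be modeled with a toy in-memory reader group. `ToyReaderGroup`, `ToyCheckpointHook`, and `restartWithoutCheckpoint` are hypothetical stand-ins, not the Pravega or Flink APIs, and the group's position is reduced to a single counter purely for illustration.

```java
// Toy reader group: position is a single event counter (an assumption for illustration;
// this is NOT the Pravega ReaderGroup API).
final class ToyReaderGroup {
    private final long initialPosition;
    private long position;

    ToyReaderGroup(long initialPosition) {
        this.initialPosition = initialPosition;
        this.position = initialPosition;
    }

    // Readers advance the shared group state as they consume events.
    void read(int events) {
        position += events;
    }

    long position() {
        return position;
    }

    // Similar in intent to resetReaderGroup(config): go back to the configured start.
    void resetToInitial() {
        position = initialPosition;
    }
}

// Hypothetical stand-in for ReaderCheckpointHook's reset() callback.
final class ToyCheckpointHook {
    private final ToyReaderGroup group;

    ToyCheckpointHook(ToyReaderGroup group) {
        this.group = group;
    }

    void reset() {
        group.resetToInitial();
    }
}

public class ResetOnRestartDemo {
    // Simulates a restart after a failure that happened before any checkpoint completed.
    // invokeReset == false models the hook not being called; true models the reset() callback.
    static long restartWithoutCheckpoint(ToyReaderGroup group, ToyCheckpointHook hook, boolean invokeReset) {
        if (invokeReset) {
            hook.reset();
        }
        return group.position();
    }

    public static void main(String[] args) {
        // The group is created once (as in the source constructor), starting at position 0.
        ToyReaderGroup legacy = new ToyReaderGroup(0L);
        legacy.read(5); // tasks consume 5 events, then fail before the first checkpoint
        System.out.println(restartWithoutCheckpoint(legacy, new ToyCheckpointHook(legacy), false)); // 5

        ToyReaderGroup patched = new ToyReaderGroup(0L);
        patched.read(5);
        System.out.println(restartWithoutCheckpoint(patched, new ToyCheckpointHook(patched), true)); // 0
    }
}
```

Without the callback the restarted job resumes at position 5, which is the edge-case behavior this issue describes; with it, the group rewinds to its configured start.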
Problem description
In the edge case where a task fails before the first checkpoint succeeds, the group should be rewound to its initial state. For example, if the group was configured to start from the beginning of the stream, it should restart from the beginning.
Because the source creates the reader group in its constructor (which is not re-executed in this case), and because the hook isn't invoked when there is no state to restore, the group actually just continues from where it left off. In fact, when the replacement tasks start up, an error occurs due to dirty state:
Having each reader simply remove itself from the set of online readers would fix the above symptom, but would produce undesirable at-most-once behavior.
Suggestions for an improvement
The obvious fix is to change Flink's hook functionality to invoke the hook in the non-restore case too, so that the hook can reinitialize the group. Alternatively, the tasks could catch the above exception and reset the reader group state, with some additional coordination.
As a workaround, the reader could wait for the first Flink checkpoint to arrive before processing any elements. But there's a catch-22: Flink checkpoints are communicated to the task via the reader group state itself!