Closed EronWright closed 8 years ago
@EronWright I have finished my other work and am returning to the serialization work now. I will try to get this done in the next week so we can move on...
Update: thanks to @cogmission, work is proceeding on supporting serialization of the network object (for checkpointing purposes) and of the inference object (as output of the HTM operator).
With #15, checkpointing seems to work. Some additional tests will be written before this issue is closed.
Great job, you two!
This is done.
Flink's reliability is based on checkpointing of operator state. In flink-htm the critical state is that of the HTM network.
The operator code is well-prepared to support checkpointing. A state holder is allocated from Flink (see L63 of KeyedHTMInferenceOperator) into which the network instance is stored. Flink expects to use the serializer passed to it, and will automatically serialize the network at the appropriate time. The issue is that the network is not actually serializable (as of htm.java 0.6.5), and the job will crash if checkpointing is enabled.
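To illustrate the failure mode, here is a minimal sketch that uses plain Java serialization as a stand-in for whatever serializer Flink is handed. `OpaqueNetwork` and `SerializableNetwork` are hypothetical placeholder classes, not htm.java types, and the helper methods are not part of flink-htm. It only shows that snapshotting a non-serializable object fails at write time, which is roughly what happens to the job once checkpointing is enabled.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class CheckpointSketch {

    // Hypothetical stand-in for a network that does not support serialization.
    static final class OpaqueNetwork {
        long recordsSeen = 42;
    }

    // Hypothetical stand-in for a network that does support serialization.
    static final class SerializableNetwork implements Serializable {
        private static final long serialVersionUID = 1L;
        long recordsSeen = 42;
    }

    // Write the state to a byte array, as a checkpoint snapshot would.
    static byte[] snapshot(Object state) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state);
        }
        return bytes.toByteArray();
    }

    // Read the state back, as a restore after failure would.
    static Object restore(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // Report whether the state can be snapshotted at all.
    static boolean canSnapshot(Object state) {
        try {
            snapshot(state);
            return true;
        } catch (NotSerializableException e) {
            return false; // the case this issue describes
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(canSnapshot(new OpaqueNetwork()));
        SerializableNetwork restored =
                (SerializableNetwork) restore(snapshot(new SerializableNetwork()));
        System.out.println(restored.recordsSeen);
    }
}
```

The real fix may well use a different serialization mechanism; the point is only that the state object itself has to round-trip before the state holder can do its job.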
Stretch goal: Flink supports both synchronous and asynchronous checkpointing. In the latter case, the operator is free to process elements while the state is being checkpointed. Depending on how the network state is ultimately checkpointed, consider supporting asynchronous checkpointing as well.
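For the stretch goal, one possible shape is copy-then-write: take a cheap copy of the state synchronously, then do the expensive write in the background while processing continues. The sketch below uses only `java.util.concurrent` and a hypothetical `CounterState` class. It is not Flink's checkpointing API, and whether a network can be copied cheaply enough for this to pay off is an open assumption.

```java
import java.util.concurrent.CompletableFuture;

final class AsyncSnapshotSketch {

    // Hypothetical operator state; a real network would be far larger.
    static final class CounterState {
        private long count;

        void process() {
            count++;
        }

        long count() {
            return count;
        }

        // Copy taken on the processing thread so the snapshot is consistent.
        CounterState copy() {
            CounterState c = new CounterState();
            c.count = count;
            return c;
        }
    }

    // Synchronous part: copy the state. Asynchronous part: "write" the copy.
    static CompletableFuture<String> snapshotAsync(CounterState live) {
        CounterState frozen = live.copy();
        return CompletableFuture.supplyAsync(() -> "count=" + frozen.count());
    }

    static String demo() {
        CounterState state = new CounterState();
        state.process();
        state.process();
        CompletableFuture<String> pending = snapshotAsync(state);
        state.process(); // processing continues while the snapshot is written
        return pending.join() + " live=" + state.count();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The snapshot reflects the state at the moment of the copy, even though the live state has moved on by the time the write completes.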