htm-community / flink-htm

Distributed, streaming anomaly detection and prediction with HTM in Apache Flink
GNU Affero General Public License v3.0

Support Flink Checkpointing of HTM State #2

Closed: EronWright closed this issue 8 years ago

EronWright commented 8 years ago

Flink's reliability is based on checkpointing of operator state. In flink-htm the critical state is that of the HTM network.

The operator code is already well prepared to support checkpointing. A state holder is allocated from Flink (see line 63 of KeyedHTMInferenceOperator), and the network instance is stored in it. Flink uses the serializer passed to it to serialize the network automatically whenever a checkpoint is taken. The problem is that the network is not actually serializable (as of htm.java 0.6.5), so the job crashes if checkpointing is enabled.
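To illustrate the requirement, here is a minimal sketch that round-trips a toy state object through plain `java.io` serialization, then shows the kind of failure a non-serializable object graph produces. `HypotheticalNetwork` and `SnapshotRoundTrip` are made-up names, not htm.java or flink-htm classes. Java serialization is only a stand-in here: the serializer Flink actually uses depends on the type information and serializer supplied to the state holder.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for an HTM network; NOT htm.java's Network class. */
class HypotheticalNetwork implements Serializable {
    private static final long serialVersionUID = 1L;
    // Toy "learned state": permanence values keyed by synapse id (illustrative only).
    final Map<Integer, Double> permanences = new HashMap<>();
    // Non-serializable members must be excluded (transient) and rebuilt after restore.
    transient Thread worker;
}

public class SnapshotRoundTrip {
    /** Serialize the object to bytes, as a checkpoint would persist it. */
    static byte[] snapshot(Serializable state) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state);
        }
        return bytes.toByteArray();
    }

    /** Deserialize bytes back into an object, as a restore from a checkpoint would. */
    static Object restore(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        HypotheticalNetwork net = new HypotheticalNetwork();
        net.permanences.put(7, 0.42);
        HypotheticalNetwork copy = (HypotheticalNetwork) restore(snapshot(net));
        System.out.println(copy.permanences.get(7)); // prints 0.42

        // A graph that contains a non-serializable member fails at snapshot time,
        // analogous to the crash described in the issue.
        ArrayList<Object> broken = new ArrayList<>();
        broken.add(new Object());
        try {
            snapshot(broken);
        } catch (NotSerializableException e) {
            System.out.println("not serializable: " + e.getMessage());
        }
    }
}
```

Marking a field `transient` is the usual escape hatch, but it only moves the problem: whatever was skipped has to be reconstructed after a restore.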

Stretch goal: Flink supports both synchronous and asynchronous checkpointing. In the asynchronous case, the operator is free to keep processing elements while its state is being checkpointed. Depending on how the network state is ultimately checkpointed, consider supporting asynchronous checkpointing as well.
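The asynchronous idea can be sketched with plain `java.util.concurrent`: take a point-in-time copy of the state synchronously, then persist the copy on a background thread while new elements keep mutating the live state. This is a generic illustration of the technique, not Flink's snapshot API. `AsyncSnapshotSketch` and its methods are hypothetical, and a map of per-key counts stands in for the HTM network.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncSnapshotSketch {
    // Hypothetical live operator state: element counts per key (stand-in for a network).
    private final Map<String, Integer> counts = new HashMap<>();
    // Background thread that "persists" snapshots while processing continues.
    private final ExecutorService snapshotter = Executors.newSingleThreadExecutor();

    /** Process one element by mutating the live state. */
    void processElement(String key) {
        counts.merge(key, 1, Integer::sum);
    }

    int liveCount(String key) {
        return counts.getOrDefault(key, 0);
    }

    /**
     * Synchronous phase: copy the state (processing pauses only for the copy).
     * Asynchronous phase: persist the copy on the background thread.
     */
    Future<Map<String, Integer>> snapshotAsync() {
        Map<String, Integer> copy = new HashMap<>(counts);
        // Stand-in for writing the copy to durable storage.
        return snapshotter.submit(() -> Collections.unmodifiableMap(copy));
    }

    void shutdown() {
        snapshotter.shutdown();
    }

    public static void main(String[] args) throws Exception {
        AsyncSnapshotSketch op = new AsyncSnapshotSketch();
        op.processElement("a");
        op.processElement("a");
        Future<Map<String, Integer>> pending = op.snapshotAsync();
        // Elements keep flowing while the snapshot is being persisted.
        op.processElement("a");
        op.processElement("a");
        Map<String, Integer> persisted = pending.get();
        System.out.println(persisted.get("a") + " " + op.liveCount("a")); // prints "2 4"
        op.shutdown();
    }
}
```

The copy is cheap here only because the map holds immutable `Integer` values. An HTM network would need a deep copy or a copy-on-write structure, and that cost is what decides whether asynchronous checkpointing pays off.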

cogmission commented 8 years ago

@EronWright I have finished my other work and I'm returning to the Serialization work now. I will try to get this done in the next week so we can move on...

EronWright commented 8 years ago

Update: thanks to @cogmission, work is proceeding on supporting serialization of the network object (for checkpointing purposes) and of the inference object (as output of the HTM operator).

EronWright commented 8 years ago

With #15, checkpointing seems to work. Some additional tests will be written before this issue is closed.

rhyolight commented 8 years ago

Great job, you two!

EronWright commented 8 years ago

This is done.