poanetwork / hbbft

An implementation of the paper "Honey Badger of BFT Protocols" in Rust. This is a modular library of consensus.
Other
357 stars 96 forks source link

Implement a journal, to persist and restore the algorithms' state? #421

Closed afck closed 5 years ago

afck commented 5 years ago

For Parity (https://github.com/poanetwork/parity-ethereum/issues/181) and other applications, we need to make it possible to persist the state of consensus to disk. That allows a node to e.g. restart without becoming faulty.

We will add a journaling feature to hbbft: Each Step returned by the algorithms will contain journal log entries that represent the internal state change. These can be persisted, and even after a crash, the full state can be recovered by replaying the journal. That way, a node can continue to operate without being marked as faulty or skipping an epoch.

To keep the journal from growing indefinitely, the entries' epoch number should be user-visible: The upper layer can always delete all entries from previous epochs.

So let's make all algorithm structs implement the serde traits (Serialize and Deserialize).

afck commented 5 years ago

On the other hand, that makes it easy to mistakenly persist secret keys and key shares, since those are part of the algorithm structs. Also, NetworkInfo is shared between the subalgorithms, and the default Deserialize implementation would create a lot of duplicates.

We should probably encourage the user to store the NetworkInfo — and in particular the secret keys — separately, so they can take measures to protect them.

So we could implement a serializable state representation for each algorithm, which wouldn't include the NetworkInfo.

Alternatively maybe we could add a journal to Step, containing individual state change logs. That would allow persisting the state update in every turn, and restoring the state from a journal even after a crash.

dforsten commented 5 years ago

As discussed privately the "journal" function would be best.

An essential feature for "production-ready" status.

afck commented 5 years ago

Conceptually, the event handler methods (handle_input, handle_message) take:

and they output:

The journal would be added to Step as a sequence of entries that fully define the difference between the old state and the new state.

The naive way of implementing that is not an option: If we try to go through the code and create a journal entry in every one of the hundreds of places where we modify the mutable algorithm state, we'll never get it right, and the code wouldn't be maintainable; every change where we forget to add or update the corresponding journal entry creation would break it. I see three possible ways forward:

1. Only implement checkpoints.

Instead, just make sure that the event handlers are all fully deterministic. In fact, this should already be the case!

So the user can already emulate journaling functionality by:

Then after a crash or restart, the algorithm's state can be restored by replaying all inputs and using the same seeds. The output, outgoing messages and fault logs can all be thrown away, since they have already been handled.

To avoid having to store all of this forever, it would suffice to make the algorithm state serializable. Then the log can periodically be deleted and replaced with the current snapshot.

I suspect that this option would actually not be very wasteful: hbbft's message types don't contain much redundant information, so in normal operation the log might not even require more space than journal entries would. (To avoid storing messages that didn't change the state, we could add some flag to Step.) If that's true, I don't think there is a good justification for the other options:

2. Diff the states after event handling.

All handle_event and handle_message implementations could clone the old state, and, before returning, produce log entries based on the difference between the old and new states. That way, at least we wouldn't have to touch the internal logic too much.

However, it can be wasteful to clone the full state every time. In some places, this could be optimized a bit, e.g. we'd certainly put the HoneyBadger contributions, which can be large, into Rcs instead of cloning them. But there would still be a lot of cloning going on…

3. Encapsulate the algorithm states.

This would be the most invasive change, but it would avoid the cloning:

Separate all the algorithms' states, and make their fields private. They can only be modified by calling state.apply(&journal_entry). That way we are forced to always produce the correct journal entries. We can make them #[must_use], too: They must always be added into the Step.

This would mean almost a complete rewrite, though.

dforsten commented 5 years ago

Interesting approach using deterministic replay of hbbft input to get back to the state before shutdown.

In Parity we already will have to persist outgoing, but unsent, messages - persisting incoming messages as long as a hbbft epoch is active should be similar in implementation.

afck commented 5 years ago

I don't think it makes sense to implement snapshot functionality for all the one-shot subalgorithms. And for HoneyBadger it suffices to make a "snapshot" on each epoch change, which essentially consists of only the epoch number.

So I'm actually not sure we need to add any functionality to hbbft at all. Here's what I think should already work:

afck commented 5 years ago

Please reopen if there's anything we need to add to hbbft to support restarts and crash recovery.