Closed afck closed 5 years ago
On the other hand, that makes it easy to mistakenly persist secret keys and key shares, since those are part of the algorithm structs. Also, `NetworkInfo` is shared between the subalgorithms, and the default `Deserialize` implementation would create a lot of duplicates.

We should probably encourage the user to store the `NetworkInfo` — and in particular the secret keys — separately, so they can take measures to protect them. So we could implement a serializable state representation for each algorithm, which wouldn't include the `NetworkInfo`.
Alternatively, maybe we could add a journal to `Step`, containing individual state change logs. That would allow persisting the state update in every turn, and restoring the state from a journal even after a crash.
As discussed privately, the "journal" function would be best.
An essential feature for "production-ready" status.
Conceptually, the event handler methods (`handle_input`, `handle_message`) take:

- the current state, including the `NetworkInfo` (`&mut self`),
- a random number generator (`&mut R where R: Rng`),

and they output:

- the new state (`&mut self`),
- the effects (`Step`).

The journal would be added to `Step` as a sequence of entries that fully define the difference between the old state and the new state.
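To make the idea concrete, here is a minimal sketch of what a `Step` carrying a journal could look like. All names here (`JournalEntry`, the `journal` field, `record`) are hypothetical illustrations, not part of the current hbbft API:

```rust
/// One persisted state change; real variants would mirror the algorithms'
/// internal mutations (these two are made-up examples).
#[derive(Clone, Debug, PartialEq)]
pub enum JournalEntry {
    /// An epoch was completed.
    EpochCompleted(u64),
    /// Some serialized piece of sub-state was replaced.
    StateUpdated(Vec<u8>),
}

/// A simplified stand-in for hbbft's `Step`, extended with a journal.
#[derive(Default)]
pub struct Step {
    /// Outgoing messages (serialized here for simplicity).
    pub messages: Vec<Vec<u8>>,
    /// Entries that fully define the difference between old and new state.
    pub journal: Vec<JournalEntry>,
}

impl Step {
    /// Record a state change so the upper layer can persist it.
    pub fn record(&mut self, entry: JournalEntry) {
        self.journal.push(entry);
    }
}
```

The upper layer would write `step.journal` to disk before acting on `step.messages`, so that a crash between the two can always be recovered by replay.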
The naive way of implementing that is not an option: If we try to go through the code and create a journal entry in every one of the hundreds of places where we modify the mutable algorithm state, we'll never get it right, and the code wouldn't be maintainable; every change where we forget to add or update the corresponding journal entry creation would break it. I see three possible ways forward:
Instead, just make sure that the event handlers are all fully deterministic. In fact, this should already be the case!
So the user can already emulate journaling functionality by logging every input and every incoming message to disk, together with the seed of the RNG used to handle it, before processing it. Then after a crash or restart, the algorithm's state can be restored by replaying all inputs and using the same seeds. The output, outgoing messages and fault logs can all be thrown away, since they have already been handled.
To avoid having to store all of this forever, it would suffice to make the algorithm state serializable. Then the log can periodically be deleted and replaced with the current snapshot.
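The replay-plus-snapshot idea can be sketched in a few lines. `State` and its `handle` method are hypothetical stand-ins for a deterministic event handler; the only point being demonstrated is that a snapshot plus the tail of an input log reproduces the live state, so earlier log entries can be deleted:

```rust
/// A toy deterministic algorithm state (hypothetical stand-in).
#[derive(Clone, Debug, PartialEq, Default)]
struct State {
    epoch: u64,
    digest: u64,
}

impl State {
    /// Deterministic handler: the same input and seed always produce the
    /// same new state, which is what makes replay-based recovery possible.
    fn handle(&mut self, input: u64, seed: u64) {
        self.epoch += 1;
        self.digest = self.digest.wrapping_mul(31).wrapping_add(input ^ seed);
    }
}

/// Restore a state by replaying a log of (input, seed) pairs on a snapshot.
fn replay(snapshot: State, log: &[(u64, u64)]) -> State {
    let mut state = snapshot;
    for &(input, seed) in log {
        state.handle(input, seed);
    }
    state
}
```

Once a serialized snapshot of `State` is on disk, `replay(snapshot, &log[k..])` with only the entries after the snapshot suffices, and `log[..k]` can be deleted.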
I suspect that this option would actually not be very wasteful: hbbft's message types don't contain much redundant information, so in normal operation the log might not even require more space than journal entries would. (To avoid storing messages that didn't change the state, we could add some flag to `Step`.)
If that's true, I don't think there is a good justification for the other options:
All `handle_event` and `handle_message` implementations could clone the old state and, before returning, produce log entries based on the difference between the old and new states. That way, at least we wouldn't have to touch the internal logic too much.
However, it can be wasteful to clone the full state every time. In some places this could be optimized a bit; e.g. we'd certainly put the `HoneyBadger` contributions, which can be large, into `Rc`s instead of cloning them. But there would still be a lot of cloning going on…
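As a rough illustration of this clone-and-diff option (all type names hypothetical), a handler would clone the state up front, mutate freely, and then derive journal entries field by field. Keeping large contributions behind an `Rc` makes the clone cheap and lets the diff use pointer equality:

```rust
use std::rc::Rc;

/// Toy algorithm state; the large contribution sits behind an `Rc` so that
/// cloning the whole state before a handler runs is cheap.
#[derive(Clone, Debug, PartialEq)]
struct AlgoState {
    epoch: u64,
    contribution: Rc<Vec<u8>>,
}

/// Hypothetical journal entries derived from a state diff.
#[derive(Debug, PartialEq)]
enum JournalEntry {
    EpochChanged { from: u64, to: u64 },
    ContributionReplaced(Rc<Vec<u8>>),
}

/// Produce entries describing how `new` differs from `old`, field by field.
fn diff(old: &AlgoState, new: &AlgoState) -> Vec<JournalEntry> {
    let mut entries = Vec::new();
    if old.epoch != new.epoch {
        entries.push(JournalEntry::EpochChanged {
            from: old.epoch,
            to: new.epoch,
        });
    }
    // Pointer comparison: a shared, untouched `Rc` means "unchanged".
    if !Rc::ptr_eq(&old.contribution, &new.contribution) {
        entries.push(JournalEntry::ContributionReplaced(Rc::clone(&new.contribution)));
    }
    entries
}
```

The drawback named above remains visible here: every handler pays for the clone, and every field needs an explicit comparison in `diff`.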
This would be the most invasive change, but it would avoid the cloning:
Separate all the algorithms' states, and make their fields private. They can only be modified by calling `state.apply(&journal_entry)`. That way we are forced to always produce the correct journal entries. We can make them `#[must_use]`, too: they must always be added into the `Step`.
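A minimal sketch of this apply-only design, assuming hypothetical names throughout: fields are private, `apply` is the single mutation point, and the entry type is `#[must_use]` so the compiler complains if a produced entry is silently dropped instead of being journaled:

```rust
/// A state change. `#[must_use]` on the type makes the compiler warn
/// whenever an expression producing an entry is ignored.
#[must_use = "journal entries must be added to the Step"]
#[derive(Clone, Debug, PartialEq)]
pub enum JournalEntry {
    SetEpoch(u64),
}

/// Algorithm state with private fields: it can only change via `apply`.
#[derive(Default, Debug, PartialEq)]
pub struct AlgoState {
    epoch: u64, // private: callers cannot mutate this directly
}

impl AlgoState {
    pub fn epoch(&self) -> u64 {
        self.epoch
    }

    /// The single mutation point. Because every change goes through an
    /// entry, replaying the journal reproduces the state exactly.
    pub fn apply(&mut self, entry: &JournalEntry) {
        match entry {
            JournalEntry::SetEpoch(n) => self.epoch = *n,
        }
    }
}
```

This is what forces correctness by construction, and also why it would amount to a rewrite: every existing direct field mutation in the handlers would have to become a `JournalEntry` plus an `apply` call.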
This would mean almost a complete rewrite, though.
Interesting approach using deterministic replay of hbbft input to get back to the state before shutdown.
In Parity we will already have to persist outgoing but unsent messages; persisting incoming messages for as long as an hbbft epoch is active should be similar in implementation.
I don't think it makes sense to implement snapshot functionality for all the one-shot subalgorithms. And for `HoneyBadger` it suffices to make a "snapshot" on each epoch change, which essentially consists of only the epoch number.
So I'm actually not sure we need to add any functionality to hbbft at all. Here's what I think should already work:

- Store every input and incoming message on disk before handling it in `HoneyBadger` and before sending out the messages from the returned `Step`. For each input, use a new PRNG initialized with a random seed, and store the seed.
- Whenever epoch `n` completes (block #`n` is imported), all messages `msg` with `msg.epoch() <= n` can be deleted, as well as all inputs for epochs `<= n`. The current epoch number `n + 1` needs to be stored on disk.
- After a restart, recreate the instance with `HoneyBadgerBuilder::new(network_info).epoch(n + 1).build()` (and any additional parameters that were set), and make it handle all saved inputs and messages with epoch `>= n` again. For the inputs, use PRNGs initialized with the stored seeds.

Please reopen if there's anything we need to add to hbbft to support restarts and crash recovery.
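The user-side bookkeeping this recipe asks for can be sketched as follows. The `Persisted` struct and its method are hypothetical (a real implementation would write to disk rather than keep everything in maps); it only illustrates the pruning rule: when epoch `n` completes, everything with epoch `<= n` is deleted and `n + 1` is remembered:

```rust
use std::collections::BTreeMap;

/// Hypothetical persistence layer for crash recovery, keyed by epoch.
#[derive(Default)]
struct Persisted {
    /// Saved inputs per epoch, each with the seed of the PRNG used for it.
    inputs: BTreeMap<u64, Vec<(Vec<u8>, u64)>>,
    /// Saved incoming messages per epoch (serialized).
    messages: BTreeMap<u64, Vec<Vec<u8>>>,
    /// The current epoch number, which must survive a crash.
    current_epoch: u64,
}

impl Persisted {
    /// Called when epoch `n` completes: delete everything for epochs
    /// `<= n` and remember that we are now in epoch `n + 1`.
    fn complete_epoch(&mut self, n: u64) {
        // `split_off(&(n + 1))` keeps only the keys `>= n + 1`.
        self.inputs = self.inputs.split_off(&(n + 1));
        self.messages = self.messages.split_off(&(n + 1));
        self.current_epoch = n + 1;
    }
}
```

After a restart, the recovery step above would iterate over the remaining `inputs` and `messages` maps and feed them back into the rebuilt instance.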
For Parity (https://github.com/poanetwork/parity-ethereum/issues/181) and other applications, we need to make it possible to persist the state of consensus to disk. That allows a node to e.g. restart without becoming faulty.
We will add a journaling feature to hbbft: each `Step` returned by the algorithms will contain journal log entries that represent the internal state change. These can be persisted, and even after a crash, the full state can be recovered by replaying the journal. That way, a node can continue to operate without being marked as faulty or skipping an epoch.

To keep the journal from growing indefinitely, the entries' epoch number should be user-visible: the upper layer can always delete all entries from previous epochs.
So let's make all algorithm structs implement the serde traits (`Serialize` and `Deserialize`).