hashgraph / hedera-services

Crypto, token, consensus, file, and smart contract services for the Hedera public ledger
Apache License 2.0

Propose design for platform status management #6296

Closed poulok closed 1 year ago

poulok commented 1 year ago

Proposal

Proposed status state machine:

[Image: PlatformStatus_rework state machine diagram]

Implementation proposal:

Usage of Wall Clock Time

The following state transitions depend on a set amount of wall clock time passing:

  1. OBSERVING -> CHECKING. Wall clock time is relevant for this transition so that the following edge case is covered:
    • node X creates some events and gossips them out
    • before these events are written to the preconsensus event stream (PCES), node X crashes
    • when node X comes back online, it should wait for a set amount of wall clock time before creating new events, so that it can receive from its neighbors the self events that were not written to the PCES before the crash
    • if node X began creating events immediately after booting up, it could create a new event with the same parent as one of the pre-crash events, causing a branch. This should be avoided.
  2. OBSERVING -> FREEZING. Wall clock time is relevant in this case for the same reason as above.
  3. ACTIVE -> CHECKING. If a node observes a set amount of wall clock time elapse without any of its own events reaching consensus, it should stop accepting app transactions until it sees its own events reaching consensus again.
    • this must use wall clock time rather than consensus time, since the node ought to transition out of ACTIVE even if consensus time isn't advancing
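
The three wall clock transitions above could be sketched roughly as follows. This is only an illustration, not the platform's implementation: the class name, method names, and the two duration parameters are hypothetical. It also assumes two things the list only implies: that a pending freeze is what sends OBSERVING to FREEZING instead of CHECKING, and that a node in CHECKING returns to ACTIVE once an own event reaches consensus. Wall clock time is passed in as an `Instant` so the logic can be exercised without real waiting.

```java
import java.time.Duration;
import java.time.Instant;

/** Illustrative sketch only: class, method, and parameter names are hypothetical. */
public class WallClockStatusSketch {

    /** Only the four statuses named in the transitions above. */
    enum Status { OBSERVING, CHECKING, FREEZING, ACTIVE }

    // Assumed settings: how long to stay in OBSERVING, and how long ACTIVE
    // may last without an own event reaching consensus.
    private final Duration observingPeriod;
    private final Duration activeStatusDelay;

    private Status status = Status.OBSERVING;
    private Instant statusStart;
    private Instant lastSelfEventConsensus;
    // Assumption: some other component reports that a freeze is due.
    private boolean freezePending;

    WallClockStatusSketch(Instant now, Duration observingPeriod, Duration activeStatusDelay) {
        this.statusStart = now;
        this.lastSelfEventConsensus = now;
        this.observingPeriod = observingPeriod;
        this.activeStatusDelay = activeStatusDelay;
    }

    void freezePeriodEntered() {
        freezePending = true;
    }

    /** Assumption: seeing an own event reach consensus moves CHECKING back to ACTIVE. */
    void selfEventReachedConsensus(Instant now) {
        lastSelfEventConsensus = now;
        if (status == Status.CHECKING) {
            enter(Status.ACTIVE, now);
        }
    }

    /** Called periodically with the current wall clock time; returns the resulting status. */
    Status timeElapsed(Instant now) {
        if (status == Status.OBSERVING && elapsed(statusStart, now, observingPeriod)) {
            // Transitions 1 and 2: leave OBSERVING only after the waiting period.
            enter(freezePending ? Status.FREEZING : Status.CHECKING, now);
        } else if (status == Status.ACTIVE
                && elapsed(lastSelfEventConsensus, now, activeStatusDelay)) {
            // Transition 3: too long without an own event reaching consensus.
            enter(Status.CHECKING, now);
        }
        return status;
    }

    private static boolean elapsed(Instant start, Instant now, Duration limit) {
        return Duration.between(start, now).compareTo(limit) >= 0;
    }

    private void enter(Status next, Instant now) {
        status = next;
        statusStart = now;
    }
}
```

Because `timeElapsed` compares wall clock instants only, the ACTIVE -> CHECKING check still fires when consensus time has stalled, which is the property item 3 asks for.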

Future Work

Approvers

Required

@cody-littley @lpetrovic05

Optional

@edward-swirldslabs @poulok

poulok commented 1 year ago

This proposal is adopted.