stacks-network / stacks-core

The Stacks blockchain implementation
https://docs.stacks.co
GNU General Public License v3.0

Event observer recoverability in event of unclean stacks-node shutdown #5281

Closed by obycode 6 days ago

obycode commented 2 weeks ago

Problem

During the restart to upgrade naka-4 to rc3, we witnessed this situation:

  1. stacks-node processes a block
  2. stacks-node is shut down before successfully sending the new block event to event observers (API in this case)
  3. stacks-node is restarted
  4. Because the last block was successfully processed, the node does not know that it never successfully sent the block to the event observers, so it proceeds with the next block
  5. API observer errors when it receives the next block, since it never received its parent block
  6. stacks-node is unable to proceed since it does not receive a successful response for the new block event
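The sequence above can be reproduced with a toy model. This is only a sketch in Python for illustration (stacks-core itself is Rust); the `Observer` class, its parent check, and the block IDs are hypothetical stand-ins, not the API's actual behavior.

```python
class Observer:
    """Hypothetical stand-in for the API: accepts a block only if it has seen its parent."""

    def __init__(self, genesis):
        self.seen = {genesis}

    def new_block(self, block_id, parent_id):
        if parent_id not in self.seen:
            # Non-success response: the parent block event was never delivered.
            return False
        self.seen.add(block_id)
        return True


observer = Observer("b0")

# Normal operation: b1 is processed and its event is delivered.
delivered_b1 = observer.new_block("b1", "b0")

# Steps 1-2: the node processes b2 but is shut down before sending the event.
# Steps 3-4: after the restart the node moves straight on to b3.
# Step 5: the observer rejects b3 because it never received b2.
delivered_b3 = observer.new_block("b3", "b2")
```

Retrying `b3` can never succeed in this model, which corresponds to step 6: the node keeps receiving a non-success response and cannot proceed.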

Proposed solution

obycode commented 2 weeks ago

The most obvious place to implement this change is directly in EventObserver::send_payload. This would duplicate information in the database if a node has multiple observers, but it would reduce the amount of refactoring required and also give us finer-grained information about which observers need events rebroadcast: only rebroadcast to observers that did not confirm the event last time, instead of always rebroadcasting the event to all observers. In the majority of cases, a node probably has 0 or 1 observers, so there is likely no real difference in practice.
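A minimal sketch of the per-observer approach, assuming the change means persisting each payload before sending it and deleting it once the observer confirms. It is written in Python with `sqlite3` purely for illustration; it is not the code in stacks-core. The table name `pending_payloads`, the `deliver` callback, and the helper `replay_pending` are all hypothetical.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the (hypothetical) pending-payload store, one row per observer per event."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pending_payloads ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " observer TEXT NOT NULL,"
        " payload TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def send_payload(conn, observer, payload, deliver):
    """Persist the payload first, then send; delete the row only after confirmation."""
    cur = conn.execute(
        "INSERT INTO pending_payloads (observer, payload) VALUES (?, ?)",
        (observer, payload),
    )
    conn.commit()
    if deliver(observer, payload):
        conn.execute("DELETE FROM pending_payloads WHERE id = ?", (cur.lastrowid,))
        conn.commit()
        return True
    return False  # the row stays behind and is replayed after a restart


def replay_pending(conn, deliver):
    """On startup, resend unconfirmed payloads in order, per observer."""
    rows = conn.execute(
        "SELECT id, observer, payload FROM pending_payloads ORDER BY id"
    ).fetchall()
    blocked = set()  # observers that failed again; keep their later events queued
    for row_id, observer, payload in rows:
        if observer in blocked:
            continue
        if deliver(observer, payload):
            conn.execute("DELETE FROM pending_payloads WHERE id = ?", (row_id,))
            conn.commit()
        else:
            blocked.add(observer)
```

Because each row is keyed by observer, only the observers that never confirmed an event receive it again, at the cost of storing the same payload once per observer. Under this scheme a shutdown between a successful delivery and the delete would cause the payload to be sent a second time, so observers would need to tolerate duplicates.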

obycode commented 2 weeks ago

This is addressed in #5289.

obycode commented 6 days ago

Merged.