hashgraph / hedera-services

Crypto, token, consensus, file, and smart contract services for the Hedera public ledger
Apache License 2.0

Race condition free PCES files in state #13307

Open cody-littley opened 1 month ago

cody-littley commented 1 month ago

Problem

For testing purposes, we currently copy PCES files into the state snapshot directory. Unfortunately, this feature was rushed, and the implementation is prone to failure because of race conditions. When the copy fails, it does not harm production environments (we ignore these files in production), but it does cause grief in our testing pipelines.

Algorithm

Create a new component called something like PcesSnapshotBuffer. We will stream all preconsensus events into this component. Internally, the buffer will keep the events in topological order, and will remove ancient events from the buffer. However, instead of advancing the ancient threshold when consensus advances, it will advance the ancient threshold every time the state file manager advances.

When the state file manager receives a state, it needs to decide whether that state should be written to disk. If not, it will tell the PcesSnapshotBuffer to advance its ancient threshold. If it decides to write that state to disk, it will request a list of all non-ancient events (with respect to that state) that are also ancestors of the judges of that state (a judge is considered to be an ancestor of itself). It will take those events and write them into a special PCES file in the state snapshot directory.
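The buffer described above could look roughly like the following. This is only a minimal sketch under stated assumptions, not the platform's actual API: the class name PcesSnapshotBufferSketch, the simplified Event record, the ancientIndicator field, and all method names are hypothetical. It assumes events arrive in topological order, and that an event's ancient indicator is never lower than those of its ancestors.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the proposed buffer; all names are illustrative. */
final class PcesSnapshotBufferSketch {

    /** Simplified event: an id, the value compared against the ancient threshold, and parent ids. */
    record Event(String id, long ancientIndicator, List<String> parentIds) {}

    // Iteration follows insertion order, which is assumed to be topological (parents first).
    private final Map<String, Event> events = new LinkedHashMap<>();
    private long ancientThreshold = 0;

    /** Buffers a preconsensus event unless it is already ancient. */
    synchronized void addEvent(Event event) {
        if (event.ancientIndicator() >= ancientThreshold) {
            events.put(event.id(), event);
        }
    }

    /** Called when the state file manager advances; drops events that became ancient. */
    synchronized void advanceAncientThreshold(long newThreshold) {
        if (newThreshold <= ancientThreshold) {
            return;
        }
        ancientThreshold = newThreshold;
        events.values().removeIf(e -> e.ancientIndicator() < newThreshold);
    }

    /** Returns the non-ancient ancestors of the given judges (judges included), in topological order. */
    synchronized List<Event> eventsForSnapshot(List<String> judgeIds, long stateThreshold) {
        Set<String> reachable = new HashSet<>();
        Deque<String> toVisit = new ArrayDeque<>(judgeIds);
        while (!toVisit.isEmpty()) {
            String id = toVisit.pop();
            Event event = events.get(id);
            // Skip unknown events, events ancient for this state, and events already visited.
            if (event == null || event.ancientIndicator() < stateThreshold || !reachable.add(id)) {
                continue;
            }
            toVisit.addAll(event.parentIds());
        }
        List<Event> result = new ArrayList<>();
        for (Event event : events.values()) {
            if (reachable.contains(event.id())) {
                result.add(event);
            }
        }
        return result;
    }
}
```

In this sketch the state file manager would call advanceAncientThreshold for a state it skips, and eventsForSnapshot for a state it writes. Because both methods synchronize on the buffer, the returned list is a consistent copy that can be written to the snapshot directory without racing against incoming events.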

Implementation

pcesSnapshotBuffer

poulok commented 1 month ago

What if, instead of adding these new wires between components that do not currently communicate, we added the required pre-consensus events to the round produced by consensus?

cody-littley commented 1 month ago

@poulok

> What if, instead of adding these new wires between components that do not currently communicate, we added the required pre-consensus events to the round produced by consensus?

I think this would work, as long as we design the data structure that buffers events in a way that makes this snapshotting computationally cheap.

poulok commented 1 month ago

@Cody, we decided today that the short-term fix is to use an offline tool to trim the PCES file content to judges and non-ancient ancestors. In the mid and long term, we will re-genesis the platform state offline in order to support offline migration testing, at which point we can stop copying PCES files into the snapshot directory. Can this ticket be closed?