Open cody-littley opened 1 month ago
What if, instead of adding these new wires between components that do not currently communicate, we added the required pre-consensus events to the round produced by consensus?
@poulok
This would work, I think, as long as we design the data structure that buffers events so that this snapshotting is computationally cheap.
@Cody, we decided today that the short-term fix is to use an offline tool to trim the PCES file content down to the judges and their non-ancient ancestors. In the mid and long term, we will re-genesis the platform state offline to support offline migration testing, at which point we can stop copying PCES files into the snapshot directory. Can this ticket be closed?
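The short-term trim described above (keep the judges and their non-ancient ancestors) can be sketched roughly as follows. This is a minimal illustration, not the actual offline tool: it assumes a hypothetical in-memory Event record (id, generation, parent ids), treats an event as ancient when its generation is below a given threshold, and takes the judge ids as input. PcesTrimSketch and trim are illustrative names only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the offline trim step; not the real tool or platform API. */
public final class PcesTrimSketch {

    /** Hypothetical minimal event: an id, a generation, and the ids of its parents. */
    public record Event(String id, long generation, List<String> parentIds) {}

    /**
     * Keeps the judges plus their non-ancient ancestors, preserving the input order.
     * An event counts as ancient when its generation is below minNonAncientGeneration.
     */
    public static List<Event> trim(List<Event> events, Set<String> judgeIds, long minNonAncientGeneration) {
        Map<String, Event> byId = new HashMap<>();
        for (Event e : events) {
            byId.put(e.id(), e);
        }
        Set<String> keep = new HashSet<>();
        Deque<String> worklist = new ArrayDeque<>(judgeIds);
        while (!worklist.isEmpty()) {
            String id = worklist.poll();
            Event e = byId.get(id);
            if (e == null || keep.contains(id)) {
                continue; // unknown event or already visited
            }
            // Judges are always kept; other events only if non-ancient. A parent's generation
            // is lower than its child's, so the ancestors of an ancient event are ancient too.
            if (!judgeIds.contains(id) && e.generation() < minNonAncientGeneration) {
                continue;
            }
            keep.add(id);
            worklist.addAll(e.parentIds());
        }
        // Emit kept events in their original (topological) order.
        List<Event> result = new ArrayList<>();
        for (Event e : events) {
            if (keep.contains(e.id())) {
                result.add(e);
            }
        }
        return result;
    }
}
```

Filtering the original list at the end, rather than emitting events in visit order, keeps the trimmed output in the same topological order as the input file.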
Problem
For testing purposes, we currently copy PCES files into the state snapshot directory. Unfortunately, this feature was rushed, and the implementation is prone to failure due to race conditions. When the copy fails, it does not harm production environments (we ignore these files in production), but it does cause grief in our testing pipelines.
Algorithm
Create a new component called something like PcesSnapshotBuffer. We will stream all preconsensus events into this component. Internally, the buffer will keep the events in topological order and will remove ancient events. However, instead of advancing its ancient threshold when consensus advances, it will advance the ancient threshold every time the state file manager advances.
When the state file manager receives a state, it needs to decide whether that state should be written to disk. If not, it will tell the PcesSnapshotBuffer to advance its ancient threshold. If it decides to write that state to disk, it will request a list of all events that are non-ancient (with respect to that state) and that are also ancestors of the judges of that state (a judge is considered an ancestor of itself). It will then write those events into a special PCES file in the state snapshot directory.
Implementation