paritytech / json-rpc-interface-spec


Request for feedback on the polkadot-api "recovery from `stop` events" strategy #145

Closed josepot closed 8 months ago

josepot commented 8 months ago

We've been working on improving our logic for recovering from "stop" events and encountered some complexities along the way. However, we've identified a potential optimization in our approach, thanks to a detail in the spec. Before proceeding, I wanted to share my strategy and gather any feedback or insights that I might have overlooked.

Current Approach: Our system maintains a data structure that tracks all currently pinned blocks, plus some extra info like ref-counts, pointers to parent and children, etc. When a "stop" event occurs, our current logic terminates all operations on these blocks and resets the state, which is less than ideal.
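To make the current approach concrete, here is a minimal sketch of such a structure. It is not the actual polkadot-api code: the `PinnedBlock` and `PinnedBlocks` names, their fields, and the `reset` method are hypothetical and only illustrate the ref-counts, the parent/children pointers, and the full reset on "stop".

```typescript
// Hypothetical shape of one tracked block; all names are illustrative.
interface PinnedBlock {
  hash: string;
  parent: string | null; // hash of the parent, if it is tracked too
  children: Set<string>; // hashes of tracked children
  refCount: number; // number of in-flight operations using this block
}

class PinnedBlocks {
  private blocks = new Map<string, PinnedBlock>();

  // Start tracking a block and link it to its parent when the parent is known.
  add(hash: string, parent: string | null): void {
    if (this.blocks.has(hash)) return;
    const parentBlock = parent !== null ? this.blocks.get(parent) : undefined;
    this.blocks.set(hash, {
      hash,
      parent: parentBlock ? parent : null,
      children: new Set<string>(),
      refCount: 0,
    });
    if (parentBlock) parentBlock.children.add(hash);
  }

  // An operation starts using the block.
  hold(hash: string): void {
    const block = this.blocks.get(hash);
    if (!block) throw new Error(`block ${hash} is not tracked`);
    block.refCount++;
  }

  // An operation is done with the block.
  release(hash: string): void {
    const block = this.blocks.get(hash);
    if (block && block.refCount > 0) block.refCount--;
  }

  // Current behaviour on "stop": forget everything and report which
  // blocks still had operations in flight, so the caller can abort them.
  reset(): string[] {
    const interrupted = Array.from(this.blocks.values())
      .filter((block) => block.refCount > 0)
      .map((block) => block.hash);
    this.blocks.clear();
    return interrupted;
  }

  count(): number {
    return this.blocks.size;
  }
}
```

With this shape, a "stop" event costs every operation that still holds a block, which is the part described above as less than ideal.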

Proposed Strategy: Instead of terminating operations upon receiving a "stop" event, the new approach temporarily pauses them. Once the new set of blocks is known, each paused operation is either resumed or errored, depending on whether its block is still part of that set.

Here's how it would work:

This approach hinges on the spec's provision that the first "best-block" event effectively communicates the new set of relevant blocks, allowing us to make informed decisions on the fly.
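The pause-then-decide idea can be sketched roughly as follows. This is an illustration under assumptions, not the real implementation: `PausedOperation`, `StopRecovery`, `onStop`, and `onBlocksKnown` are hypothetical names, and how an operation is actually re-issued against the new subscription is left out.

```typescript
// Hypothetical handle for an operation that was in flight when "stop" arrived.
interface PausedOperation {
  blockHash: string;
  resume: () => void; // continue the operation
  fail: (reason: Error) => void; // reject the operation
}

class StopRecovery {
  private paused: PausedOperation[] = [];

  // On "stop": park the in-flight operations instead of aborting them.
  onStop(inFlight: PausedOperation[]): void {
    this.paused.push(...inFlight);
  }

  // Once the new subscription has reported its set of blocks (assumed here
  // to be complete by the first "best-block" event), settle every parked
  // operation: resume it if its block is still known, error it otherwise.
  onBlocksKnown(knownHashes: string[]): { resumed: number; errored: number } {
    const known = new Set<string>(knownHashes);
    let resumed = 0;
    let errored = 0;
    for (const op of this.paused) {
      if (known.has(op.blockHash)) {
        op.resume();
        resumed++;
      } else {
        op.fail(new Error(`block ${op.blockHash} is no longer available`));
        errored++;
      }
    }
    this.paused = [];
    return { resumed, errored };
  }
}
```

A second "stop" arriving before `onBlocksKnown` runs would simply park more operations in this sketch, which is one way the edge case mentioned below could be folded in.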

Our actual logic is a tiny bit more complex, because we also have to handle a "stop" event arriving before we have received the "best-block" event of the new subscription. However, leaving that edge case aside, we would like to ask whether this approach makes sense, or whether we should rethink it.

Thanks!

tomaka commented 8 months ago

I would personally keep the current approach. The stop event is never supposed to happen under normal operations.

josepot commented 8 months ago

> I would personally keep the current approach. The stop event is never supposed to happen under normal operations.

Then I guess that we will be opening issues on smoldot, because it currently happens under "normal" operations.

Nevertheless, I can't help but wonder: then, what was the point of adding a list of finalizedBlocks into the initialized event? Wasn't the point of that to have the ability to recover from the stop event?

tomaka commented 8 months ago

> because it currently happens under "normal" operations.

Smoldot will generate a stop event under normal circumstances when it syncs more than 32 (IIRC) blocks at once. In that case, you wouldn't be able to recover from that anyway.

> then, what was the point of adding a list of finalizedBlocks into the initialized event? Wasn't the point of that to have the ability to recover from the stop event?

You can recover from a stop event if you really want to, but to me it's not worth doing, especially when it comes to non-finalized blocks.