Sometimes, firehose sends block *below* the actual cursor block

From this graph-node log:

2023-04-07 17:54:49.380 DEBG 0 candidate triggers in this block, block_hash: 0xbdf42b9d117c0c0ac8d0882a957d703957f08fda98977ef2d902f9f3c5219bdc, block_number: 8791730, sgd: 536172, subgraph_id: QmS2GCuAkzH2kNDYe2pA9HkRTPLpC5DpbXRqhQW93exZEM, component: SubgraphInstanceManager
2023-04-07 18:35:13.918 ERRO An error occurred while streaming blocks: status: Internal, message: "unexpected stream termination", details: [], metadata: MetadataMap { headers: {} }, provider: goerli-firehose-pinax, deployment: QmS2GCuAkzH2kNDYe2pA9HkRTPLpC5DpbXRqhQW93exZEM, sgd: 536172, subgraph_id: QmS2GCuAkzH2kNDYe2pA9HkRTPLpC5DpbXRqhQW93exZEM, component: FirehoseBlockStream
2023-04-07 18:35:14.418 INFO Blockstream disconnected, connecting, provider_err_count: 0, cursor: iuBX60B1d2impcutsaAcFKWwLpcyDl1mVAzmKhsT0d3y8iDMiZynBzJ1Ox2Bw6Gk3R3jSQul29iZE354-8VX6tHilew15CkxQXx5xYu68rTtLvqkOlkec7prDb7daNDcUj3RZQ3xfrBT4tXgMvHYZkAwMMV1KjK2jWsCpNBcIvUX7CY1w2n_esqH0vjFpIBJ-bcjEOPylCKiVj0pJRwPPMWDbvXNvA==, subgraph: QmS2GCuAkzH2kNDYe2pA9HkRTPLpC5DpbXRqhQW93exZEM, start_block: 8784178, endpoint_uri: goerli-firehose-sf, provider: goerli-firehose-sf, deployment: QmS2GCuAkzH2kNDYe2pA9HkRTPLpC5DpbXRqhQW93exZEM, sgd: 536172, subgraph_id: QmS2GCuAkzH2kNDYe2pA9HkRTPLpC5DpbXRqhQW93exZEM, component: FirehoseBlockStream
2023-04-07 18:35:27.453 INFO Blockstream connected, provider: goerli-firehose-sf
2023-04-07 18:35:27.453 DEBG 0 candidate triggers in this block, block_hash: 0x787ac44d2913367a0c6286411f204be77175789888b430fc18757bfbf5520d5d, block_number: 8791700
2023-04-07 18:35:27.453 ERRO Subgraph writer failed, error: subgraph `QmS2GCuAkzH2kNDYe2pA9HkRTPLpC5DpbXRqhQW93exZEM` has already processed block `8791700`

What happened from the graph-node perspective:

At 17:54, it received block 8791730 (it was connected to Pinax).
Then there are no more logs until 18:35, so it seems to have been stuck.
At 18:35, it got an unexpected stream termination (maybe caused by a timeout or a service restart from Pinax, hence the 'getting stuck').
It reconnected with start_block=8784178 and a cursor that points to 8791730 with the correct canonical hash (I decoded it here with opaque tool).
It then seems to have received blocks from the streamingfast firehose starting from 8791700. (I also saw block 8791701 come in, in a truncated log, before the error was triggered.)
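
The invariant that appears to be violated here can be written as a small consumer-side guard. This is only a sketch in Python, not graph-node or firehose code: `Block`, `BlockBelowCursorError` and `guard_stream` are hypothetical names, the cursor is assumed to be already decoded to a block number, and undo/reorg steps are ignored (only forward blocks are considered).

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Block:
    """Hypothetical minimal view of a streamed block."""
    number: int
    hash: str


class BlockBelowCursorError(Exception):
    """Raised when a forward block is not strictly above the last seen block."""


def guard_stream(cursor_block_number: int, blocks: Iterable[Block]) -> Iterator[Block]:
    """Yield blocks, failing fast if one is at or below the cursor block.

    Assumption: every block is a forward ("new") step. A real consumer
    would also have to handle undo steps on reorgs.
    """
    last = cursor_block_number
    for block in blocks:
        if block.number <= last:
            raise BlockBelowCursorError(
                f"received block {block.number} ({block.hash}) "
                f"but the last processed block is {last}"
            )
        last = block.number
        yield block
```

With the numbers from the log above, a cursor at 8791730 followed by block 8791700 would raise immediately instead of reaching the subgraph writer.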

I tried reproducing the behavior here without success.

The INFO log level does not let me confirm whether block 8791700 was actually sent by firehose, because it only prints at every "modulo 200".
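
To illustrate why the INFO logs are not enough, here is a tiny sketch under the assumption that "modulo 200" means a line is printed only when the block number is a multiple of 200. The constant and the function name are hypothetical, not taken from the firehose code.

```python
LOG_EVERY = 200  # assumption: INFO prints only when block_number % 200 == 0


def is_logged_at_info(block_number: int, every: int = LOG_EVERY) -> bool:
    """Return True if a block would show up in the INFO logs under that assumption."""
    return block_number % every == 0


for number in (8791600, 8791700, 8791730, 8791800):
    print(number, is_logged_at_info(number))
```

Under that assumption neither 8791700 nor 8791730 would be printed, only the surrounding 8791600 and 8791800.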

It happened to at least 2 different subgraphs connecting to firehose, at exactly the same block.
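
To look for the same symptom in other deployments, a small script can scan graph-node logs for a `block_number` that goes backwards. This is a rough sketch: `find_regressions` is a hypothetical helper, it assumes the lines have already been filtered to a single subgraph, and a legitimate reorg would also be reported, so every hit still needs a manual look.

```python
import re
from typing import Iterable, List, Tuple

# Matches the "block_number: N" field seen in the DEBG lines above.
BLOCK_RE = re.compile(r"block_number: (\d+)")


def find_regressions(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (previous, current) pairs where block_number did not increase."""
    regressions = []
    previous = None
    for line in lines:
        match = BLOCK_RE.search(line)
        if match is None:
            continue  # line carries no block number (e.g. connect/disconnect)
        current = int(match.group(1))
        if previous is not None and current <= previous:
            regressions.append((previous, current))
        previous = current
    return regressions


# Shortened versions of the log lines from this report (hashes dropped).
sample = [
    "2023-04-07 17:54:49.380 DEBG 0 candidate triggers in this block, block_number: 8791730, sgd: 536172",
    "2023-04-07 18:35:27.453 INFO Blockstream connected, provider: goerli-firehose-sf",
    "2023-04-07 18:35:27.453 DEBG 0 candidate triggers in this block, block_number: 8791700",
]
print(find_regressions(sample))
```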

Hypothesis:

Some kind of race condition that depends on where the block is (in the reversible segment, written to a file, or maybe in the buffer in between). It is possible that the first (Pinax) firehose endpoint got stuck for long enough for the head block to get written to disk, and that both subgraphs connected at exactly the moment that file got written, so the block was still in the buffer...

streamingfast / firehose-ethereum

Sometimes, firehose sends block below the actual cursor block #59
