Closed sduchesneau closed 1 year ago
The "committing new state" message comes from
`func (*GenesisContractsClient) CommitState`
I believe the block (40335152) was produced into firehose, then this happened:
https://github.com/streamingfast/go-ethereum/blob/release/polygon-0.x-fh2/core/blockchain.go#L1834 -> fmt.Errorf("invalid merkle root (remote: %x local: %x)"...)
If the block was already "closed" at that moment, then we won't create the same block again in firehose, right? Investigating...
system transactions come from:
(core/state_processor) Process() (between finalizeblock and endblock) ->
(bor) Finalize() ->
(bor) CommitStates() ->
(bor/GenesisContractsClient) CommitState() ->
(bor/statefull) ApplyMessage() ->
(firehose/context) StartTransactionRaw()
However, the block is validated after that! So the block would be ended completely, THEN it would be rejected, THEN it would be "produced again". But since it has the same hash, the incomplete version of the block is kept.
After catching another similar event with the full FIRE logs activated, I see that block 40343056 is emitted multiple times WITHOUT the system transactions while Heimdall is not up to date, failing "after" the END_BLOCK statement.
When Heimdall is back in sync, the block is emitted again, this time with the system transactions (between "FIRE FINALIZE_BLOCK 40343056" and "FIRE END_BLOCK 40343056"). The console reader will only take the first block into account (mostly because it does not overwrite the one-block files on disk), so we end up with the invalid block.
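To illustrate why the first emission wins, here is a minimal sketch of a store that refuses to overwrite an existing one-block file. Everything in it is hypothetical and simplified for illustration (the `block` and `oneBlockStore` types, the key format, the `abc` hash); it is not the actual console reader code.

```go
package main

import (
	"errors"
	"fmt"
)

// block is a hypothetical, simplified stand-in for a Firehose block.
type block struct {
	Num  uint64
	Hash string
	Txs  []string
}

// errAlreadyExists is returned when a file for the same block is already stored.
var errAlreadyExists = errors.New("one-block file already exists")

// oneBlockStore mimics (as an assumption) a store keyed by block number and
// hash that never overwrites an existing entry.
type oneBlockStore struct {
	files map[string]block
}

func newOneBlockStore() *oneBlockStore {
	return &oneBlockStore{files: map[string]block{}}
}

// write stores the block only if no entry exists for the same number and hash.
func (s *oneBlockStore) write(b block) error {
	key := fmt.Sprintf("%010d-%s", b.Num, b.Hash)
	if _, ok := s.files[key]; ok {
		return errAlreadyExists
	}
	s.files[key] = b
	return nil
}

func main() {
	s := newOneBlockStore()

	// First emission: Heimdall is behind, so the system transaction is missing.
	incomplete := block{Num: 40343056, Hash: "abc", Txs: []string{"user-tx"}}
	// Later emission: same number and hash, now with the system transaction.
	complete := block{Num: 40343056, Hash: "abc", Txs: []string{"user-tx", "state-sync-tx"}}

	fmt.Println("first write error:", s.write(incomplete))
	// The second write is rejected, so the incomplete block stays on disk.
	fmt.Println("second write error:", s.write(complete))
}
```

Because both emissions share the same block number and hash, nothing in the key distinguishes the complete block from the incomplete one.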
Solution will be to call END_BLOCK only after the block is validated, and trigger a "failure" if it is invalid.
Fixed
When Heimdall is unreachable for a while, the "SprintEnd" blocks that should contain system transactions are produced, with the correct block hash, but they do not contain the StateSync transactions.
Example log sequence for block 40335152 that should contain system transactions:
^ These messages get repeated during the period where Heimdall is unavailable.
The next "BAD BLOCK" message is preceded by
INFO [03-14|12:14:34.991] Skip duplicated bad block number=40,335,152 hash=f13b36..06a378
Then eventually we get: