hermeznetwork / hermez-node

Hermez node Go implementation
GNU Affero General Public License v3.0

Sync intermediate state reset is causing the node to get stuck #1142

Open tclemos opened 2 years ago

Summary of Bug

When an error occurs during synchronization, the intermediate state reset causes the Hermez node to get stuck in an infinite error loop.

Expected Behavior

Reset both databases to the same batch/block and continue to sync.
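
To make the expected behavior concrete, here is a minimal sketch of resetting two stores to one common batch. It is not the hermez-node code: `BatchStore`, `memStore` and `ResetToCommonBatch` are hypothetical names made up for this illustration, and the in-memory store only stands in for the real StateDB and HistoryDB.

```go
package main

import (
	"errors"
	"fmt"
)

// BatchStore is a hypothetical abstraction over a database that tracks the
// last synchronized batch. It is NOT the hermez-node API.
type BatchStore interface {
	LastBatch() int64
	ResetTo(batch int64) error
}

// memStore is an in-memory stand-in used only for this illustration.
type memStore struct {
	name  string
	batch int64
}

func (m *memStore) LastBatch() int64 { return m.batch }

// ResetTo rolls the store back; it refuses to move forward or below zero.
func (m *memStore) ResetTo(batch int64) error {
	if batch < 0 || batch > m.batch {
		return fmt.Errorf("%s: cannot reset to batch %d from %d", m.name, batch, m.batch)
	}
	m.batch = batch
	return nil
}

// ResetToCommonBatch rolls every store back to the same batch: the lowest of
// (failedBatch - 1) and each store's current position.
func ResetToCommonBatch(failedBatch int64, stores ...BatchStore) (int64, error) {
	if len(stores) == 0 {
		return 0, errors.New("no stores given")
	}
	target := failedBatch - 1
	for _, s := range stores {
		if last := s.LastBatch(); last < target {
			target = last
		}
	}
	if target < 0 {
		target = 0
	}
	for _, s := range stores {
		if err := s.ResetTo(target); err != nil {
			return 0, err
		}
	}
	// Verify that every store really ended up on the same batch.
	for _, s := range stores {
		if s.LastBatch() != target {
			return 0, fmt.Errorf("store at batch %d, want %d", s.LastBatch(), target)
		}
	}
	return target, nil
}

func main() {
	// Illustrative starting positions, not values read from a real node.
	stateDB := &memStore{name: "state", batch: 32717}
	historyDB := &memStore{name: "history", batch: 32720}

	target, err := ResetToCommonBatch(32717, stateDB, historyDB)
	if err != nil {
		panic(err)
	}
	fmt.Println(target, stateDB.LastBatch(), historyDB.LastBatch())
}
```

The point of the sketch is that the reset target is computed once and applied to every store, with a final check that both agree before syncing resumes.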

Steps to Reproduce

Configure your node to use this rollup smart contract: https://goerli.etherscan.io/address/0xf08a226B67a8A9f99cCfCF51c50867bc18a54F53

Batch 32717 contains an error that causes the sync process to fail, which triggers the intermediate state reset. After the reset is executed, the node gets stuck with an error similar to this:

2021-09-16T15:11:43Z    DEBUG   statedb/statedb.go:225  Making StateDB checkpoint       {"batch": 32717, "type": "synchronizer"}
2021-09-16T15:11:43Z    DEBUG   statedb/statedb.go:266  Making StateDB Reset    {"batch": 32716, "type": "synchronizer"}
2021-09-16T15:11:44Z    ERROR   node/node.go:806        Synchronizer.Sync: stateDB.BatchNum (32717) != evtForgeBatch.BatchNum = (32720)
/home/ubuntu/github.com/hermeznetwork/hermez-node/synchronizer/synchronizer.go:1039 github.com/hermeznetwork/hermez-node/synchronizer.(*Synchronizer).rollupSync()
/home/ubuntu/github.com/hermeznetwork/hermez-node/synchronizer/synchronizer.go:627 github.com/hermeznetwork/hermez-node/synchronizer.(*Synchronizer).Sync()
/home/ubuntu/github.com/hermeznetwork/hermez-node/node/node.go:741 github.com/hermeznetwork/hermez-node/node.(*Node).syncLoopFn()
/home/ubuntu/github.com/hermeznetwork/hermez-node/node/node.go:797 github.com/hermeznetwork/hermez-node/node.(*Node).StartSynchronizer.func1()

        {"err": "stateDB.BatchNum (32717) != evtForgeBatch.BatchNum = (32720)"}

System information

Additional Information:

We suspect the intermediate state reset is not resetting the state properly in both databases. It looks like the StateDB is falling behind the HistoryDB, which leaves the two databases out of sync.
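
One way to test this suspicion would be to compare the batch numbers of both databases right after the reset. The sketch below only shows the comparison itself; `Drift` is a hypothetical helper, the batch numbers are passed in as plain integers, and how they would be read from the real StateDB and HistoryDB is left open.

```go
package main

import "fmt"

// Drift reports which of two batch counters is behind and by how many
// batches. It returns an empty name and 0 when both are equal.
// Hypothetical helper for illustration; not part of hermez-node.
func Drift(stateBatch, historyBatch int64) (behind string, gap int64) {
	switch {
	case stateBatch < historyBatch:
		return "StateDB", historyBatch - stateBatch
	case historyBatch < stateBatch:
		return "HistoryDB", stateBatch - historyBatch
	default:
		return "", 0
	}
}

func main() {
	// Illustrative numbers only, not values read from a real node.
	behind, gap := Drift(32717, 32719)
	if gap == 0 {
		fmt.Println("databases are in sync")
		return
	}
	fmt.Printf("%s is behind by %d batches\n", behind, gap)
}
```

Logging such a comparison immediately after the reset would show whether the two databases already disagree at that point or only diverge later.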