kroma-network / go-ethereum

go-ethereum for Kroma
GNU Lesser General Public License v3.0

Error while syncing: RPC method engine_newPayloadV2 crashed: pebble: batch too large: >= 4.0GB #111

Closed: northwestnodes-eric closed this issue 1 month ago

northwestnodes-eric commented 7 months ago

System information

Geth version: geth version
CL client & version: lighthouse
OS & Version: Linux Ubuntu 22.04
Commit hash: (if develop)

Expected behaviour

Continuation of synchronization

Actual behaviour

Crash and burn. Even after disabling the engine_ API namespace, synchronization crashes at this particular block.

Steps to reproduce the behaviour

Sync a Kroma node from scratch, using the PebbleDB backend.

Backtrace

kroma-geth  | {"t":"2024-04-25T21:38:46.669243547Z","lvl":"info","msg":"Chain head was updated","number":"1230340","hash":"0x29e6a2964becbebadbe5c933e21823eee07d56db2e25910377a03dedb0b03f0b","root":"0x0f142ad0a3531fed0cfbd540dca9a0c67ba4ee1c4076aa6af8f1493e11db2d51","elapsed":"5.751945ms","age":"6mo3w4d"}

kroma-geth  | {"t":"2024-04-25T21:38:46.690362168Z","lvl":"info","msg":"Starting work on payload","id":"0x31b317f01c25e5ae"}

kroma-geth  | {"t":"2024-04-25T21:39:20.278492638Z","lvl":"warn","msg":"Ignoring already known beacon payload","number":1230341,"hash":"0xd0133fbfef45fa5653e498f49cf7ae0ab3b66b1f3d72c8ee5600274eb4cc1bf3","age":"6mo3w4d"}

kroma-geth  | {"t":"2024-04-25T21:39:20.278357009Z","lvl":"eror","msg":"RPC method engine_newPayloadV2 crashed: pebble: batch too large: >= 4.0GB\ngoroutine 17572837 [running]:\ngithub.com/ethereum/go-ethereum/rpc.(*callback).call.func1()\n\tgithub.com/ethereum/go-ethereum/rpc/service.go:199 +0x85\npanic({0x1872140?, 0xc000012dc8?})\n\truntime/panic.go:914 +0x21f\ngithub.com/cockroachdb/pebble.(*Batch).grow(0xc000067400?, 0xc515c25ec0?)\n\tgithub.com/cockroachdb/pebble@v0.0.0-20230928194634-aa077af62593/batch.go:1385 +0x12d\ngithub.com/cockroachdb/pebble.(*Batch).prepareDeferredKeyValueRecord(0xc513b5cd80, 0x20, 0x41, 0x1)\n\tgithub.com/cockroachdb/pebble@v0.0.0-20230928194634-aa077af62593/batch.go:565 +0x86\ngithub.com/cockroachdb/pebble.(*Batch).SetDeferred(...)\n\tgithub.com/cockroachdb/pebble@v0.0.0-20230928194634-aa077af62593/batch.go:696\ngithub.com/cockroachdb/pebble.(*Batch).Set(0xc513b5cd80, {0xc2c256e420, 0x20, 0x1f35dcd93b35e19f?}, {0xc285eae720, 0x41, 0xc4709969d8?}, 0xc000914d80?)\n\tgithub.com/cockroachdb/pebble@v0.0.0-20230928194634-aa077af62593/batch.go:678 +0x3b\ngithub.com/ethereum/go-ethereum/ethdb/pebble.(*batch).Put(0xc515c78a80, {0xc2c256e420?, 0x20?, 0x20?}, {0xc285eae720?, 0x41?, 0x60?})\n\tgithub.com/ethereum/go-ethereum/ethdb/pebble/pebble.go:555 +0x34\ngithub.com/ethereum/go-ethereum/trie/triedb/hashdb.(*ZktrieDatabase).commitAllDirties(0xc0008a9200)\n\tgithub.com/ethereum/go-ethereum/trie/triedb/hashdb/zktrie_database.go:101 +0xf3\ngithub.com/ethereum/go-ethereum/trie/triedb/hashdb.(*ZktrieDatabase).Commit(0xc0008a9200, {0x10, 0xc, 0x7f, 0xfb, 0x14, 0x73, 0xf0, 0x3d, 0x57, ...}, ...)\n\tgithub.com/ethereum/go-ethereum/trie/triedb/hashdb/zktrie_database.go:74 +0x65\ngithub.com/ethereum/go-ethereum/trie.(*Database).Commit(0x0?, {0x10, 0xc, 0x7f, 0xfb, 0x14, 0x73, 0xf0, 0x3d, 0x57, ...}, ...)\n\tgithub.com/ethereum/go-ethereum/trie/database.go:181 +0x58\ngithub.com/ethereum/go-ethereum/core.(*BlockChain).writeBlockWithState(0xc000581c00, 0xc512819ae0, {0xc5155f8d40, 0x2, 0x2}, 0xc5155ca780)\n\tgithub.com/ethereum/go-ethereum/core/blockchain.go:1467 +0xbbc\ngithub.com/ethereum/go-ethereum/core.(*BlockChain).insertChain(0xc000581c00, {0xc51544c608?, 0x1, 0x1}, 0x0)\n\tgithub.com/ethereum/go-ethereum/core/blockchain.go:1863 +0x2137\ngithub.com/ethereum/go-ethereum/core.(*BlockChain).InsertBlockWithoutSetHead(0xc000581c00, 0xc512819ae0)\n\tgithub.com/ethereum/go-ethereum/core/blockchain.go:2334 +0xc7\ngithub.com/ethereum/go-ethereum/eth/catalyst.(*ConsensusAPI).newPayload(_, {{0x29, 0xe6, 0xa2, 0x96, 0x4b, 0xec, 0xbe, 0xba, 0xdb, ...}, ...}, ...)\n\tgithub.com/ethereum/go-ethereum/eth/catalyst/api.go:590 +0xe28\ngithub.com/ethereum/go-ethereum/eth/catalyst.(*ConsensusAPI).NewPayloadV2(_, {{0x29, 0xe6, 0xa2, 0x96, 0x4b, 0xec, 0xbe, 0xba, 0xdb, ...}, ...})\n\tgithub.com/ethereum/go-ethereum/eth/catalyst/api.go:484 +0x331\nreflect.Value.call({0xc00085eb40?, 0xc000926190?, 0x7eff9df4b4f0?}, {0x199a6ad, 0x4}, {0xc5159f22d0, 0x2, 0x418292?})\n\treflect/value.go:596 +0xce7\nreflect.Value.Call({0xc00085eb40?, 0xc000926190?, 0x564265?}, {0xc5159f22d0?, 0x1?, 0x16?})\n\treflect/value.go:380 +0xb9\ngithub.com/ethereum/go-ethereum/rpc.(*callback).call(0xc000930000, {0x27b1760?, 0xc5159f2230}, {0xc5159aa1c8, 0x13}, {0xc5159ace28, 0x1, 0x4d28ef?})\n\tgithub.com/ethereum/go-ethereum/rpc/service.go:205 +0x379\ngithub.com/ethereum/go-ethereum/rpc.(*handler).runMethod(0xc51395db00?, {0x27b1760?, 0xc5159f2230?}, 0xc5155a6770, 0x1?, {0xc5159ace28?, 0x419968?, 
0xa40000c000580000?})\n\tgithub.com/ethereum/go-ethereum/rpc/handler.go:565 +0x3c\ngithub.com/ethereum/go-ethereum/rpc.(*handler).handleCall(0xc5128199a0, 0xc5159e88a0, 0xc5155a6770)\n\tgithub.com/ethereum/go-ethereum/rpc/handler.go:512 +0x22f\ngithub.com/ethereum/go-ethereum/rpc.(*handler).handleCallMsg(0xc5128199a0, 0xc5159e8900?, 0xc5155a6770)\n\tgithub.com/ethereum/go-ethereum/rpc/handler.go:470 +0x22d\ngithub.com/ethereum/go-ethereum/rpc.(*handler).handleNonBatchCall(0xc5128199a0, 0xc5159e88a0, 0xc5155a6770)\n\tgithub.com/ethereum/go-ethereum/rpc/handler.go:296 +0x187\ngithub.com/ethereum/go-ethereum/rpc.(*handler).handleMsg.func1.1(0x27b1760?)\n\tgithub.com/ethereum/go-ethereum/rpc/handler.go:269 +0x25\ngithub.com/ethereum/go-ethereum/rpc.(*handler).startCallProc.func1()\n\tgithub.com/ethereum/go-ethereum/rpc/handler.go:387 +0xbe\ncreated by github.com/ethereum/go-ethereum/rpc.(*handler).startCallProc in goroutine 16644624\n\tgithub.com/ethereum/go-ethereum/rpc/handler.go:383 +0x79\n"}

Is there any way to increase the allowed size? I do feel a batch size of >= 4.0GB is a bit hefty?
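
For context, the panic above surfaces in pebble's Batch.grow via hashdb.(*ZktrieDatabase).commitAllDirties, which appears to accumulate every dirty trie node in a single Pebble batch; Pebble hard-caps a single batch at 4 GB. Elsewhere in go-ethereum, large writes typically avoid this by flushing in chunks once a batch passes ethdb.IdealBatchSize. Below is a minimal sketch of that general pattern; the function name and map type are hypothetical and it is not code from this repository or the actual fix.

```go
package sketch

import "github.com/ethereum/go-ethereum/ethdb"

// writeNodesChunked is a hypothetical helper: it writes a set of dirty
// trie nodes, flushing whenever the batch grows past ethdb.IdealBatchSize
// so that no single underlying Pebble batch can approach the 4 GB cap.
func writeNodesChunked(db ethdb.KeyValueStore, dirties map[string][]byte) error {
	batch := db.NewBatch()
	for key, node := range dirties {
		if err := batch.Put([]byte(key), node); err != nil {
			return err
		}
		if batch.ValueSize() > ethdb.IdealBatchSize {
			if err := batch.Write(); err != nil {
				return err
			}
			batch.Reset()
		}
	}
	// Flush whatever remains in the final, partially filled batch.
	return batch.Write()
}
```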

0xHansLee commented 4 months ago

Sorry for the late response. You can increase the batch request limit and the maximum batch response size with rpc.batch-request-limit and rpc.batch-response-max-size, respectively. By the way, did you try to sync Kroma mainnet? If any issues remain unresolved, please leave a comment and I will respond. Thanks.
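
For reference, in upstream go-ethereum these two flags govern JSON-RPC request batching (maximum items per batch request and maximum total response size), and a value of 0 is treated as unlimited. Assuming kroma-geth follows upstream here, they map onto the RPC server's batch limits roughly as in this illustrative sketch (not code from this repository):

```go
package sketch

import "github.com/ethereum/go-ethereum/rpc"

// unlimitedBatchServer shows the rough in-process equivalent of running with
// --rpc.batch-request-limit=0 and --rpc.batch-response-max-size=0:
// zero values switch the batch item and response-size checks off.
func unlimitedBatchServer() *rpc.Server {
	srv := rpc.NewServer()
	srv.SetBatchLimits(0, 0)
	return srv
}
```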

northwestnodes-eric commented 4 months ago

Hi @0xHansLee - we did indeed try to sync Kroma mainnet from scratch using the PebbleDB implementation. Trying again right now (currently at ~8mo) with the aforementioned flags set to 0 (indicating unlimited, AFAIK). I'll keep you posted.

northwestnodes-eric commented 4 months ago

Hi @0xHansLee, the sync crashed with a different error this time. kroma-geth complained about the following:

kroma-geth  | {"t":"2024-07-15T06:04:13.052137214Z","lvl":"eror","msg":"fail to resolve hash node","hash":"16028726...","err":"pebble: not found"}
kroma-geth  | {"t":"2024-07-15T06:04:13.052339318Z","lvl":"eror","msg":"fail to resolve hash node","hash":"16028726...","err":"pebble: not found"}
kroma-geth  | {"t":"2024-07-15T06:04:13.637616567Z","lvl":"eror","msg":"fail to resolve hash node","hash":"62538809...","err":"pebble: not found"}
kroma-geth  | {"t":"2024-07-15T06:04:13.637850094Z","lvl":"eror","msg":"fail to resolve hash node","hash":"62538809...","err":"pebble: not found"}
kroma-geth  | {"t":"2024-07-15T06:04:13.643579501Z","lvl":"eror","msg":"fail to resolve hash node","hash":"62538809...","err":"pebble: not found"}
kroma-geth  | {"t":"2024-07-15T06:04:13.643801766Z","lvl":"eror","msg":"fail to resolve hash node","hash":"62538809...","err":"pebble: not found"}
kroma-geth  | {"t":"2024-07-15T06:04:13.651338141Z","lvl":"eror","msg":"fail to resolve hash node","hash":"62538809...","err":"pebble: not found"}
kroma-geth  | {"t":"2024-07-15T06:04:13.651395804Z","lvl":"eror","msg":"fail to resolve hash node","hash":"62538809...","err":"pebble: not found"}
kroma-geth  | {"t":"2024-07-15T06:04:13.68728785Z","lvl":"eror","msg":"fail to resolve hash node","hash":"62538809...","err":"pebble: not found"}
kroma-geth  | {"t":"2024-07-15T06:04:13.687504086Z","lvl":"eror","msg":"fail to resolve hash node","hash":"62538809...","err":"pebble: not found"}
kroma-geth  | {"t":"2024-07-15T06:04:14.37603311Z","lvl":"eror","msg":"fail to resolve hash node","hash":"62538809...","err":"pebble: not found"}
kroma-geth  | {"t":"2024-07-15T06:04:14.376260287Z","lvl":"eror","msg":"fail to resolve hash node","hash":"62538809...","err":"pebble: not found"}
kroma-geth  | {"t":"2024-07-15T06:04:15.657100043Z","lvl":"eror","msg":"fail to resolve hash node","hash":"65514486...","err":"pebble: not found"}
kroma-geth  | {"t":"2024-07-15T06:04:15.657361031Z","lvl":"eror","msg":"fail to resolve hash node","hash":"65514486...","err":"pebble: not found"}
kroma-geth  | {"t":"2024-07-15T06:04:15.937945201Z","lvl":"eror","msg":"fail to resolve hash node","hash":"43145104...","err":"pebble: not found"}
kroma-geth  | {"t":"2024-07-15T06:04:15.938187647Z","lvl":"eror","msg":"fail to resolve hash node","hash":"43145104...","err":"pebble: not found"}
kroma-geth  | {"t":"2024-07-15T06:04:15.970086977Z","lvl":"eror","msg":"fail to resolve hash node","hash":"43145104...","err":"pebble: not found"}

Trying to sync again.

0xHansLee commented 3 months ago

Hey @northwestnodes-eric, sorry again for the late response. Could you try using our snapshot data? The URL is https://snapshot.kroma.network/latest/snapshot.tar.gz.

xinzhongyoumeng commented 3 months ago

@northwestnodes-eric are you running a full node? I had the same problem. How did you solve it?

northwestnodes-eric commented 2 months ago

Apologies for the late response.

We solved the syncing issue with the increased batch settings.

What happened, though, is that kroma-geth never caught up with the head of the chain; it got stuck roughly 12 hours behind head. We let the node run like this for about a week.

When we switched back to LevelDB/Hash all was well in the world.

I hope this issue can be fixed soon, as Pebble/PBSS helps the node operators out tremendously.

seolaoh commented 1 month ago

kroma-geth currently uses zktrie rather than the MPT for its state trie, so unfortunately kroma-geth doesn't support PBSS.

According to #117, it seems you solved the problem, so I'll close this issue. If any other issues come up, please open a new one.