Closed andreclaro closed 5 months ago
Update: We are seeing these issues in nodes syncing from genesis as well.
We are facing a similar issue
op-node
- command:
- /app/op-node
- --network=base-mainnet
- --l1=http://
- --l1.beacon=http://
- --l2=ws://localhost:8551
- --l2.jwt-secret=/home/node/jwt-secret.txt
- --rpc.addr=0.0.0.0
- --rpc.port=9545
- --p2p.priv.path=/home/node/p2p-private-key.txt
- --p2p.peerstore.path=memory
- --p2p.discovery.path=memory
- --metrics.enabled
- --l1.trustrpc=true
op-erigon
- command:
- erigon
- --datadir=/home/node/data
- --db.pagesize=4096
- --ethash.dagdir=/home/node/data/dag
- --db.size.limit=8TB
- --chain=base-mainnet
- --networkid=8453
- --authrpc.jwtsecret=/home/node/jwt-secret.txt
- --authrpc.vhosts=*
- --http
- --http.addr=0.0.0.0
- --http.corsdomain=null
- --http.vhosts=*
- --http.api=eth,erigon,net,web3,debug,trace
- --ws
- --rpc.batch.limit=1000
- --rpc.returndata.limit=150000
- --rollup.sequencerhttp=https://mainnet-sequencer.base.org
- --rollup.historicalrpc=
- --rollup.historicalrpctimeout=5s
- --rpc.txfeecap=100
- --maxpeers=100
- --nodiscover
- --identity=
- --rollup.disabletxpoolgossip=true
- --metrics
- --metrics.addr=0.0.0.0
- --healthcheck
- --private.api.addr=0.0.0.0:9090
- --private.api.ratelimit=63744
- --db.size.limit=8TB
- --rpc.batch.concurrency=4
- --db.read.concurrency=4096
- --rpc.batch.limit=200
- --rpc.returndata.limit=200000
Erigon Logs
[INFO] [06-03|17:19:45.686] [updateForkchoice] Fork choice update: flushing in-memory state (built by previous newPayload)
[WARN] [06-03|17:19:46.686] [rpc] served method=engine_forkchoiceUpdatedV3 reqid=2901 t=1.000723139s err="forkChoiceUpdated timeout"
[WARN] [06-03|17:19:48.792] [rpc] served method=engine_forkchoiceUpdatedV3 reqid=2902 t=1.623159ms err="cannot update forkchoice. execution service is busy"
[INFO] [06-03|17:19:49.956] RPC Daemon notified of new headers from=15182419 to=15182420 amount=1 hash=0xf8b051f57016879065ca3d445a71a148aba80881543e30b1e107f06e4ecb68e0 header sending=57.651µs log sending=460ns
[INFO] [06-03|17:19:49.956] head updated hash=0xf8b051f57016879065ca3d445a71a148aba80881543e30b1e107f06e4ecb68e0 number=15182420
[INFO] [06-03|17:19:52.970] Optimism ForkChoice is choosing to unwind to a previously canonical block blockHash=0x31e63522b043c6b753920ff39fc9e83db3cf568c18f81f5083c23b54226c38d2 blockNumber=15182419 headHash=0xf8b051f57016879065ca3d445a71a148aba80881543e30b1e107f06e4ecb68e0
[WARN] [06-03|17:19:53.970] [rpc] served method=engine_forkchoiceUpdatedV3 reqid=2903 t=1.001794509s err="forkChoiceUpdated timeout"
[INFO] [06-03|17:19:54.413] [5/12 HashState] Unwinding started from=15182420 to=15182419 storage=false codes=true
[INFO] [06-03|17:19:54.415] [5/12 HashState] Unwinding started from=15182420 to=15182419 storage=false codes=false
[INFO] [06-03|17:19:54.417] [5/12 HashState] Unwinding started from=15182420 to=15182419 storage=true codes=false
[INFO] [06-03|17:19:54.429] [6/12 IntermediateHashes] Unwinding from=15182420 to=15182419 csbucket=AccountChangeSet
[INFO] [06-03|17:19:54.430] [6/12 IntermediateHashes] Unwinding from=15182420 to=15182419 csbucket=StorageChangeSet
[INFO] [06-03|17:19:54.473] [6/12 IntermediateHashes] Trie root hash=0xee86369693adf23ac3a5c0735ed913f4b6e40ae8c213c06545bb47b18597a480
[INFO] [06-03|17:19:54.492] [4/12 Execution] Unwind Execution from=15182420 to=15182419
[INFO] [06-03|17:19:54.574] RPC Daemon notified of new headers from=15181395 to=15182419 amount=1024 hash=0x31e63522b043c6b753920ff39fc9e83db3cf568c18f81f5083c23b54226c38d2 header sending=5.072838ms log sending=590ns
[INFO] [06-03|17:19:54.574] head updated hash=0x31e63522b043c6b753920ff39fc9e83db3cf568c18f81f5083c23b54226c38d2 number=15182419
[INFO] [06-03|17:20:02.200] [ForkChoiceUpdated] duplicate build request
[INFO] [06-03|17:20:02.201] Received GetPayloadV3 payloadId=1
[INFO] [06-03|17:20:02.218] [NewPayload] Handling new payload height=15182420 hash=0xf8b051f57016879065ca3d445a71a148aba80881543e30b1e107f06e4ecb68e0
[INFO] [06-03|17:20:02.421] [4/12 Execution] Completed on block=15182420
[INFO] [06-03|17:20:02.601] RPC Daemon notified of new headers from=15182419 to=15182420 amount=1 hash=0xf8b051f57016879065ca3d445a71a148aba80881543e30b1e107f06e4ecb68e0 header sending=50.891µs log sending=940ns
[INFO] [06-03|17:20:02.602] head updated hash=0xf8b051f57016879065ca3d445a71a148aba80881543e30b1e107f06e4ecb68e0 number=15182420
[WARN] [06-03|17:20:03.221] [rpc] served method=engine_forkchoiceUpdatedV3 reqid=2907 t=1.00134398s err="forkChoiceUpdated timeout"
op-node
Logs
t=2024-06-03T17:19:48+0000 lvl=warn msg="Derivation process temporary error" attempts=2 err="engine stage failed: temp: temporarily cannot insert new safe block: failed to create new block via forkchoice: unrecognized rpc error: cannot update forkchoice. execution service is busy"
t=2024-06-03T17:20:01+0000 lvl=info msg="Optimistically queueing unsafe L2 execution payload" id=0x355ea51a58846846a3a069569bbb3cb121e2b60f0a777d196a5199e4a08a0c6b:15322927
t=2024-06-03T17:20:02+0000 lvl=warn msg="did not finish previous block building, starting new building now" prev_onto=0x31e63522b043c6b753920ff39fc9e83db3cf568c18f81f5083c23b54226c38d2:15182419 prev_payload_id=0x0000000000000001 new_onto=0x31e63522b043c6b753920ff39fc9e83db3cf568c18f81f5083c23b54226c38d2:15182419
@andreclaro @harshsingh-cs Thanks for reporting this issue so thoroughly guys. I'm trying to reproduce this issue on my own node with the same configurations, but it's still being synced to the chain head.
I'll be able to debug more once I have nodes with this same issue, but for now, I'm looking over the codebase only. I did find some potential bug in the codebase which might be related to this, but I can't make any guarantees that this will fix this issue.
Do you guys have any spare nodes that you can play around with? Can you check out feature/mininny/synchronous-fcu
branch and build the op-erigon node and see if the issue gets fixed by any chance?
If you don't have any development nodes, please give me a few days to sync and reproduce this issue on my end 😄
@mininny, I can give it a try on Optimism mainnet but I have this other issue https://github.com/testinprod-io/op-erigon/issues/171
@mininny I tried the branch but it did not help
op-erigon
logs
97ms
[INFO] [06-07|10:39:48.094] [NewPayload] Handling new payload height=15182421 hash=0xe722b1b3e59325df589e75336b06f88e82626e97f5865d13e816b68f2c5ca73e
[INFO] [06-07|10:39:48.259] [updateForkchoice] Fork choice update: flushing in-memory state (built by previous newPayload)
[INFO] [06-07|10:39:48.463] RPC Daemon notified of new headers from=15182420 to=15182421 amount=1 hash=0xe722b1b3e59325df589e75336b06f88e82626e97f5865d13e816b68f2c5ca73e header sending=47.642µs log sending=730ns
[INFO] [06-07|10:39:48.463] head updated hash=0xe722b1b3e59325df589e75336b06f88e82626e97f5865d13e816b68f2c5ca73e number=15182421
[WARN] [06-07|10:39:49.258] [rpc] served method=engine_forkchoiceUpdatedV3 reqid=2903 t=1.000708594s err="forkChoiceUpdated timeout"
[INFO] [06-07|10:39:51.299] Optimism ForkChoice is choosing to unwind to a previously canonical block blockHash=0xf8b051f57016879065ca3d445a71a148aba80881543e30b1e107f06e4ecb68e0 blockNumber=15182420 headHash=0xe722b1b3e59325df589e75336b06f88e82626e97f5865d13e816b68f2c5ca73e
[INFO] [06-07|10:39:51.363] [5/12 HashState] Unwinding started from=15182421 to=15182420 storage=false codes=true
[INFO] [06-07|10:39:51.364] [5/12 HashState] Unwinding started from=15182421 to=15182420 storage=false codes=false
[INFO] [06-07|10:39:51.367] [5/12 HashState] Unwinding started from=15182421 to=15182420 storage=true codes=false
[INFO] [06-07|10:39:51.376] [6/12 IntermediateHashes] Unwinding from=15182421 to=15182420 csbucket=AccountChangeSet
[INFO] [06-07|10:39:51.377] [6/12 IntermediateHashes] Unwinding from=15182421 to=15182420 csbucket=StorageChangeSet
[INFO] [06-07|10:39:51.426] [6/12 IntermediateHashes] Trie root hash=0x4bac6f4a92a8b92902ccc6bc4136349789dcee071de7be00e6e0a675b02e10b0
[INFO] [06-07|10:39:51.445] [4/12 Execution] Unwind Execution from=15182421 to=15182420
[INFO] [06-07|10:39:51.579] RPC Daemon notified of new headers from=15181396 to=15182420 amount=1024 hash=0xf8b051f57016879065ca3d445a71a148aba80881543e30b1e107f06e4ecb68e0 header sending=2.127271ms log sending=560ns
[INFO] [06-07|10:39:51.579] head updated hash=0xf8b051f57016879065ca3d445a71a148aba80881543e30b1e107f06e4ecb68e0 number=15182420
[WARN] [06-07|10:39:52.299] [rpc] served method=engine_forkchoiceUpdatedV3 reqid=2904 t=1.001689254s err="forkChoiceUpdated timeout"
[INFO] [06-07|10:39:56.456] [ForkChoiceUpdated] duplicate build request
[INFO] [06-07|10:39:56.457] Received GetPayloadV3 payloadId=1
[INFO] [06-07|10:39:56.468] [NewPayload] Handling new payload height=15182421 hash=0xe722b1b3e59325df589e75336b06f88e82626e97f5865d13e816b68f2c5ca73e
[INFO] [06-07|10:39:56.554] [4/12 Execution] Completed on block=15182421
[INFO] [06-07|10:39:56.748] RPC Daemon notified of new headers from=15182420 to=15182421 amount=1 hash=0xe722b1b3e59325df589e75336b06f88e82626e97f5865d13e816b68f2c5ca73e header sending=44.282µs log sending=500ns
[INFO] [06-07|10:39:56.748] head updated hash=0xe722b1b3e59325df589e75336b06f88e82626e97f5865d13e816b68f2c5ca73e number=15182421
[WARN] [06-07|10:39:57.470] [rpc] served method=engine_forkchoiceUpdatedV3 reqid=2908 t=1.000588474s err="forkChoiceUpdated timeout"
[INFO] [06-07|10:40:05.653] Optimism ForkChoice is choosing to unwind to a previously canonical block blockHash=0xf8b051f57016879065ca3d445a71a148aba80881543e30b1e107f06e4ecb68e0 blockNumber=15182420 headHash=0xe722b1b3e59325df589e75336b06f88e82626e97f5865d13e816b68f2c5ca73e
[INFO] [06-07|10:40:05.728] [5/12 HashState] Unwinding started from=15182421 to=15182420 storage=false codes=true
[INFO] [06-07|10:40:05.729] [5/12 HashState] Unwinding started from=15182421 to=15182420 storage=false codes=false
[INFO] [06-07|10:40:05.731] [5/12 HashState] Unwinding started from=15182421 to=15182420 storage=true codes=false
[INFO] [06-07|10:40:05.740] [6/12 IntermediateHashes] Unwinding from=15182421 to=15182420 csbucket=AccountChangeSet
[INFO] [06-07|10:40:05.741] [6/12 IntermediateHashes] Unwinding from=15182421 to=15182420 csbucket=StorageChangeSet
[INFO] [06-07|10:40:05.799] [6/12 IntermediateHashes] Trie root hash=0x4bac6f4a92a8b92902ccc6bc4136349789dcee071de7be00e6e0a675b02e10b0
[INFO] [06-07|10:40:05.822] [4/12 Execution] Unwind Execution from=15182421 to=15182420
[INFO] [06-07|10:40:05.940] RPC Daemon notified of new headers from=15181396 to=15182420 amount=1024 hash=0xf8b051f57016879065ca3d445a71a148aba80881543e30b1e107f06e4ecb68e0 header sending=2.051311ms log sending=450ns
[INFO] [06-07|10:40:05.940] head updated hash=0xf8b051f57016879065ca3d445a71a148aba80881543e30b1e107f06e4ecb68e0 number=15182420
[WARN] [06-07|10:40:06.654] [rpc] served method=engine_forkchoiceUpdatedV3 reqid=2909 t=1.001553327s err="forkChoiceUpdated timeout"
[INFO] [06-07|10:40:15.525] [p2p] GoodPeers
[INFO] [06-07|10:40:15.716] [mem] memory stats Rss=840.2MB Size=0B Pss=840.2MB SharedClean=4.0KB SharedDirty=0B PrivateClean=571.6MB PrivateDirty=268.6MB Referenced=839.6MB Anonymous=269.1MB Swap=0B alloc=114.6MB sys=227.7MB
[INFO] [06-07|10:40:16.731] [ForkChoiceUpdated] duplicate build request
[INFO] [06-07|10:40:16.732] Received GetPayloadV3 payloadId=1
[INFO] [06-07|10:40:16.744] [NewPayload] Handling new payload height=15182421 hash=0xe722b1b3e59325df589e75336b06f88e82626e97f5865d13e816b68f2c5ca73e
[INFO] [06-07|10:40:16.834] [4/12 Execution] Completed on block=15182421
[INFO] [06-07|10:40:17.049] RPC Daemon notified of new headers from=15182420 to=15182421 amount=1 hash=0xe722b1b3e59325df589e75336b06f88e82626e97f5865d13e816b68f2c5ca73e header sending=47.62µs log sending=460ns
[INFO] [06-07|10:40:17.049] head updated hash=0xe722b1b3e59325df589e75336b06f88e82626e97f5865d13e816b68f2c5ca73e number=15182421
[WARN] [06-07|10:40:17.746] [rpc] served method=engine_forkchoiceUpdatedV3 reqid=2913 t=1.00033198s err="forkChoiceUpdated timeout"
[INFO] [06-07|10:40:22.375] [txpool] stat pending=0 baseFee=0 queued=0 alloc=138.7MB sys=227.7MB
op-node
logs
=2024-06-07T10:41:23+0000 lvl=warn msg="did not finish previous block building, starting new building now" prev_onto=0xf8b051f57016879065ca3d445a71a148aba80881543e30b1e107f06e4ecb68e0:15182420 prev_payload_id=0x0000000000000001 new_onto=0xf8b051f57016879065ca3d445a71a148aba80881543e30b1e107f06e4ecb68e0:15182420
t=2024-06-07T10:41:24+0000 lvl=warn msg="Failed to share forkchoice-updated signal" state="&{HeadBlockHash:0xe722b1b3e59325df589e75336b06f88e82626e97f5865d13e816b68f2c5ca73e SafeBlockHash:0x39a28f7e1f8ca4d99fe84c90f833f776f9fe9fa4c4378ce47f6790ac3ed6f9e8 FinalizedBlockHash:0x9a156601b3cbb5e9aaa648e91727359df008cbd33409ad754f13f7de13ed6a4b}" err="forkChoiceUpdated timeout"
t=2024-06-07T10:41:24+0000 lvl=warn msg="Derivation process temporary error" attempts=11 err="engine stage failed: temp: temporarily cannot insert new safe block: failed to complete building on top of L2 chain 0xf8b051f57016879065ca3d445a71a148aba80881543e30b1e107f06e4ecb68e0:15182420, id: 0x0000000000000001, error (1): failed to make the new L2 block canonical via forkchoice: unrecognized rpc error: forkChoiceUpdated timeout"
By the way I tried this on a node which was already broken
@andreclaro @harshsingh-cs We have increased the FCU timeout of op-erigon to the same value of op-node's FCU timeout. Can you try the latest release? https://github.com/testinprod-io/op-erigon/releases/tag/v2.60.0-0.6.2
It helped, the node which was not syncing is now syncing Thanks @ImTei for you help !!
Hi! We tried upgrading to v2.60.0-0.6.2
but now we're getting more of these errors in op-node
:
Jun 10 13:34:15 m-exp-optimism-01 op-node[2854036]: t=2024-06-10T13:34:15+0000 lvl=warn msg="Derivation process temporary error" attempts=16 err="engine stage failed: temp: temporarily cannot insert new safe block: failed to complete building on top of L2 chain 0x4b789cbc82e48a3ca3764b0996dce40746fc85dada9c1cd20eb7edb3cada77c4:121213540, id: 0x0000000000000050, error (1): failed to make the new L2 block canonical via forkchoice: unrecognized rpc error: forkChoiceUpdated timeout"
Jun 10 13:34:15 m-exp-optimism-01 op-node[2854036]: t=2024-06-10T13:34:15+0000 lvl=warn msg="L1 head signal indicates a possible L1 re-org" old_l1_head=0xed7c004a35533f38674f97094f67e8a49a5e040b472ac8d1eff5953a8d39bbd2:20061755 new_l1_head_parent=0xdf7e465da66cca04a29165ca9a030825f878185ac598668c6b0f75b5b028dfc8 new_l1_head=0x2f0668227ad522424676355947956bed55859a2e51f5d1de82bf2ac5fa5a1e12:20061757
Jun 10 13:34:25 m-exp-optimism-01 op-node[2854036]: t=2024-06-10T13:34:25+0000 lvl=warn msg="did not finish previous block building, starting new building now" prev_onto=0x4b789cbc82e48a3ca3764b0996dce40746fc85dada9c1cd20eb7edb3cada77c4:121213540 prev_payload_id=0x0000000000000050 new_onto=0x4b789cbc82e48a3ca3764b0996dce40746fc85dada9c1cd20eb7edb3cada77c4:121213540
Jun 10 13:34:27 m-exp-optimism-01 op-node[2854036]: t=2024-06-10T13:34:27+0000 lvl=warn msg="Failed to share forkchoice-updated signal" state="&{HeadBlockHash:0x95d21089cb18ee5cf5e24e54b156d61733ac48713cc568f1f8eeb3d12c379d88 SafeBlockHash:0xc97983cb4d924edd3ebc23a68c59844ccab3039081f0e03d4e5ac82ce9e38a2b FinalizedBlockHash:0x34ca2f4cc96fbf5ce4c53592cd859c4277199b92d29d69bc976044d9eb231daa}" err="forkChoiceUpdated timeout"
Jun 10 13:34:27 m-exp-optimism-01 op-node[2854036]: t=2024-06-10T13:34:27+0000 lvl=warn msg="Derivation process temporary error" attempts=17 err="engine stage failed: temp: temporarily cannot insert new safe block: failed to complete building on top of L2 chain 0x4b789cbc82e48a3ca3764b0996dce40746fc85dada9c1cd20eb7edb3cada77c4:121213540, id: 0x0000000000000050, error (1): failed to make the new L2 block canonical via forkchoice: unrecognized rpc error: forkChoiceUpdated timeout"
Jun 10 13:34:37 m-exp-optimism-01 op-node[2854036]: t=2024-06-10T13:34:37+0000 lvl=warn msg="did not finish previous block building, starting new building now" prev_onto=0x4b789cbc82e48a3ca3764b0996dce40746fc85dada9c1cd20eb7edb3cada77c4:121213540 prev_payload_id=0x0000000000000050 new_onto=0x4b789cbc82e48a3ca3764b0996dce40746fc85dada9c1cd20eb7edb3cada77c4:121213540
Jun 10 13:34:39 m-exp-optimism-01 op-node[2854036]: t=2024-06-10T13:34:39+0000 lvl=warn msg="Failed to share forkchoice-updated signal" state="&{HeadBlockHash:0x95d21089cb18ee5cf5e24e54b156d61733ac48713cc568f1f8eeb3d12c379d88 SafeBlockHash:0xc97983cb4d924edd3ebc23a68c59844ccab3039081f0e03d4e5ac82ce9e38a2b FinalizedBlockHash:0x34ca2f4cc96fbf5ce4c53592cd859c4277199b92d29d69bc976044d9eb231daa}" err="forkChoiceUpdated timeout"
Jun 10 13:34:39 m-exp-optimism-01 op-node[2854036]: t=2024-06-10T13:34:39+0000 lvl=warn msg="Derivation process temporary error" attempts=18 err="engine stage failed: temp: temporarily cannot insert new safe block: failed to complete building on top of L2 chain 0x4b789cbc82e48a3ca3764b0996dce40746fc85dada9c1cd20eb7edb3cada77c4:121213540, id: 0x0000000000000050, error (1): failed to make the new L2 block canonical via forkchoice: unrecognized rpc error: forkChoiceUpdated timeout"
These are the corresponding logs for op-erigon
:
Jun 10 13:36:45 m-exp-optimism-01 erigon[2853887]: [INFO] [06-10|13:36:45.418] [4/12 Execution] Unwind Execution from=121213541 to=121213540
Jun 10 13:36:45 m-exp-optimism-01 erigon[2853887]: [INFO] [06-10|13:36:45.456] RPC Daemon notified of new headers from=121212516 to=121213540 amount=1024 hash=0x4b789cbc82e48a3ca3764b0996dce40746fc85dada9c1cd20eb7edb3cada77c4 header sending=903.172µs log sending=440ns
Jun 10 13:36:45 m-exp-optimism-01 erigon[2853887]: [INFO] [06-10|13:36:45.457] head updated hash=0x4b789cbc82e48a3ca3764b0996dce40746fc85dada9c1cd20eb7edb3cada77c4 number=121213540
Jun 10 13:36:46 m-exp-optimism-01 erigon[2853887]: [INFO] [06-10|13:36:46.046] [ForkChoiceUpdated] duplicate build request
Jun 10 13:36:46 m-exp-optimism-01 erigon[2853887]: [INFO] [06-10|13:36:46.047] Received GetPayloadV3 payloadId=80
Jun 10 13:36:46 m-exp-optimism-01 erigon[2853887]: [INFO] [06-10|13:36:46.049] [NewPayload] Handling new payload height=121213541 hash=0x95d21089cb18ee5cf5e24e54b156d61733ac48713cc568f1f8eeb3d12c379d88
Jun 10 13:36:46 m-exp-optimism-01 erigon[2853887]: [INFO] [06-10|13:36:46.069] [4/12 Execution] Completed on block=121213541
Jun 10 13:36:46 m-exp-optimism-01 erigon[2853887]: [INFO] [06-10|13:36:46.175] RPC Daemon notified of new headers from=121213540 to=121213541 amount=1 hash=0x95d21089cb18ee5cf5e24e54b156d61733ac48713cc568f1f8eeb3d12c379d88 header sending=20.731µs log sending=250ns
Jun 10 13:36:46 m-exp-optimism-01 erigon[2853887]: [INFO] [06-10|13:36:46.175] head updated hash=0x95d21089cb18ee5cf5e24e54b156d61733ac48713cc568f1f8eeb3d12c379d88 number=121213541
Jun 10 13:36:47 m-exp-optimism-01 erigon[2853887]: [WARN] [06-10|13:36:47.051] [rpc] served conn=127.0.0.1:60396 method=engine_forkchoiceUpdatedV3 reqid=3042 t=1.000694507s err="forkChoiceUpdated timeout"
Jun 10 13:36:55 m-exp-optimism-01 erigon[2853887]: [INFO] [06-10|13:36:55.274] Optimism ForkChoice is choosing to unwind to a previously canonical block blockHash=0x4b789cbc82e48a3ca3764b0996dce40746fc85dada9c1cd20eb7edb3cada77c4 blockNumber=121213540 headHash=0x95d21089cb18ee5cf5e24e54b156d61733ac48713cc568f1f8eeb3d12c379d88
Jun 10 13:36:55 m-exp-optimism-01 erigon[2853887]: [INFO] [06-10|13:36:55.284] [5/12 HashState] Unwinding started from=121213541 to=121213540 storage=false codes=true
Jun 10 13:36:55 m-exp-optimism-01 erigon[2853887]: [INFO] [06-10|13:36:55.284] [5/12 HashState] Unwinding started from=121213541 to=121213540 storage=false codes=false
Jun 10 13:36:55 m-exp-optimism-01 erigon[2853887]: [INFO] [06-10|13:36:55.284] [5/12 HashState] Unwinding started from=121213541 to=121213540 storage=true codes=false
Jun 10 13:36:55 m-exp-optimism-01 erigon[2853887]: [INFO] [06-10|13:36:55.289] [6/12 IntermediateHashes] Unwinding from=121213541 to=121213540 csbucket=AccountChangeSet
Jun 10 13:36:55 m-exp-optimism-01 erigon[2853887]: [INFO] [06-10|13:36:55.289] [6/12 IntermediateHashes] Unwinding from=121213541 to=121213540 csbucket=StorageChangeSet
Jun 10 13:36:55 m-exp-optimism-01 erigon[2853887]: [INFO] [06-10|13:36:55.319] [6/12 IntermediateHashes] Trie root hash=0xbd50e044154d20d6be535cdb4458f1c951f5e9142f6dd9d46da5eef46c5d254e
Jun 10 13:36:55 m-exp-optimism-01 erigon[2853887]: [INFO] [06-10|13:36:55.337] [4/12 Execution] Unwind Execution from=121213541 to=121213540
Jun 10 13:36:55 m-exp-optimism-01 erigon[2853887]: [INFO] [06-10|13:36:55.375] RPC Daemon notified of new headers from=121212516 to=121213540 amount=1024 hash=0x4b789cbc82e48a3ca3764b0996dce40746fc85dada9c1cd20eb7edb3cada77c4 header sending=894.272µs log sending=500ns
Jun 10 13:36:55 m-exp-optimism-01 erigon[2853887]: [INFO] [06-10|13:36:55.375] head updated hash=0x4b789cbc82e48a3ca3764b0996dce40746fc85dada9c1cd20eb7edb3cada77c4 number=121213540
Jun 10 13:36:55 m-exp-optimism-01 erigon[2853887]: [INFO] [06-10|13:36:55.957] [ForkChoiceUpdated] duplicate build request
Jun 10 13:36:55 m-exp-optimism-01 erigon[2853887]: [INFO] [06-10|13:36:55.958] Received GetPayloadV3 payloadId=80
Jun 10 13:36:55 m-exp-optimism-01 erigon[2853887]: [INFO] [06-10|13:36:55.960] [NewPayload] Handling new payload height=121213541 hash=0x95d21089cb18ee5cf5e24e54b156d61733ac48713cc568f1f8eeb3d12c379d88
Jun 10 13:36:55 m-exp-optimism-01 erigon[2853887]: [INFO] [06-10|13:36:55.980] [4/12 Execution] Completed on block=121213541
Jun 10 13:36:56 m-exp-optimism-01 erigon[2853887]: [INFO] [06-10|13:36:56.084] RPC Daemon notified of new headers from=121213540 to=121213541 amount=1 hash=0x95d21089cb18ee5cf5e24e54b156d61733ac48713cc568f1f8eeb3d12c379d88 header sending=18.23µs log sending=231ns
Jun 10 13:36:56 m-exp-optimism-01 erigon[2853887]: [INFO] [06-10|13:36:56.084] head updated hash=0x95d21089cb18ee5cf5e24e54b156d61733ac48713cc568f1f8eeb3d12c379d88 number=121213541
@ncavedale-xlabs Are you using our official docker image? We increased the FCU timeout from 1s to 5s, but your logs said t=1.000694507s err="forkChoiceUpdated timeout"
. Can you check if the right version is running?
My bad, I was still using the old binary (we're compiling from source). The errors are gone now 😄 , but I'll keep monitoring and let you know if I see any odd behaviour. Thanks!
I'm facing the same issue even with the updated fcu timeout of 5!
[WARN] [10-18|06:54:35.014] [rpc] served conn=127.0.0.1:42218 method=engine_forkchoiceUpdatedV3 reqid=127472 t=5.000348083s err="forkChoiceUpdated timeout"
[WARN] [10-18|06:54:35.015] [rpc] served conn=127.0.0.1:47858 method=engine_forkchoiceUpdatedV3 reqid=127473 t=265.816µs err="cannot update forkchoice. execution service is busy"
[WARN] [10-18|06:54:35.041] [rpc] served conn=127.0.0.1:47858 method=engine_forkchoiceUpdatedV3 reqid=127475 t=1.023863ms err="cannot update forkchoice. execution service is busy"
Running:
node: us-docker.pkg.dev/oplabs-tools-artifacts/images/op-node:v1.9.3
operigon: testinprod/op-erigon:2.60.8-0.7.2
What might be the root cause of this delay? My hardware is fine, running on nvme disks with ample IOPS! No CPU/RAM pressure as well.
@cshintov Could you share more logs around the errored line? Setting console.log.verbosity=4
would help as well
I've increased it from 5s
to 30s
and I had added a few more debug logs.. the problem seems to be it's always in execution
.
[WARN] [10-22|07:56:10.618] [yyy] InsertBlocksAndWait: execution status busy while inserting blocks
[INFO] [10-22|07:56:10.719] [yyy] InsertBlocksAndWait response=result:Busy err=nil
[WARN] [10-22|07:56:10.719] [yyy] InsertBlocksAndWait: execution status busy while inserting blocks
[INFO] [10-22|07:56:10.820] [yyy] InsertBlocksAndWait response=result:Busy err=nil
[WARN] [10-22|07:56:10.820] [yyy] InsertBlocksAndWait: execution status busy while inserting blocks
[WARN] [10-22|07:56:10.899] [EngineBlockDownloader] Could not insert headers and bodies err="context deadline exceeded"
[INFO] [10-22|07:56:11.701] [EngineBlockDownloader] Downloading PoS headers... hash=0xb30d8aea84b40d5bd2bed3a7a142f9e404fa17be826519986af7f7f51e9e24c9 requestId=0
[INFO] [10-22|07:56:17.130] [11/12 TxLookup] ETL [2/2] Loading into=BlockTransactionLookup current_prefix=bf08b135
I'm trying to run both op-mainnet and base-mainnet on the same machine as containers.
avg-cpu: %user %nice %system %iowait %steal %idle
0.62 0.00 2.55 2.14 0.00 94.70
Device r/s rMB/s rrqm/s %rrqm r_await rareq-sz w/s wMB/s wrqm/s %wrqm w_await wareq-sz d/s dMB/s drqm/s %drqm d_await dareq-sz aqu-sz %util
loop0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
loop1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
loop10 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
loop11 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
loop12 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
loop2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
loop3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
loop4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
loop5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
loop6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
loop7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
loop8 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
loop9 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 17.60 0.30 0.00 0.00 0.27 17.18 217.80 0.85 0.00 0.00 0.09 4.00 0.00 0.00 0.00 0.00 0.00 0.00 0.02 4.56
nvme0n1 1016.60 63.53 0.00 0.00 0.27 63.99 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.27 84.56
nvme1n1 1053.20 65.82 0.00 0.00 0.26 63.99 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.28 84.96
nvme2n1 1043.80 65.22 0.00 0.00 0.26 63.98 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.28 84.56
nvme3n1 1126.40 70.38 0.00 0.00 0.26 63.99 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.29 85.04
nvme4n1 1087.00 67.92 0.00 0.00 0.26 63.99 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.29 84.48
nvme5n1 1012.40 63.26 0.00 0.00 0.27 63.98 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.27 83.52
sda 7.00 0.12 0.00 0.00 0.26 17.14 196.80 0.86 73.40 27.17 0.10 4.46 0.00 0.00 0.00 0.00 0.00 0.00 0.03 4.80
sdb 10.60 0.18 0.00 0.00 0.26 17.21 196.40 0.86 73.40 27.21 0.10 4.46 0.00 0.00 0.00 0.00 0.00 0.00 0.03 4.96
It's read heavy and utilization is around 90% always!
@cshintov Could you please clarify what you mean by increased it from 5s to 30s
? Also please create another github issue so that we can discuss in a new thread.
If the disk utilization is high, then it might mean that the disk does not have enough capacity to handle both op-mainnet
and base-mainnet
simultaneously. Also, please provide more logs, preferably during 30-60 sec timespan.
FCU timeout increased from 1s to 5s earlier. For me it was timing out at 5s. So I increased it to 30s to solve it. That solved this issue.
But I'm still seeing timeouts in other parts. Will share the details.
Summary
Update (2024-05-22): We are also seeing these issues in nodes syncing from genesis as well.
In the past few days, I have been trying with the support an op-erigon developer to spin-up nodes for OP and Base mainnet from snapshots they provide (https://snapshot.testinprod.io/). However the nodes are unable to quickly sync, due to very slow sync, to the point where the node falls further behind the latest block.
The warnings and errors message below are also observed on other nodes syncing from genesis.
System information
Versions:
OS & Version:
Erigon Command (with flags/config):
Consensus Layer: op-node
Consensus Layer Command (with flags/config):
Chain/Network: Optimism Mainnet, Sepolia, Base Mainnet and Sepolia
Expected behaviour
Syncing fast without errors.
Actual behaviour
Getting errors on op-erigon and op-node.
Looking at the source code, logs and OP stack specification here is what it is going on:
[INFO] [05-19|12:21:20.341] [updateForkchoice] Fork choice update: flushing in-memory state (built by previous newPayload)
forkChoiceUpdated timeout
error and forces op-node to retry:[WARN] [05-19|12:21:21.341] [rpc] served conn=127.0.0.1:37458 method=engine_forkchoiceUpdatedV3 reqid=3166 t=1.000392006s err="forkChoiceUpdated timeout"
[WARN] [05-19|12:21:23.555] [rpc] served conn=127.0.0.1:37458 method=engine_forkchoiceUpdatedV3 reqid=3167 t=180.32µs err="[ForkChoiceUpdated]: execution service is busy, cannot assemble blocks"
[WARN] [05-19|12:21:27.806] wrong head block current=0xa1d4ccf32c6d2eb88f9e20761600312198176995a6a538af96d533399ee01cb4 requested=0xbce4dfb53b04b1c369f0b52f00c8d3be128eefbb40f96606c34c8c71e7ec9e73 executionAt=120231966
May 19 12:21:27 m-optimism-02 erigon[1576]: [INFO] [05-19|12:21:27.806] [5/7 HashState] Unwinding started from=120231966 to=120231965 storage=false codes=true
This issue repeats continuously, causing the synchronization from the snapshot to be very very slow, to the point where the node falls further behind the latest block.
As recommended by the op-erigon team, different settings were tried (disable snapshots and enable snap.keepblocks), and the same software versions used on the node where the snapshot was taken (op_erigon v2.55.1-0.4.3 and op-node v1.7.0) were used, but without any improvement observed.
Steps to reproduce the behaviour
The quickest way to reproduce this issue is to use one of the latest snapshots available at https://snapshot.testinprod.io and try to spin-up a new node using op-erigon and op-node
Logs
OP-Erigon:
OP-Node:
Source code related to the logs messages
UpdateForkChoice
function: https://github.com/testinprod-io/op-erigon/blob/cc6aa2f869d3a2aed28f10f6405d04fef25c228a/turbo/execution/eth1/forkchoice.go#L69C35-L97forkChoiceUpdated timeout
: https://github.com/testinprod-io/op-erigon/blob/cc6aa2f869d3a2aed28f10f6405d04fef25c228a/turbo/execution/eth1/forkchoice.go#L85-L87execution service is busy, cannot assemble block
: https://github.com/testinprod-io/op-erigon/blob/cc6aa2f869d3a2aed28f10f6405d04fef25c228a/turbo/engineapi/engine_server.go#L545UnwindOnHistoryV3
function: https://github.com/testinprod-io/op-erigon/blob/cc6aa2f869d3a2aed28f10f6405d04fef25c228a/eth/stagedsync/stage_hashstate.go#L719failed to make the new L2 block canonical via forkchoice
: https://github.com/ethereum-optimism/optimism/blob/40eee4f55f253d17e8c1cebe71e6765174ad7f6d/op-node/rollup/derive/engine_update.go#L190