hyperledger / besu

An enterprise-grade Java-based, Apache 2.0 licensed Ethereum client https://wiki.hyperledger.org/display/besu
https://www.hyperledger.org/projects/besu
Apache License 2.0
1.49k stars 812 forks source link

World State Root does not match expected value -> Error processing block/Failed to import block: Invalid block #4151

Closed howardrcc closed 1 year ago

howardrcc commented 2 years ago

Description

Hi there. I successfully synced to mainnet a few day using snapsync. I am getting Errors now. Here is a snippet of the log (I'm not seeing other usefull messages in TRACE/DEBUG). I'm not sure what is happing. I tried restart and upgrading besu as suggested in other Issues but thats not working for me.

Any suggestions?

2022-07-22 10:52:28.053+00:00 | EthScheduler-Services-3 (importBlock) | ERROR | AbstractBlockProcessor | failed persisting block
java.lang.RuntimeException: World State Root does not match expected value, header 0xe821a58beda94f8ff6b230b1933235f365c52a90a590e09bfda7a89c95991939 calculated 0xe50fb371f71f4d01e127f68c8d8f99b5cd638d3ac4ffb7cf46c9378c9b49de2d
        at org.hyperledger.besu.ethereum.bonsai.BonsaiPersistedWorldState.persist(BonsaiPersistedWorldState.java:255)
        at org.hyperledger.besu.ethereum.mainnet.AbstractBlockProcessor.processBlock(AbstractBlockProcessor.java:201)
        at org.hyperledger.besu.ethereum.mainnet.AbstractBlockProcessor.processBlock(AbstractBlockProcessor.java:42)
        at org.hyperledger.besu.ethereum.mainnet.BlockProcessor.processBlock(BlockProcessor.java:75)
        at org.hyperledger.besu.ethereum.MainnetBlockValidator.processBlock(MainnetBlockValidator.java:158)
        at org.hyperledger.besu.ethereum.MainnetBlockValidator.validateAndProcessBlock(MainnetBlockValidator.java:112)
        at org.hyperledger.besu.ethereum.MainnetBlockValidator.validateAndProcessBlock(MainnetBlockValidator.java:71)
        at org.hyperledger.besu.ethereum.mainnet.MainnetBlockImporter.importBlock(MainnetBlockImporter.java:44)
        at org.hyperledger.besu.ethereum.core.BlockImporter.importBlock(BlockImporter.java:44)
        at org.hyperledger.besu.ethereum.eth.sync.fullsync.FullImportBlockStep.accept(FullImportBlockStep.java:61)
        at org.hyperledger.besu.ethereum.eth.sync.fullsync.FullImportBlockStep.accept(FullImportBlockStep.java:31)
        at org.hyperledger.besu.services.pipeline.CompleterStage.run(CompleterStage.java:37)
        at org.hyperledger.besu.services.pipeline.Pipeline.lambda$runWithErrorHandling$3(Pipeline.java:152)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)
2022-07-22 10:52:28.054+00:00 | EthScheduler-Services-3 (importBlock) | ERROR | MainnetBlockValidator | Error processing block. Block 15183764 (0x91e647c6c047632c7ca649ab28600f68de9a4e76d58cff4772d77e7b36decb4f)
2022-07-22 10:52:28.060+00:00 | EthScheduler-Services-3 (importBlock) | WARN  | PipelineChainDownloader | Invalid block detected. Disconnecting from sync target. Failed to import block: Invalid block at #15183764 (0x91e647c6c047632c7ca649ab28600f68de9a4e76d58cff4772d77e7b36decb4f)
2022-07-22 10:52:28.061+00:00 | EthScheduler-Services-3 (importBlock) | WARN  | PipelineChainDownloader | Chain download failed. Restarting after short delay.
java.util.concurrent.CompletionException: org.hyperledger.besu.ethereum.eth.sync.tasks.exceptions.InvalidBlockException: Failed to import block: Invalid block at #15183764 (0x91e647c6c047632c7ca649ab28600f68de9a4e76d58cff4772d77e7b36decb4f)
        at java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:368)
        at java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:377)
        at java.base/java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1097)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
        at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
        at org.hyperledger.besu.services.pipeline.Pipeline.abort(Pipeline.java:180)
        at org.hyperledger.besu.services.pipeline.Pipeline.lambda$runWithErrorHandling$3(Pipeline.java:159)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: org.hyperledger.besu.ethereum.eth.sync.tasks.exceptions.InvalidBlockException: Failed to import block: Invalid block at #15183764 (0x91e647c6c047632c7ca649ab28600f68de9a4e76d58cff4772d77e7b36decb4f)
        at org.hyperledger.besu.ethereum.eth.sync.fullsync.FullImportBlockStep.accept(FullImportBlockStep.java:62)
        at org.hyperledger.besu.ethereum.eth.sync.fullsync.FullImportBlockStep.accept(FullImportBlockStep.java:31)
        at org.hyperledger.besu.services.pipeline.CompleterStage.run(CompleterStage.java:37)
        at org.hyperledger.besu.services.pipeline.Pipeline.lambda$runWithErrorHandling$3(Pipeline.java:152)
        ... 5 more
2022-07-22 10:52:58.913+00:00 | Timer-0 | INFO  | DNSTimerTask | Refreshing DNS records with enrtree://AKA3AM6LPBYEUDMVNU3BSVQJ5AD45Y7YPOHJLEF6W26QOE4VTUDPE@all.mainnet.ethdisco.net
2022-07-22 10:53:58.913+00:00 | Timer-0 | INFO  | DNSTimerTask | Refreshing DNS records with enrtree://AKA3AM6LPBYEUDMVNU3BSVQJ5AD45Y7YPOHJLEF6W26QOE4VTUDPE@all.mainnet.ethdisco.net
2022-07-22 10:54:58.913+00:00 | Timer-0 | INFO  | DNSTimerTask | Refreshing DNS records with enrtree://AKA3AM6LPBYEUDMVNU3BSVQJ5AD45Y7YPOHJLEF6W26QOE4VTUDPE@all.mainnet.ethdisco.net
2022-07-22 10:55:58.913+00:00 | Timer-0 | INFO  | DNSTimerTask | Refreshing DNS records with enrtree://AKA3AM6LPBYEUDMVNU3BSVQJ5AD45Y7YPOHJLEF6W26QOE4VTUDPE@all.mainnet.ethdisco.net

Versions (Add all that apply)

ronaldjmaas commented 2 years ago

I experienced same issue with 22.4.4. There are some fixes to mitigate this issue included in the 22.7.0-rc2 release of Besu. I discarded the database and retesting the sync from scratch using 22.7.0-rc2. Note when you see the 'World State Root does not match expected value' exception, your database is corrupted and cannot be recovered.

Hope you can do the same. The more testing the better.

howardrcc commented 2 years ago

Thanks for the tips. I tried 22.7.0-snapshot (latest) and the db got corrupted again.... Gonna try -rc2 now. funny thing it kind of synced (somewhere until 15.200.000) . Then I saw number of blocks behind going up again (in grafana). Restarted besu then the number of blocks starting to decrease again. 15 minutes later i get the world state root not match error. I think that at some point it switched from snap sync to full sync in the logs. Don't know if that's a clue but I'll guess we'll know more soon.

after this I'm switching to fastsync :-)

jflo commented 2 years ago

@howardrcc Did you delete your data fir and resync from empty with 22.7? That would be required.

howardrcc commented 2 years ago

I did delete the data dir and also deleted after trying the (non-rc2) 22.7.0 snapshot. It's still syncing for a day or two I think. I'll report back in a few days.

howardrcc commented 2 years ago

another fail/corrupted when database was more or less synced. Found out the db got corrupted while grafana reporting it was running about 12000 blocks behind. So it's seems to be happening when all blocks are imported and when new blocks/ the worldstate is being calculated or retrieved.

garyschulte commented 1 year ago

closing as fixed by #4409

JanKalin commented 9 months ago

I would like to reopen this issue. It cropped up today at around 12:29. Besu v23.10.2 running on a Debian machine, 32GB RAM, 536GB free out of 1.8T SSD, validating for about a year without problems. To free up space I have resynced with the checkpoint option (see the attached sync.log) sync.log

Everything was going fine until today when I got missed attestation messages and checked the PC. This is the log from today (with quite a few lines delete, since the file would otherwise be too big) besu.log

I will now reboot the machine and, if it does not work, will delete the Besu data and resync. Will keep you updated.

This is the besu.service file:

[Unit]
Description=Besu Execution Client (Mainnet)
Wants=network-online.target
After=network-online.target

[Service]
User=eth1
Group=eth1
Type=simple
Restart=always
RestartSec=5
TimeoutStopSec=180
Environment="JAVA_OPTS=-Xmx8g"
ExecStart=/srv/besu/besu/bin/besu \
  --network=mainnet \
  --sync-mode=X_CHECKPOINT  \
  --data-path=/mnt/rpdata/besu_data \
  --data-storage-format=BONSAI \
  --engine-jwt-secret=/secrets/jwtsecret \
  --rpc-http-enabled=true \
  --rpc-http-port=8545 \
  --max-peers=40 \
  --metrics-enabled \
  --Xplugin-rocksdb-high-spec-enabled

[Install]
WantedBy=multi-user.target

UPDATE1:

Reboot does not fix it. post_reboot.log

MikeJerred commented 9 months ago

I also got this error at around 11:30 GMT today. Reboot did not fix

JanKalin commented 9 months ago

I forgot to mention that my localtime is CET, thus GMT+1. So apparently it happened more or less at the same time. Which was your last successfully synced block? Mine is apparently 18,947,892

Jan 06 12:29:13 rp besu[2257214]: 2024-01-06 12:29:13.134+01:00 | vert.x-worker-thread-0 | INFO  | AbstractEngineNewPayload | Imported #18,947,891 / 125 tx / 16 ws / base fee 12.21 gwei / 11,071,834 (36.9%) gas / (0xa
9ddaf079c9b5ed5baece459d7b7e2866983566bbb08f92c3134798eb0e67597) in 0.333s. Peers: 40
Jan 06 12:29:26 rp besu[2257214]: 2024-01-06 12:29:26.246+01:00 | vert.x-worker-thread-0 | INFO  | AbstractEngineNewPayload | Imported #18,947,892 / 187 tx / 16 ws / base fee 11.81 gwei / 17,562,696 (58.5%) gas / (0xc
1141d53490d3b8ba41c88ef6db4256fbddc77db8cbb03afaeb5b94e52ee42f3) in 0.496s. Peers: 40
Jan 06 12:29:36 rp besu[2257214]: 2024-01-06 12:29:36.735+01:00 | vert.x-worker-thread-0 | ERROR | AbstractBlockProcessor | failed persisting block
Jan 06 12:29:36 rp besu[2257214]: java.lang.RuntimeException: World State Root does not match expected value, header 0xf9029a6ce0a53e912643642e3458967dd2e38edd60d77e312156d8b1c432a433 calculated 0xf26bfa5c260e327582633c0c77d8dbe900a4877ab57e067ec814acd81d4b98ba
MikeJerred commented 9 months ago

Same block for me. Devs in the discord say they are working on it.

Invalid block 18947893 (0xb41fc83658a61504771fa9904d67f89b57c531cb902335358ff94c0680f05f07): Optional[World State Root does not match expected value, header 0xf9029a6ce0a53e912643642e3458967dd2e38edd60d77e312156d8b1c432a433 calculated 0xf26bfa5c260e327582633c0c77d8dbe900a4877ab57e067ec814acd81d4b98ba]
dhigby commented 9 months ago

I have the same issue and at the same time as @JanKalin . Just started this morning, but maybe it has been going on longer until I ran out of free credits on infura backup. I'm not as tech savvy, but I can post the same info JanKalin did above.

besu service file:

[Unit]
Description=Besu Execution Client (Mainnet)
Wants=network-online.target
After=network-online.target
[Service]
User=besu
Group=besu
Type=simple
Restart=always
RestartSec=5
ExecStart=/usr/local/bin/besu/bin/besu \
  --network=mainnet \
  --sync-mode=X_SNAP \
  --data-path=/var/lib/besu \
  --data-storage-format=BONSAI \
  --engine-jwt-secret=/var/lib/jwtsecret/jwt.hex \
  --metrics-enabled
[Install]
WantedBy=multi-user.target

And... I'll wait to supply more info since a lot of people seem to have the same problem.

sfbahr commented 9 months ago

More discussion of the recent issue in https://github.com/hyperledger/besu/issues/6357

matkt commented 9 months ago

Could you try to upgrade to the version 23.10.3 ?

dhigby commented 9 months ago

I upgraded from 23.10.1 to pre-release 23.10.3 and am currently doing the backward sync. We'll see if it completes.

dhigby commented 9 months ago

Backward sync worked! besu is up and running after the upgrade to 23.10.3

matkt commented 9 months ago

I recommend to follow this https://discord.com/channels/905194001349627914/938504958909747250/1193223408058630295 for people having the problem

NessDan commented 9 months ago

I recommend to follow this https://discord.com/channels/905194001349627914/938504958909747250/1193223408058630295 for people having the problem

This link doesn't go anywhere, I assume because I'm not in the server it was posted? Can you share whatever is in there here in the Github discussion?

Found the solution I think in the Ethstaker Discord. (Link to message.)

Besu world state issue recap.

Around 2024-01-06T11:29:36 UTC, Besu started reporting errors like this one: World State Root does not match expected value, header 0xf9029a6ce0a53e912643642e3458967dd2e38edd60d77e312156d8b1c432a433 calculated 0xf26bfa5c260e327582633c0c77d8dbe900a4877ab57e067ec814acd81d4b98ba followed by many Invalid new payload messages, with the effect that Besu is not in sync and the CL client is stuck too and not able to publish attestations or blocks.

The cause of this issue are still under investigation, but after collecting feedback from user and testing some options, there are some workarounds to recover you node, until a proper fix is released.

Recovering options: If you are still running Besu with version 23.10.2 or lower, then upgrade to 23.10.3 https://github.com/hyperledger/besu/releases/tag/untagged-8a32de7a7280bb003d79 If you are already on version 23.10.3, then keep Besu running, and operate on your CL client, removing its beacon db and restarting, this will trigger a backward sync in Besu that could help healing the worldstate. Instructions on how to delete the beacon db, depends on your client, for example for Teku you need to remove the beacon folder in the Teku data path, for other client refer to their documentation. If after the backward sync session the issue is still there then try point 3. If that fails, try the same process again, but force Besu to set its head back to a block prior to the problem block. This will require API access, and for this to work you have to enabled the DEBUG API:

--rpc-http-api=ETH,NET,WEB3,DEBUG

see https://besu.hyperledger.org/public-networks/reference/cli/options#rpc-http-api for more details, changes to that param will require Besu to be restarted.

Once Besu is up and you have access to DEBUG apis, you can shutdown your CL client and clear its DB. With Besu still up, set its head back to block 18947892 (0x1211F34 in hex) using the following API call: curl -X POST --data '{"jsonrpc":"2.0","method":"debug_setHead","params":["0x1211F34"],"id":1}' http://localhost:8545 Then restart your CL. Your CL should optimistically sync the beacon chain using a checkpoint from the network, and this will trigger a backward sync in Besu.

If after the backward sync session the issue is still there then try point 4.

If previous options have not worked then, you can try to resync only the world state, it can takes some hours, but it is faster from a resync from scratch curl -X POST --data '{"jsonrpc":"2.0","method":"debug_resyncWorldState","params":[],"id":1}' http://localhost:8545

image

matkt commented 9 months ago

We have just created a hotfix. Even if your node works well it is better to update Besu to avoid future issues. If your node already works well, no need to do anything else, just an update and use hotfix

JanKalin commented 8 months ago

Thanks to devs for quick action! BTW, I can attest (pun intended) that a full resync of the chain works with v23.10.3, but will update to hotfix version.