hyperledger / besu

An enterprise-grade Java-based, Apache 2.0 licensed Ethereum client https://wiki.hyperledger.org/display/besu
https://www.hyperledger.org/projects/besu
Apache License 2.0
1.48k stars 804 forks source link

Besu Fails Repeatedly During Initial Full Sync - "World State Root does not match expected value" #6615

Closed kma67 closed 1 month ago

kma67 commented 6 months ago

Greetings!

I am running into an issue with multiple versions of Besu failing at the same block (11,114,732) when trying to do an initial sync for a full archive node, limited details follow:

  1. Running slightly older Besu and Teku versions, error at block 11,114,732 occurs

  2. Update Besu to 24.1.2-dev-b6a6402b (and Teku to teku/v24.1.1+44-ge6c81905a6/linux-x86_64/-privatebuild-openjdk64bitservervm-java-19), start Besu (and Teku), error at block 11,114,732 occurs

  3. Restored Besu/Teku data from previous backup (~around block 5,000,000), start Besu (and Teku) and let run, error at block 11,114,732 occurs

  4. Deleted Besu/Teku data, start Besu (and Teku), error at block 11,114,732 occurs

*** Command line:

!/bin/bash

/usr/local/java/besu/besu-build/bin/besu \ --sync-mode=FULL \ --data-storage-format=BONSAI \ --data-path="/mnt/md0/data/ethereum_data-besu" \ --max-peers=96 \ --rpc-http-api=ETH,NET,WEB3,DEBUG \ --rpc-http-enabled=true \ --rpc-http-host="0.0.0.0" \ --rpc-http-cors-origins="" \ --rpc-http-max-batch-size=10000 \ --rpc-ws-enabled=true \ --rpc-ws-host="0.0.0.0" \ --host-allowlist="" \ --engine-rpc-port=8551 \ --engine-jwt-secret=/usr/local/java/jwt-secret/jwtsecret.hex

*** O/S Info:

Distributor ID: Ubuntu Description: Ubuntu 23.10 Release: 23.10 Codename: mantic

*** Besu Info:

Besu version: 24.1.2-dev-b6a6402b

Configuration: Network: Mainnet Network Id: 1 Data storage: Bonsai Sync mode: Full RPC HTTP APIs: ETH,NET,WEB3,DEBUG RPC HTTP port: 8545 Engine APIs: ENGINE,ETH Engine port: 8551 Engine JWT: /usr/local/java/jwt-secret/jwtsecret.hex Using LAYERED transaction pool implementation Using STACKED worldstate update mode

Host: Java: openjdk-java-19 Maximum heap size: 15.64 GB OS: linux-x86_64 glibc: 2.38 Total memory: 62.53 GB CPU cores: 24

Plugins: TOTAL = 0 of 0 plugins successfully loaded from /usr/local/java/besu/besu-build/plugins

*** Most recent stacktrace:

2024-02-27 01:55:32.608+00:00 | EthScheduler-Services-9 (importBlock) | ERROR | AbstractBlockProcessor | failed persisting block java.lang.RuntimeException: World State Root does not match expected value, header 0x5b4efb7aa73344c4817fb7d6d916b185ef3356237ce2e67c4efebecde6c3951e calculated 0xaae985d0795a47a0d69029ace31ff69755742d272074508ad981a6c5f1d61535 at org.hyperledger.besu.ethereum.trie.bonsai.worldview.BonsaiWorldState.verifyWorldStateRoot(BonsaiWorldState.java:476) at org.hyperledger.besu.ethereum.trie.bonsai.worldview.BonsaiWorldState.persist(BonsaiWorldState.java:434) at org.hyperledger.besu.ethereum.mainnet.AbstractBlockProcessor.processBlock(AbstractBlockProcessor.java:194) at org.hyperledger.besu.ethereum.mainnet.BlockProcessor.processBlock(BlockProcessor.java:79) at org.hyperledger.besu.ethereum.MainnetBlockValidator.processBlock(MainnetBlockValidator.java:207) at org.hyperledger.besu.ethereum.MainnetBlockValidator.validateAndProcessBlock(MainnetBlockValidator.java:135) at org.hyperledger.besu.ethereum.MainnetBlockValidator.validateAndProcessBlock(MainnetBlockValidator.java:72) at org.hyperledger.besu.ethereum.mainnet.MainnetBlockImporter.importBlock(MainnetBlockImporter.java:45) at org.hyperledger.besu.ethereum.core.BlockImporter.importBlock(BlockImporter.java:45) at org.hyperledger.besu.ethereum.eth.sync.fullsync.FullImportBlockStep.accept(FullImportBlockStep.java:63) at org.hyperledger.besu.ethereum.eth.sync.fullsync.FullImportBlockStep.accept(FullImportBlockStep.java:32) at org.hyperledger.besu.services.pipeline.CompleterStage.run(CompleterStage.java:37) at org.hyperledger.besu.services.pipeline.Pipeline.lambda$runWithErrorHandling$3(Pipeline.java:170) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:577) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) at java.base/java.lang.Thread.run(Thread.java:1589) 2024-02-27 01:55:32.608+00:00 | EthScheduler-Services-9 (importBlock) | INFO | MainnetBlockValidator | Invalid block 11114732 (0x878633f621aa8e89837ee8160d3acff74e7765d05df2e99bbcff11c8659e5fca): Optional[World State Root does not match expected value, header 0x5b4efb7aa73344c4817fb7d6d916b185ef3356237ce2e67c4efebecde6c3951e calculated 0xaae985d0795a47a0d69029ace31ff69755742d272074508ad981a6c5f1d61535], caused by java.lang.RuntimeException: World State Root does not match expected value, header 0x5b4efb7aa73344c4817fb7d6d916b185ef3356237ce2e67c4efebecde6c3951e calculated 0xaae985d0795a47a0d69029ace31ff69755742d272074508ad981a6c5f1d61535 2024-02-27 01:55:32.608+00:00 | EthScheduler-Services-9 (importBlock) | INFO | Pipeline | Unexpected exception in pipeline. Aborting. Throwable summary: org.hyperledger.besu.ethereum.eth.sync.tasks.exceptions.InvalidBlockException: Failed to import block: Invalid block at #11114732 (0x878633f621aa8e89837ee8160d3acff74e7765d05df2e99bbcff11c8659e5fca) at: org.hyperledger.besu.ethereum.eth.sync.fullsync.FullImportBlockStep.accept(FullImportBlockStep.java:65) 2024-02-27 01:55:32.608+00:00 | EthScheduler-Services-9 (importBlock) | WARN | PipelineChainDownloader | Invalid block detected (BREACH_OF_PROTOCOL). Disconnecting from sync target. Failed to import block: Invalid block at #11114732 (0x878633f621aa8e89837ee8160d3acff74e7765d05df2e99bbcff11c8659e5fca) 2024-02-27 01:55:32.608+00:00 | EthScheduler-Services-9 (importBlock) | WARN | PipelineChainDownloader | Chain download failed. Restarting after short delay. java.util.concurrent.CompletionException: org.hyperledger.besu.ethereum.eth.sync.tasks.exceptions.InvalidBlockException: Failed to import block: Invalid block at #11114732 (0x878633f621aa8e89837ee8160d3acff74e7765d05df2e99bbcff11c8659e5fca) at java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:368) at java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:377) at java.base/java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1097) at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2194) at org.hyperledger.besu.services.pipeline.Pipeline.abort(Pipeline.java:207) at org.hyperledger.besu.services.pipeline.Pipeline.lambda$runWithErrorHandling$3(Pipeline.java:186) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:577) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) at java.base/java.lang.Thread.run(Thread.java:1589) Caused by: org.hyperledger.besu.ethereum.eth.sync.tasks.exceptions.InvalidBlockException: Failed to import block: Invalid block at #11114732 (0x878633f621aa8e89837ee8160d3acff74e7765d05df2e99bbcff11c8659e5fca) at org.hyperledger.besu.ethereum.eth.sync.fullsync.FullImportBlockStep.accept(FullImportBlockStep.java:65) at org.hyperledger.besu.ethereum.eth.sync.fullsync.FullImportBlockStep.accept(FullImportBlockStep.java:32) at org.hyperledger.besu.services.pipeline.CompleterStage.run(CompleterStage.java:37) at org.hyperledger.besu.services.pipeline.Pipeline.lambda$runWithErrorHandling$3(Pipeline.java:170) ... 5 more

...above error repeats over and over and over and over...

Please let me know if you need any additional information, apologies if this is a duplicate issue but the solutions offered in other tickets with a similar description did not work, perhaps due to my node still doing the full import?

macfarla commented 6 months ago

Thanks for the report @kma67 - the block, hash and expected state root matches etherscan so it looks like you might have hit an edge case with bonsai.

Since you have besu's DEBUG api enabled, you should be able to run debug_getBadBlocks https://besu.hyperledger.org/public-networks/reference/api#debug_getbadblocks

something like curl -X POST --data '{"jsonrpc":"2.0","method":"debug_getBadBlocks","params":[],"id":1}' http://127.0.0.1:8545

and it will spit out details on the blocks besu thinks are bad which might help us debug.

kma67 commented 6 months ago

Many thanks for this, I am adding the returned output as a textfile! badblockinfo.txt

macfarla commented 6 months ago

@kma67 there's actually no real benefit to running bonsai for a full sync node, since you won't be able to access the historical blocks. Is there a particular reason you're doing this combo?

If you change from BONSAI to FOREST, you'll be able to access the historical state which is probably what you want if you're going for an archive node. You will need to resync from scratch though since the database is a different format.

kma67 commented 6 months ago

@macfarla Thanks again for the reply! This blurb at https://besu.hyperledger.org/23.4.0/public-networks/concepts/data-storage-formats was the reason why I chose Bonsai over Forest - Storage space requirements (this documentation seemed to suggest that Bonsai is a better choice for an archive node):

"Forest of Tries vs. Bonsai Tries

Storage requirements

Forest mode uses significantly more memory than Bonsai. With an archive node, forest mode uses an estimated 12 TB of storage, while Bonsai uses an estimated 1100 GB of storage. With a full node, forest mode uses an estimated 750 GB of storage, while Bonsai uses an estimated 650 GB of storage."

I may have misinterpreted what I read above, should I NOT use Bonsai because it will not provide access to historical state?

Edit: I noticed the bonsai-historical-block-limit parameter, default value of 512, and was thinking I might be able to set it to something large enough to encompass the entire block data history although I have no idea what this value would/should be at the moment. Do you have any thoughts on this approach?

macfarla commented 6 months ago

I can see where you're coming from - think we can make the docs clearer on this - but in the Accessing Data section

Bonsai becomes increasingly more resource-intensive the further in history you try to read data. To prevent this, you can limit how far Bonsai looks back while reconstructing data. The default limit Bonsai looks back is 512.

you can make the limit more than 512 but with an archive node you're talking orders of magnitude more than that which bonsai is not designed for. So maybe the question is - do you need an archive node? What's your use case?

kma67 commented 6 months ago

@macfarla Many thanks yet again, aye, we are on the same page regarding this and perhaps it simply isn't doable using Bonsai. My use case is basically to provide a datasource for indexers that consume blockchain data for use in various intelligence tools. I originally started with geth but immedaitely began looking for a solution that was more efficient with regard to data storage. Ultimately, the powers-that-be stated they need an archive node.

kma67 commented 6 months ago

@macfarla Regardless of the above, is my issue caused by a bug that should perhaps be resolved or is it one of those things that will ultimately just fail for me and others who attempt to run a full/archive node using Bonsai going forward?

macfarla commented 6 months ago

yeah it is a bug. but I think it won't be highest priority for us to fix since it's not a recommended combination of options.

storage-wise - we are working on a few optimisations, so hopefully things will improve.

Additionally, I've raised a ticket on the docs repo to make archive vs full and where bonsai makes sense

macfarla commented 6 months ago

have you looked into the portal network https://www.ethportal.net/ - there isn't a besu client there yet though

kma67 commented 6 months ago

@macfarla I will take a look at the portal network, most appreciated! Also, many thanks for your assistance and being honest about the priority, I will investigate running in Forest mode and see how it goes.

matkt commented 6 months ago

Can you keep your node db to help us débug ?

Are you sure to use the latest version of BESU for step 4? Since the start of the sync and not only at the end

kma67 commented 6 months ago

Greetings @matkt, surely! As for the sync, it was run from scratch using the version referenced above (24.1.2-dev-b6a6402b).

matkt commented 6 months ago

I do not know if you have a way to share your database in order to save us 3 weeks of sync to return to this block. We can discuss it on Discord if you have a possibility

kma67 commented 6 months ago

@matkt There should be a way to do this, let's chat about it!

non-fungible-nelson commented 6 months ago

Let us know how we can get this info from you regarding the bad blocks (if you are unable to connect with @matkt). discord.gg/hyperledger #besu-contruibutors

kma67 commented 6 months ago

@non-fungible-nelson Greetings Matt, I just dropped a note to @matkt on #besu-contributors

kma67 commented 6 months ago

Updating with current info: Posted database archive for download, data was downloaded earlier today by @matkt, waiting to hear if downloaded data is satisfactory or if I need to provide an earlier backup of the database.

matkt commented 6 months ago

working on that currently thanks to the database shared

binny1024 commented 4 months ago

2024-05-15 15:34:04.025+08:00 | main | ERROR | Besu | Failed to start Besu picocli.CommandLine$ExecutionException: World State Root does not match expected value, header 0xba4e08cea46e04a1e5ee846ea067f6ac570b17ab9020f7133b9fef51f9f352e1 calculated 0x3c5ffd7ec2bc13ff5c3bef1d0646e042eb18b91f0509fa1c942c89f472383985 at org.hyperledger.besu.cli.BesuCommand.buildController(BesuCommand.java:1828) at org.hyperledger.besu.cli.BesuCommand.run(BesuCommand.java:1158) at picocli.CommandLine.executeUserObject(CommandLine.java:2026) at picocli.CommandLine.access$1500(CommandLine.java:148) at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2461) at picocli.CommandLine$RunLast.handle(CommandLine.java:2453) at picocli.CommandLine$RunLast.handle(CommandLine.java:2415) at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2273) at picocli.CommandLine$RunLast.execute(CommandLine.java:2417) at picocli.CommandLine.execute(CommandLine.java:2170) at org.hyperledger.besu.cli.BesuCommand.lambda$createExecuteTask$1(BesuCommand.java:1078) at picocli.CommandLine.execute(CommandLine.java:2170) at org.hyperledger.besu.cli.BesuCommand.lambda$createPluginRegistrationTask$2(BesuCommand.java:1088) at picocli.CommandLine.execute(CommandLine.java:2170) at org.hyperledger.besu.cli.util.ConfigDefaultValueProviderStrategy.execute(ConfigDefaultValueProviderStrategy.java:58) at picocli.CommandLine.execute(CommandLine.java:2170) at org.hyperledger.besu.cli.BesuCommand.executeCommandLine(BesuCommand.java:1115) at org.hyperledger.besu.cli.BesuCommand.parse(BesuCommand.java:1057) at org.hyperledger.besu.Besu.main(Besu.java:39) Caused by: java.lang.RuntimeException: World State Root does not match expected value, header 0xba4e08cea46e04a1e5ee846ea067f6ac570b17ab9020f7133b9fef51f9f352e1 calculated 0x3c5ffd7ec2bc13ff5c3bef1d0646e042eb18b91f0509fa1c942c89f472383985 at org.hyperledger.besu.ethereum.trie.diffbased.common.worldview.DiffBasedWorldState.verifyWorldStateRoot(DiffBasedWorldState.java:203) at org.hyperledger.besu.ethereum.trie.diffbased.common.worldview.DiffBasedWorldState.persist(DiffBasedWorldState.java:161) at org.hyperledger.besu.ethereum.chain.GenesisState.writeAccountsTo(GenesisState.java:187) at org.hyperledger.besu.ethereum.chain.GenesisState.writeStateTo(GenesisState.java:170) at org.hyperledger.besu.controller.BesuControllerBuilder.build(BesuControllerBuilder.java:616) at org.hyperledger.besu.cli.BesuCommand.buildController(BesuCommand.java:1826)

matkt commented 1 month ago

there is now a fix for this one. thanks for your help.