paradigmxyz / reth

Modular, contributor-friendly and blazing-fast implementation of the Ethereum protocol, in Rust
https://reth.rs/
Apache License 2.0

Start failed on 1.0.4 (e24e4c77): invalid value: string "full". #10199

Closed lyfsn closed 1 month ago

lyfsn commented 2 months ago

Describe the bug

I started a new node in a custom network to sync data from block 0 to the latest block, and encountered this error. The Docker container keeps trying to restart after some retry logs.

In the first phase, the node shows this error at pipeline stage 13/14 (Prune):

2024-08-08T02:50:44.138154Z  INFO Preparing stage pipeline_stages=13/14 stage=Prune checkpoint=0 target=1122938
2024-08-08T02:50:44.138166Z  INFO Executing stage pipeline_stages=13/14 stage=Prune checkpoint=0 target=1122938
2024-08-08T02:50:44.138203Z  WARN Stage encountered a non-fatal error: the configuration provided for Receipts is invalid. Retrying... stage=Prune
2024-08-08T02:50:44.138227Z  INFO Preparing stage pipeline_stages=13/14 stage=Prune checkpoint=0 target=1122938
2024-08-08T02:50:44.138241Z  INFO Executing stage pipeline_stages=13/14 stage=Prune checkpoint=0 target=1122938
2024-08-08T02:50:44.138269Z  WARN Stage encountered a non-fatal error: the configuration provided for Receipts is invalid. Retrying... stage=Prune

In the second phase, the node still restarts with this error:

2024-08-08T02:14:54.991860Z  INFO Initialized tracing, debug log directory: /root/.cache/reth/logs/648
2024-08-08T02:14:54.993260Z  INFO Starting reth version="1.0.4 (e24e4c77)"
2024-08-08T02:14:54.993939Z  INFO Opening database path="/execution-data/db"
2024-08-08T02:14:55.021434Z ERROR shutting down due to error
Error: Could not load config file "/execution-data/reth.toml"

Caused by:
   0: Bad TOML data
   1: TOML parse error at line 55, column 12
   1:    |
   1: 55 | receipts = "full"
   1:    |            ^^^^^^
   1: invalid value: string "full", expected prune mode that leaves at least 10064 blocks in the database

Location:
    crates/node/builder/src/launch/common.rs:119:14

Steps to reproduce

  1. Set up a custom network
  2. Start a new node and let it sync blocks

Node logs

No response

Platform(s)

Linux (x86)

What version/commit are you on?

reth version="1.0.4 (e24e4c77)"

What database version are you on?

use default

Which chain / network are you on?

custom chain

What type of node are you running?

Archive (default)

What prune config do you use, if any?

services:
  execution:
    image: ghcr.io/paradigmxyz/reth:latest
    pull_policy: always
    command:
      - node
      - -vvv
      - --full
      - --datadir=/execution-data
      - --chain=/network_config/genesis.json
      - --addr=0.0.0.0
      - --port=30303
      - --discovery.port=30303
      - --discovery.addr=0.0.0.0
      - --http
      - --http.port=8545
      - --http.addr=0.0.0.0
      - --http.corsdomain=*
      - --http.api=net,eth,web3,txpool
      - --ws
      - --ws.addr=0.0.0.0
      - --ws.port=8546
      - --ws.api=net,eth
      - --ws.origins=*
      - --nat=extip:${IP_ADDRESS}
      - --authrpc.port=8551
      - --authrpc.jwtsecret=/jwtsecret
      - --authrpc.addr=0.0.0.0
      - --metrics=0.0.0.0:9001
      - --bootnodes=${EL_BOOTNODES}
      - --trusted-peers=${EL_BOOTNODES}

If you've built Reth from source, provide the full command you used

No response

mattsse commented 2 months ago

could you also post your reth.toml?

@joshieDo looks like this is thrown here

https://github.com/paradigmxyz/reth/blob/e907f0eab0d90f286f2197ee8935f10817ff1edf/crates/prune/types/src/mode.rs#L46-L46

lyfsn commented 2 months ago

This reth.toml was auto-generated by the reth node and appeared in my --datadir path:

[stages.headers]
downloader_max_concurrent_requests = 100
downloader_min_concurrent_requests = 5
downloader_max_buffered_responses = 100
downloader_request_limit = 1000
commit_threshold = 10000

[stages.bodies]
downloader_request_limit = 200
downloader_stream_batch_size = 1000
downloader_max_buffered_blocks_size_bytes = 2147483648
downloader_min_concurrent_requests = 5
downloader_max_concurrent_requests = 100

[stages.sender_recovery]
commit_threshold = 5000000

[stages.execution]
max_blocks = 500000
max_changes = 5000000
max_cumulative_gas = 1500000000000
max_duration = "10m"

[stages.prune]
commit_threshold = 1000000

[stages.account_hashing]
clean_threshold = 500000
commit_threshold = 100000

[stages.storage_hashing]
clean_threshold = 500000
commit_threshold = 100000

[stages.merkle]
clean_threshold = 5000

[stages.transaction_lookup]
chunk_size = 5000000

[stages.index_account_history]
commit_threshold = 100000

[stages.index_storage_history]
commit_threshold = 100000

[stages.etl]
file_size = 524288000

[prune]
block_interval = 5

[prune.segments]
sender_recovery = "full"
receipts = "full"

[prune.segments.account_history]
distance = 10064

[prune.segments.storage_history]
distance = 10064

[prune.segments.receipts_log_filter]

[peers]
refill_slots_interval = "5s"
trusted_nodes = []
trusted_nodes_only = false
max_backoff_count = 5
ban_duration = "12h"

[peers.connection_info]
max_outbound = 100
max_inbound = 30
max_concurrent_outbound_dials = 15

[peers.reputation_weights]
bad_message = -16384
bad_block = -16384
bad_transactions = -16384
already_seen_transactions = 0
timeout = -4096
bad_protocol = -2147483648
failed_to_connect = -25600
dropped = -4096
bad_announcement = -1024

[peers.backoff_durations]
low = "30s"
medium = "3m"
high = "15m"
max = "1h"

[sessions]
session_command_buffer = 32
session_event_buffer = 260

[sessions.limits]

[sessions.initial_internal_request_timeout]
secs = 20
nanos = 0

[sessions.protocol_breach_request_timeout]
secs = 120
nanos = 0

[sessions.pending_session_timeout]
secs = 20
nanos = 0

mattsse commented 2 months ago

receipts = "full"

did you manually edit this?

can't reproduce this entry on a new run with --full

lyfsn commented 2 months ago

@mattsse

Hey, the repository linked below contains a script to start a Reth+Lighthouse node on the Endurance network.

I encountered this problem when using the script, and I can reproduce it every time I use it to start a new node.

So if you can't reproduce the problem, could you please try starting a node with this script?

https://github.com/OpenFusionist/mainnet-reth-lighthouse

This problem did not occur in the earlier beta versions.

lyfsn commented 2 months ago

When I use v1.0.0 here, everything is ok.

    image: ghcr.io/paradigmxyz/reth:v1.0.0

https://github.com/OpenFusionist/mainnet-reth-lighthouse/blob/main/compose.yaml#L5

But when I use latest, this problem appears.

lyfsn commented 2 months ago

To be more precise, version 1.0.3 is fine, but v1.0.4 has issues.

zhwrd commented 1 month ago

I'm also seeing this issue. It doesn't happen on first boot with a fresh db, it only occurs when you restart reth with --full. I'm using op-reth but I can reproduce it with:

❯ ./op-reth node --chain optimism --full
2024-08-29T19:32:16.835291Z  INFO Initialized tracing, debug log directory: /Users/zach/Library/Caches/reth/logs/optimism
2024-08-29T19:32:16.835747Z  INFO Starting reth version="1.0.5 (603e39ab)"
2024-08-29T19:32:17.035902Z  INFO Opening database path="/Users/zach/Library/Application Support/reth/optimism/db"
2024-08-29T19:32:17.051809Z  INFO Saving prune config to toml file
2024-08-29T19:32:17.051974Z  INFO Configuration loaded path="/Users/zach/Library/Application Support/reth/optimism/reth.toml"
2024-08-29T19:32:17.061399Z  INFO Skipping storage verification for OP mainnet, expected inconsistency in OVM chain
2024-08-29T19:32:17.061429Z  INFO Database opened
2024-08-29T19:32:17.155478Z  INFO 
Pre-merge hard forks (block based):
- Frontier                         @0
- Homestead                        @0
- Tangerine                        @0
- SpuriousDragon                   @0
- Byzantium                        @0
- Constantinople                   @0
- Petersburg                       @0
- Istanbul                         @0
- MuirGlacier                      @0
- Berlin                           @3950000
- London                           @105235063
- ArrowGlacier                     @105235063
- GrayGlacier                      @105235063
- Bedrock                          @105235063
Merge hard forks:
- Paris                            @0 (network is known to be merged)
Post-merge hard forks (timestamp based):
- Regolith                         @0
- Shanghai                         @1704992401
- Canyon                           @1704992401
- Cancun                           @1710374401
- Ecotone                          @1710374401
- Fjord                            @1720627201
2024-08-29T19:32:17.155784Z  INFO Transaction pool initialized
2024-08-29T19:32:17.350790Z  INFO StaticFileProducer initialized
2024-08-29T19:32:17.351186Z  INFO Pruner initialized prune_config=PruneConfig { block_interval: 5, segments: PruneModes { sender_recovery: Some(Full), transaction_lookup: None, receipts: Some(Full), account_history: Some(Distance(10064)), storage_history: Some(Distance(10064)), receipts_log_filter: ReceiptsLogPruneConfig({}) } }
2024-08-29T19:32:17.351394Z  INFO Consensus engine initialized
2024-08-29T19:32:17.351514Z  INFO Engine API handler initialized
2024-08-29T19:32:17.351617Z  INFO Creating JWT auth secret file path="/Users/zach/Library/Application Support/reth/optimism/jwt.hex"
2024-08-29T19:32:17.353954Z  INFO RPC auth server started url=127.0.0.1:8551
2024-08-29T19:32:17.354153Z  INFO RPC IPC server started path=/tmp/reth.ipc
2024-08-29T19:32:17.354186Z  INFO Starting consensus engine
2024-08-29T19:32:18.433414Z  INFO Wrote network peers to file peers_file="/Users/zach/Library/Application Support/reth/optimism/known-peers.json"

❯ ./op-reth node --chain optimism --full
2024-08-29T19:32:19.596160Z  INFO Initialized tracing, debug log directory: /Users/zach/Library/Caches/reth/logs/optimism
2024-08-29T19:32:19.596846Z  INFO Starting reth version="1.0.5 (603e39ab)"
2024-08-29T19:32:19.796931Z  INFO Opening database path="/Users/zach/Library/Application Support/reth/optimism/db"
2024-08-29T19:32:19.833153Z ERROR shutting down due to error
Error: Could not load config file "/Users/zach/Library/Application Support/reth/optimism/reth.toml"

Caused by:
   0: Bad TOML data
   1: TOML parse error at line 55, column 12
   1:    |
   1: 55 | receipts = "full"
   1:    |            ^^^^^^
   1: invalid value: string "full", expected prune mode that leaves at least 10064 blocks in the database

mattsse commented 1 month ago

hmm this

receipts = "full"

this shouldn't be in there. Could you try again after deleting the reth.toml file, in case you haven't modified it manually?

zhwrd commented 1 month ago

I didn't touch the reth.toml. This bug occurs on its own, without any other commands or modifications, and it is easy to reproduce.

If I remove the reth.toml, it reproduces the same way. It must have something to do with how the reth.toml config is serialized for the first time with --full in the arg list. The first run writes a config that is not valid, and the second run fails to read it.

❯ rm -rf /Users/zach/Library/Application\ Support/reth/optimism/reth.toml 

❯ ./op-reth node --chain optimism --full
2024-08-29T20:12:54.516050Z  INFO Initialized tracing, debug log directory: /Users/zach/Library/Caches/reth/logs/optimism
2024-08-29T20:12:54.516596Z  INFO Starting reth version="1.0.5 (603e39ab)"
2024-08-29T20:12:54.716712Z  INFO Opening database path="/Users/zach/Library/Application Support/reth/optimism/db"
2024-08-29T20:12:54.739317Z  INFO Saving prune config to toml file
2024-08-29T20:12:54.739479Z  INFO Configuration loaded path="/Users/zach/Library/Application Support/reth/optimism/reth.toml"
2024-08-29T20:12:54.749645Z  INFO Skipping storage verification for OP mainnet, expected inconsistency in OVM chain
2024-08-29T20:12:54.749680Z  INFO Database opened
2024-08-29T20:12:54.749848Z  INFO 
Pre-merge hard forks (block based):
- Frontier                         @0
- Homestead                        @0
- Tangerine                        @0
- SpuriousDragon                   @0
- Byzantium                        @0
- Constantinople                   @0
- Petersburg                       @0
- Istanbul                         @0
- MuirGlacier                      @0
- Berlin                           @3950000
- London                           @105235063
- ArrowGlacier                     @105235063
- GrayGlacier                      @105235063
- Bedrock                          @105235063
Merge hard forks:
- Paris                            @0 (network is known to be merged)
Post-merge hard forks (timestamp based):
- Regolith                         @0
- Shanghai                         @1704992401
- Canyon                           @1704992401
- Cancun                           @1710374401
- Ecotone                          @1710374401
- Fjord                            @1720627201
2024-08-29T20:12:54.750157Z  INFO Transaction pool initialized
2024-08-29T20:12:54.750260Z  INFO Loading saved peers file=/Users/zach/Library/Application Support/reth/optimism/known-peers.json
2024-08-29T20:12:54.940685Z  INFO StaticFileProducer initialized
2024-08-29T20:12:54.940935Z  INFO Pruner initialized prune_config=PruneConfig { block_interval: 5, segments: PruneModes { sender_recovery: Some(Full), transaction_lookup: None, receipts: Some(Full), account_history: Some(Distance(10064)), storage_history: Some(Distance(10064)), receipts_log_filter: ReceiptsLogPruneConfig({}) } }
2024-08-29T20:12:54.941076Z  INFO Consensus engine initialized
2024-08-29T20:12:54.941182Z  INFO Engine API handler initialized
2024-08-29T20:12:54.942138Z  INFO RPC auth server started url=127.0.0.1:8551
2024-08-29T20:12:54.942258Z  INFO RPC IPC server started path=/tmp/reth.ipc
2024-08-29T20:12:54.942271Z  INFO Starting consensus engine
^C2024-08-29T20:12:55.976424Z  INFO Wrote network peers to file peers_file="/Users/zach/Library/Application Support/reth/optimism/known-peers.json"

❯ ./op-reth node --chain optimism --full
2024-08-29T20:12:56.515160Z  INFO Initialized tracing, debug log directory: /Users/zach/Library/Caches/reth/logs/optimism
2024-08-29T20:12:56.515517Z  INFO Starting reth version="1.0.5 (603e39ab)"
2024-08-29T20:12:56.547250Z  INFO Opening database path="/Users/zach/Library/Application Support/reth/optimism/db"
2024-08-29T20:12:56.586618Z ERROR shutting down due to error
Error: Could not load config file "/Users/zach/Library/Application Support/reth/optimism/reth.toml"

Caused by:
   0: Bad TOML data
   1: TOML parse error at line 55, column 12
   1:    |
   1: 55 | receipts = "full"
   1:    |            ^^^^^^
   1: invalid value: string "full", expected prune mode that leaves at least 10064 blocks in the database

Location:
    crates/node/builder/src/launch/common.rs:119:14
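
A minimal sketch of the write-then-read mismatch described in this comment, using only the Rust standard library. The `PruneMode` enum, `to_toml_value`, and `from_toml_value` are simplified, hypothetical stand-ins, not reth's actual types or its serde implementation; the 10064 minimum is taken from the error message above. The writer accepts any mode, while the reader rejects modes that would leave fewer than the minimum number of blocks, so a value written on the first run can fail to load on the second.

```rust
/// Simplified stand-in for a prune mode (hypothetical, not reth's real type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum PruneMode {
    Full,
    Distance(u64),
}

/// Minimum number of recent blocks the reader insists on keeping
/// (value copied from the error message in this thread).
const MIN_BLOCKS: u64 = 10064;

/// Writer side: emits the TOML value for a segment without any validation.
fn to_toml_value(mode: PruneMode) -> String {
    match mode {
        PruneMode::Full => "\"full\"".to_string(),
        PruneMode::Distance(d) => format!("{{ distance = {} }}", d),
    }
}

/// Reader side: accepts only modes that leave at least `MIN_BLOCKS` blocks.
fn from_toml_value(value: &str) -> Result<PruneMode, String> {
    let value = value.trim();
    if value == "\"full\"" {
        // "full" would prune everything, which violates the minimum.
        return Err(format!(
            "invalid value: string \"full\", expected prune mode that leaves at least {} blocks in the database",
            MIN_BLOCKS
        ));
    }
    // Only the exact `{ distance = N }` shape produced by the writer is handled here.
    let distance = value
        .strip_prefix("{ distance = ")
        .and_then(|rest| rest.strip_suffix(" }"))
        .and_then(|n| n.parse::<u64>().ok())
        .ok_or_else(|| format!("unrecognized prune mode: {}", value))?;
    if distance < MIN_BLOCKS {
        return Err(format!(
            "distance {} is below the minimum of {}",
            distance, MIN_BLOCKS
        ));
    }
    Ok(PruneMode::Distance(distance))
}
```

Calling `from_toml_value(&to_toml_value(PruneMode::Full))` returns an `Err`, while the same round trip with `PruneMode::Distance(10064)` succeeds.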

emnul commented 1 month ago

@mattsse I was able to repro on latest using

cargo run --bin op-reth -F optimism -- node --chain optimism --full
... Ctrl + c
cargo run --bin op-reth -F optimism -- node --chain optimism --full

It fails on the second execution with the same error. I can take this issue.

emnul commented 1 month ago

Looks like a PruneModes value with receipts: Some(Full) is being serialized as receipts = "full" by serde in Config::save (crates/config/src/config.rs). Should it be serialized as receipts = { distance = 10064 } (MINIMUM_PRUNING_DISTANCE) instead?
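
To make the question concrete, here is a rough sketch of what clamping before saving could look like. It is an illustration of the idea only, not reth's code and not necessarily the right fix: `PruneMode`, `saved_distance`, and `receipts_line` are hypothetical names, and only the `MINIMUM_PRUNING_DISTANCE` name and its value of 10064 come from this thread.

```rust
/// Simplified stand-in for a prune mode (hypothetical, not reth's real type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum PruneMode {
    Full,
    Distance(u64),
}

/// Minimum distance named in this thread.
const MINIMUM_PRUNING_DISTANCE: u64 = 10064;

/// Distance that would be written to the config: `Full` and any distance
/// below the minimum are raised to `MINIMUM_PRUNING_DISTANCE`.
fn saved_distance(mode: PruneMode) -> u64 {
    match mode {
        PruneMode::Full => MINIMUM_PRUNING_DISTANCE,
        PruneMode::Distance(d) => d.max(MINIMUM_PRUNING_DISTANCE),
    }
}

/// Renders the `receipts` entry as it would appear under `[prune.segments]`.
fn receipts_line(mode: PruneMode) -> String {
    format!("receipts = {{ distance = {} }}", saved_distance(mode))
}
```

With this sketch, `receipts_line(PruneMode::Full)` yields `receipts = { distance = 10064 }`, a value that a reader enforcing the minimum would accept. Note that clamping changes what `Full` means for this segment, so it is a design trade-off rather than a pure serialization fix.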

mattsse commented 1 month ago

ah I see, this only affects optimism because of this bad unwrap:

https://github.com/paradigmxyz/reth/blob/d59854f1dcd66caef32f0eea23440c2532c29611/crates/node/core/src/args/pruning.rs#L30-L35