hyperledger / besu

An enterprise-grade Java-based, Apache 2.0 licensed Ethereum client https://wiki.hyperledger.org/display/besu
https://www.hyperledger.org/projects/besu
Apache License 2.0

Full node takes up more space than archive node #7238

Open chenqping opened 2 weeks ago

chenqping commented 2 weeks ago

Description

We deployed two nodes for high availability: one archive node and one full node (we tried snap sync with both the Forest and Bonsai storage formats). The archive node takes up only 3.9G, while the full node takes up 8.7G with Forest and 5.6G with Bonsai. This is very odd, as we expected a full node to save space.
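To put the reported numbers in proportion, the short calculation below divides each full node size by the archive node size. The figures are the ones quoted above; the unit is taken to be GB as printed by du ("G"), which is an assumption about how the sizes were measured.

```python
# Data directory sizes reported in this issue, in GB ("G" as printed by du).
ARCHIVE_GB = 3.9
FULL_NODE_GB = {"Forest": 8.7, "Bonsai": 5.6}


def ratio_to_archive(full_gb: float, archive_gb: float = ARCHIVE_GB) -> float:
    """How many times larger the full node's data directory is than the archive node's."""
    return full_gb / archive_gb


for storage_format, size_gb in FULL_NODE_GB.items():
    # Prints Forest: 2.23x and Bonsai: 1.44x
    print(f"{storage_format}: {ratio_to_archive(size_gb):.2f}x the archive node's size")
```

So the Forest full node is a bit over twice the size of the archive node, and the Bonsai full node is roughly 44% larger.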


Steps to Reproduce (Bug)

  1. Configure an archive node and a full node (snap sync, Bonsai storage format).
  2. Let the full node connect to the archive node to sync, while the archive node connects to external validators to sync (we may later change both to sync directly from the external network).
  3. Run du on each node's data directory.
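For step 3, a rough cross-platform equivalent of running du on each data directory is sketched below. It sums the apparent size of regular files only, so it approximates `du -sb` rather than du's default allocated-block count, and it would count hard-linked files more than once. The command-line usage and the data paths in the comment are hypothetical.

```python
import os
import sys


def data_dir_size_bytes(root: str) -> int:
    """Apparent size of all regular files under `root` (roughly `du -sb`, files only)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so files outside the data directory are not counted.
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def to_gib(n_bytes: int) -> float:
    """Convert bytes to GiB (1024**3 bytes), the unit du -h labels as "G"."""
    return n_bytes / 1024**3


if __name__ == "__main__":
    # Hypothetical usage: python du_compare.py /data/archive /data/full
    for data_path in sys.argv[1:]:
        print(f"{data_path}: {to_gib(data_dir_size_bytes(data_path)):.2f} GiB")
```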

Expected behavior: The full node data directory should be smaller.
Actual behavior: The full node data directory is bigger.
Frequency: Always.


non-fungible-nelson commented 1 week ago

Hi there - can you try updating the nodes and seeing if anything changes? We have made some improvements to the database in subsequent versions.

My hunch is that over time, the Archive node will absolutely be larger. We keep more data around in Full nodes, such as caches, to help with block processing performance. Over time, this extra data will not grow linearly, but the Archive node will.

@matkt might also have some insight into this, and perhaps some commands you can run to check the size of your database.

matkt commented 1 week ago

Could you share your configuration (flags, etc.) for each Bonsai test?

matkt commented 1 week ago

Could you also run

./bin/besu --data-path=/data/besu storage rocksdb usage

to get more information on your database at each step?
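When comparing two nodes, it can help to run the same `storage rocksdb usage` command against both data directories in one go. The sketch below only builds and runs exactly the command shown above; the `./bin/besu` location and the two data paths are hypothetical, and the output is printed as-is because its format is not described in this thread.

```python
import subprocess


def rocksdb_usage_cmd(data_path: str, besu_bin: str = "./bin/besu") -> list[str]:
    """Build the `storage rocksdb usage` invocation for one data path."""
    return [besu_bin, f"--data-path={data_path}", "storage", "rocksdb", "usage"]


def report_usage(data_paths: list[str], besu_bin: str = "./bin/besu") -> None:
    """Run the usage command for each node and print its raw output."""
    for path in data_paths:
        print(f"=== {path} ===")
        # check=False so a failure on one node does not stop the report for the others.
        result = subprocess.run(
            rocksdb_usage_cmd(path, besu_bin),
            capture_output=True,
            text=True,
            check=False,
        )
        print(result.stdout or result.stderr)
```

For example, `report_usage(["/data/archive", "/data/full"])` would print one report per (hypothetical) node directory.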

chenqping commented 5 days ago

@matkt


Hi, I've uploaded screenshots from the two nodes, along with the configuration. Full node: full-node-storage. Archive node: archive-node-storage.

Sync

sync-mode="X_SNAP"
data-storage-format="BONSAI"
#bonsai-historical-block-limit=256
fast-sync-min-peers=1

chenqping commented 5 days ago


Hi @non-fungible-nelson, which version should we update to? And do you have any formula or ratio for full node storage versus archive node storage?

matkt commented 5 days ago


Your screenshot seems to be invalid. The full node doesn't have any state; only the blockchain is saved. And the archive node has the columns of a Forest node, and its size seems really small. Is your node syncing?

matkt commented 5 days ago

It would be nice if you could share the logs from when your Bonsai nodes start up, to be sure you have the right configuration.

chenqping commented 5 days ago


Hi matkt, thanks for responding. The eth_syncing API call returns false on the archive node, even though it is in fact constantly importing blocks from an external source. The full node (which follows the archive node) always reports that it is syncing, with start, current, and highest block values, so in our setup the archive node is always ahead of the full node.
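The eth_syncing behaviour described above can be checked with a plain JSON-RPC call. Per the standard Ethereum JSON-RPC API, eth_syncing returns false when the node does not consider itself in a sync phase (it may still import new blocks), and otherwise an object whose startingBlock, currentBlock and highestBlock fields are hex quantities. The sketch below assumes the HTTP RPC endpoint is enabled; the URL in the usage note is a hypothetical local endpoint.

```python
import json
import urllib.request


def eth_syncing(rpc_url: str):
    """Call eth_syncing over HTTP JSON-RPC; returns False or a dict of hex block numbers."""
    payload = json.dumps(
        {"jsonrpc": "2.0", "method": "eth_syncing", "params": [], "id": 1}
    ).encode()
    request = urllib.request.Request(
        rpc_url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["result"]


def blocks_behind(result) -> int:
    """0 if the node reports it is not syncing, else highestBlock - currentBlock."""
    if result is False:
        return 0
    return int(result["highestBlock"], 16) - int(result["currentBlock"], 16)
```

For example, `blocks_behind(eth_syncing("http://localhost:8545"))` run against the full node would show how far it trails the archive node it follows.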

Here is the full node log. We configured log rolling, so this is the current log file: besu.log

matkt commented 4 days ago

Thanks, but I need more logs. When you restart your node, you should see something like this:

####################################################################################################
#                                                                                                  #
# Besu version 24.6.0                                                                              #
#                                                                                                  #
# Configuration:                                                                                   #
# Network: Mainnet                                                                                 #
# Network Id: 1                                                                                    #
# Data storage: Bonsai                                                                             #
# Sync mode: Checkpoint                                                                            #
# RPC HTTP APIs: FLEET,TRACE,ADMIN,DEBUG,NET,ETH,WEB3,TXPOOL                                       #
# RPC HTTP port: 8545                                                                              #
# Engine APIs: ENGINE,ETH                                                                          #
# Engine port: 8551                                                                                #
# Engine JWT: /etc/jwt-secret.hex                                                                  #
# Using LAYERED transaction pool implementation                                                    #
# Using STACKED worldstate update mode                                                             #
# Limit trie logs enabled: retention: 512; prune window: 30000                                     #
#                                                                                                  #
# Host:                                                                                            #
# Java: openjdk-java-21                                                                            #
# Maximum heap size: 3.90 GB                                                                       #
# OS: linux-x86_64                                                                                 #
# glibc: 2.35                                                                                      #
# jemalloc: 5.2.1-0-gea6b3e973b477b8061e0076bb257dbd7f3faa756                                      #
# Total memory: 15.60 GB                                                                           #
# CPU cores: 4                                                                                     #
#                                                                                                  #
# Plugin Registration Summary:                                                                     #
####################################################################################################
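Since the point of the startup banner is to confirm settings such as "Data storage" and "Sync mode", a small parser can pull those key/value lines out of a log. This is a sketch that assumes the banner layout shown above, where each setting sits on its own line framed by `#` characters; lines without a value (such as "Configuration:") and plain lines without a colon are skipped.

```python
import re

# Matches lines like "# Data storage: Bonsai      #" and captures key and value.
BANNER_LINE = re.compile(r"#\s+([A-Za-z][^:]*):\s*(.*?)\s*#\s*$")


def banner_settings(log_text: str) -> dict[str, str]:
    """Extract 'Key: value' pairs from Besu's '#'-framed startup banner."""
    settings = {}
    for line in log_text.splitlines():
        match = BANNER_LINE.match(line)
        # Skip section headers such as "Configuration:" that have no value.
        if match and match.group(2):
            settings[match.group(1)] = match.group(2)
    return settings
```

Running it on a node's startup log and checking `banner_settings(text).get("Data storage")` and `.get("Sync mode")` would show whether Bonsai and the intended sync mode are really in effect.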

Also, regarding the log you shared: your node isn't syncing at all.

Are you running a QBFT network? If you want to use snap sync with a QBFT network, there is a PR to enable that: https://github.com/hyperledger/besu/pull/7140

For the moment, you can use fast sync if you want to sync quickly.

chenqping commented 20 hours ago

Hi @matkt, sorry for the late update. As mentioned, we used the two nodes above to follow a block-producing network of 4 QBFT nodes: one is an archive node, and the other uses a snap sync + Bonsai configuration. We want to compare an archive node with a full node on a private chain, to see how much storage a full node saves versus an archive node. Attached is the full node start log for diagnosis, thanks. I also tried fast sync; it did sync quickly, but its storage was still slightly higher than the archive node's. Also, what's the difference between fast sync and snap sync? Thanks! full node start log.txt