openethereum / parity-ethereum

The fast, light, and robust client for Ethereum-like networks.

2.7.2 sync blocks extremely slow #11494

Open fish-sammy opened 4 years ago

fish-sammy commented 4 years ago

I have been running 2 Parity nodes for Ethereum mainnet in a Kubernetes cluster and just upgraded them from 2.5.13 to 2.7.2. Block syncing has now become extremely slow (~0.04 blk/s), and the nodes cannot even keep up with the chain.
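For scale, a quick back-of-the-envelope check (assuming mainnet's ~13 s average block time, which is an assumption and not taken from the logs) shows that at 0.04 blk/s the node falls further behind rather than catching up:

```python
# Back-of-the-envelope: can a node importing at 0.04 blk/s keep up with mainnet?
# Assumes a ~13 s average block time (an assumption, not taken from the logs).
chain_rate = 1 / 13          # ~0.077 blocks produced per second
import_rate = 0.04           # observed import rate from the status lines

deficit_per_hour = (chain_rate - import_rate) * 3600
print(f"falling behind by ~{deficit_per_hour:.0f} blocks per hour")
# prints: falling behind by ~133 blocks per hour
```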

My TOML configuration file for both nodes is as follows:

# This config should be placed in following path:
#   ~/.local/share/io.parity.ethereum/config.toml

[parity]
# Ethereum Mainnet Network
chain = "mainnet"
# Parity continuously syncs the chain
mode = "active"
# No updates will be auto-installed
auto_update = "none"
# Disables auto downloading of new releases. Not recommended.
no_download = true
# Blockchain and settings will be stored in /data.
base_path = "/data"

[rpc]
# Allows Cross-Origin Requests from domain '*'.
cors = ["*"]
#  JSON-RPC will be listening for connections on IP all.
interface = "all"
# Only selected APIs will be exposed over this interface.
apis = ["web3", "eth", "net", "traces"]
# Allow connections only using specified addresses.
hosts = ["all"]
# JSON-RPC over HTTP will be accessible on port 9091.
port = 9091
# Threads for handling incoming connections for HTTP JSON-RPC server.
server_threads = 16

[websockets]
#  JSON-RPC will be listening for connections on IP all.
interface = "all"
# Allows connecting from Origin 'all'.
origins = ["all"]
# Only selected APIs will be exposed over this interface.
apis = ["pubsub"]
# JSON-RPC over WebSockets will be accessible on port 9092.
port = 9092

[ipc]
# You won't be able to use IPC to interact with Parity.
disable = true

[footprint]
# Compute and Store tracing data. (Enables trace_* APIs).
tracing = "on"
# Prune old state data. Maintains journal overlay - fast but extra 50MB of memory used.
pruning = "fast"
# Will keep up to 20000 old state entries.
pruning_history = 20000
# Will keep up to 8192 MB of old state entries.
pruning_memory = 8192
# If defined, will never use more than 32768 MB for all caches. (Overrides other cache settings.)
cache_size = 32768
# Initial number of verifier threads
num_verifiers = 16
# Number of threads will vary depending on the workload. Not guaranteed to be faster.
scale_verifiers = true

[mining]
# Maximum number of transactions per sender in the queue. By default it's 1% of the entire queue, but not less than 16.
tx_queue_per_sender = 256
# Maximum amount of memory that can be used by the transaction queue. Setting this parameter to 0 disables limiting.
tx_queue_mem_limit = 32
# Parity will keep/relay at most 8196 transactions in queue.
tx_queue_size = 8196

[network]
# Override the bootnodes from selected chain file.
bootnodes = ["enode://d860a01f9722d78051619d1e2351aba3f43f943f6f00718d1b9baa4101932a1f5011f16bb2b1bb35db20d6fe28fa0bf09636d26a87d31de9ec6203eeedb1f666@18.138.108.67:30303", "enode://22a8232c3abc76a16ae9d6c3b164f98775fe226f0917b0ca871128a74a8e9630b458460865bab457221f1d448dd9791d24c4e5d88786180ac185df813a68d4de@3.209.45.79:30303", "enode://ca6de62fce278f96aea6ec5a2daadb877e51651247cb96ee310a318def462913b653963c155a0ef6c7d50048bba6e6cea881130857413d9f50a621546b590758@34.255.23.113:30303", "enode://279944d8dcd428dffaa7436f25ca0ca43ae19e7bcf94a8fb7d1641651f92d121e972ac2e8f381414b80cc8e5555811c2ec6e1a99bb009b3f53c4c69923e11bd8@35.158.244.151:30303", "enode://8499da03c47d637b20eee24eec3c356c9a2e6148d6fe25ca195c7949ab8ec2c03e3556126b0d7ed644675e78c4318b08691b7b57de10e5f0d40d05b09238fa0a@52.187.207.27:30303", "enode://103858bdb88756c71f15e9b5e09b56dc1be52f0a5021d46301dbbfb7e130029cc9d0d6f73f693bc29b665770fff7da4d34f3c6379fe12721b5d7a0bcb5ca1fc1@191.234.162.198:30303", "enode://715171f50508aba88aecd1250af392a45a330af91d7b90701c436b618c86aaa1589c9184561907bebbb56439b8f8787bc01f49a7c77276c58c1b09822d75e8e8@52.231.165.108:30303", "enode://5d6d7cd20d6da4bb83a1d28cadb5d409b64edf314c0335df658c1a54e32c7c4a7ab7823d57c39b6a757556e68ff1df17c748b698544a55cb488b52479a92b60f@104.42.217.25:30303", "enode://979b7fa28feeb35a4741660a16076f1943202cb72b6af70d327f053e248bab9ba81760f39d0701ef1d8f89cc1fbd2cacba0710a12cd5314d5e0c9021aa3637f9@5.1.83.226:30303"]

Each node runs in a Kubernetes pod with an SSD, 16 CPUs, and 64 GB of memory. What I notice is that, despite scale_verifiers being enabled, both nodes only ever consume about 1 CPU, as shown below.

eth-parity-0       1025m        34660Mi
eth-parity-1       1028m        34502Mi

The logs look much like this:

2020-02-15 05:34:51 UTC Syncing #9485347 0x7061…06ce     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+  393 Qed  #9485728   23/25 peers     58 MiB chain   24 GiB db   34 MiB queue  249 KiB sync  RPC:  0 conn,    0 req/s,  430 µs
2020-02-15 05:35:08 UTC Syncing #9485347 0x7061…06ce     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+  396 Qed  #9485730   23/25 peers     58 MiB chain   24 GiB db   34 MiB queue  249 KiB sync  RPC:  0 conn,    0 req/s,  430 µs
2020-02-15 05:35:08 UTC Syncing #9485347 0x7061…06ce     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+  393 Qed  #9485728   23/25 peers     58 MiB chain   24 GiB db   34 MiB queue  249 KiB sync  RPC:  0 conn,    0 req/s,  430 µs
2020-02-15 05:35:17 UTC Syncing #9485347 0x7061…06ce     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+  397 Qed  #9485731   23/25 peers     58 MiB chain   24 GiB db   34 MiB queue  249 KiB sync  RPC:  0 conn,    0 req/s,  430 µs
2020-02-15 05:35:17 UTC 10 -> GetNodeData:   item 0/1 – slow state fetch for hash 0xe78a53e5f6cc66d9eb8582fc2b221dc3eff288f6c30ba22359029b0687d62fa8; took Instant { tv_sec: 6415631, tv_nsec: 100498033 }
2020-02-15 05:35:25 UTC Syncing #9485347 0x7061…06ce     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+  385 Qed  #9485731   23/25 peers     58 MiB chain   24 GiB db   33 MiB queue  249 KiB sync  RPC:  0 conn,    0 req/s,  430 µs
2020-02-15 05:35:25 UTC Syncing #9485347 0x7061…06ce     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+  397 Qed  #9485731   23/25 peers     58 MiB chain   24 GiB db   34 MiB queue  249 KiB sync  RPC:  0 conn,    0 req/s,  430 µs
2020-02-15 05:35:25 UTC Syncing #9485348 0x5ecf…7599     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+  385 Qed  #9485731   23/25 peers     58 MiB chain   24 GiB db   33 MiB queue  249 KiB sync  RPC:  0 conn,    0 req/s,  430 µs
2020-02-15 05:35:25 UTC 10 -> GetNodeData:   item 0/1 – slow state fetch for hash 0xd01481ee138007c9f0d27f1c562c6f6f8f197bf20ec889f4bffbba12fab72223; took Instant { tv_sec: 6415638, tv_nsec: 480517832 }

Can someone please point out what could have gone wrong with my settings?

fish-sammy commented 4 years ago

I turned on debug logging for the sync module, and I can't find any problem there other than that it is deactivating peers all the time:

2020-02-15 07:31:09 UTC IO Worker #0 DEBUG sync  Peer 79 timeout while we were asking them for ForkHeader; disconnecting.
2020-02-15 07:31:09 UTC IO Worker #0 DEBUG sync  Disconnected 79
2020-02-15 07:31:09 UTC IO Worker #2 DEBUG sync  Deactivating peer 59
2020-02-15 07:31:09 UTC IO Worker #2 DEBUG sync  Connected 10:Geth/v1.9.7-stable-a718daa6/linux-amd64/go1.13.4
2020-02-15 07:31:10 UTC IO Worker #1 DEBUG sync  Deactivating peer 43
2020-02-15 07:31:17 UTC IO Worker #0 INFO import  Syncing #9485532 0xfce6…4249     0.06 blk/s    2.0 tx/s    0.1 Mgas/s      0+  609 Qed  #9486145   23/25 peers     30 MiB chain   24 GiB db   51 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  454 µs
2020-02-15 07:31:25 UTC IO Worker #1 INFO import  Syncing #9485532 0xfce6…4249     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+  609 Qed  #9486145   24/25 peers     31 MiB chain   24 GiB db   51 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  454 µs
2020-02-15 07:31:33 UTC IO Worker #3 INFO import  Syncing #9485533 0x5925…2ea8     0.06 blk/s    2.1 tx/s    0.2 Mgas/s      0+  609 Qed  #9486145   24/25 peers     31 MiB chain   24 GiB db   51 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  454 µs
2020-02-15 07:31:41 UTC IO Worker #0 INFO import  Syncing #9485534 0x68b7…e3a3     0.06 blk/s    2.1 tx/s    0.2 Mgas/s      0+  609 Qed  #9486145   24/25 peers     35 MiB chain   24 GiB db   51 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  454 µs
2020-02-15 07:31:50 UTC IO Worker #1 INFO import  Syncing #9485534 0x68b7…e3a3     0.06 blk/s    0.0 tx/s    0.0 Mgas/s      0+  609 Qed  #9486145   24/25 peers     35 MiB chain   24 GiB db   51 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  454 µs
2020-02-15 07:31:58 UTC IO Worker #3 INFO import  Syncing #9485534 0x68b7…e3a3     0.12 blk/s    0.0 tx/s    0.0 Mgas/s      0+  609 Qed  #9486145   24/25 peers     35 MiB chain   24 GiB db   51 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  454 µs
2020-02-15 07:31:58 UTC IO Worker #0 INFO import  Syncing #9485535 0xd326…aa8e     0.06 blk/s    0.0 tx/s    0.0 Mgas/s      0+  609 Qed  #9486145   24/25 peers     35 MiB chain   24 GiB db   51 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  454 µs
2020-02-15 07:32:14 UTC IO Worker #0 INFO import  Syncing #9485535 0xd326…aa8e     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+  609 Qed  #9486145   24/25 peers     40 MiB chain   24 GiB db   51 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  454 µs
2020-02-15 07:32:14 UTC IO Worker #3 INFO import  Syncing #9485535 0xd326…aa8e     0.06 blk/s    9.3 tx/s    0.6 Mgas/s      0+  609 Qed  #9486145   24/25 peers     41 MiB chain   24 GiB db   51 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  454 µs
2020-02-15 07:32:31 UTC IO Worker #3 INFO import  Syncing #9485536 0xbe02…1efd     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+  609 Qed  #9486145   24/25 peers     42 MiB chain   24 GiB db   51 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  454 µs
2020-02-15 07:32:31 UTC IO Worker #0 INFO import  Syncing #9485536 0xbe02…1efd     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+  609 Qed  #9486145   24/25 peers     43 MiB chain   24 GiB db   51 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  454 µs
2020-02-15 07:32:31 UTC IO Worker #3 DEBUG sync  Deactivating peer 69
2020-02-15 07:32:32 UTC IO Worker #0 DEBUG sync  Deactivating peer 62
2020-02-15 07:32:32 UTC IO Worker #2 DEBUG sync  Deactivating peer 35
2020-02-15 07:32:33 UTC IO Worker #3 DEBUG sync  Deactivating peer 60
2020-02-15 07:32:55 UTC IO Worker #3 INFO import  Syncing #9485537 0xbdd8…3dee     0.06 blk/s    5.8 tx/s    0.3 Mgas/s      0+  597 Qed  #9486145   24/25 peers     43 MiB chain   24 GiB db   50 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  454 µs
2020-02-15 07:32:55 UTC IO Worker #0 INFO import  Syncing #9485536 0xbe02…1efd     0.06 blk/s    5.8 tx/s    0.3 Mgas/s      0+  597 Qed  #9486145   24/25 peers     43 MiB chain   24 GiB db   50 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  454 µs
2020-02-15 07:32:55 UTC IO Worker #2 INFO import  Syncing #9485537 0xbdd8…3dee     0.06 blk/s    5.7 tx/s    0.3 Mgas/s      0+  597 Qed  #9486145   24/25 peers     43 MiB chain   24 GiB db   50 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  454 µs
2020-02-15 07:33:04 UTC IO Worker #3 DEBUG sync  Disconnected 10
2020-02-15 07:33:04 UTC IO Worker #0 DEBUG sync  Deactivating peer 9
2020-02-15 07:33:04 UTC IO Worker #3 DEBUG sync  Deactivating peer 28
2020-02-15 07:33:04 UTC IO Worker #1 DEBUG sync  Deactivating peer 32
2020-02-15 07:33:05 UTC IO Worker #2 DEBUG sync  Deactivating peer 59
2020-02-15 07:33:05 UTC IO Worker #2 DEBUG sync  Deactivating peer 90
2020-02-15 07:33:06 UTC IO Worker #3 DEBUG sync  Deactivating peer 68
2020-02-15 07:33:06 UTC IO Worker #0 DEBUG sync  Deactivating peer 59
2020-02-15 07:33:06 UTC IO Worker #3 DEBUG sync  Deactivating peer 72
2020-02-15 07:33:08 UTC IO Worker #3 DEBUG sync  Deactivating peer 10
2020-02-15 07:33:10 UTC IO Worker #3 DEBUG sync  Deactivating peer 35
2020-02-15 07:33:10 UTC IO Worker #3 DEBUG sync  Deactivating peer 7
2020-02-15 07:33:11 UTC IO Worker #0 DEBUG sync  Deactivating peer 22
2020-02-15 07:33:11 UTC IO Worker #0 DEBUG sync  Deactivating peer 45
2020-02-15 07:33:11 UTC IO Worker #0 DEBUG sync  Deactivating peer 28
2020-02-15 07:33:12 UTC IO Worker #2 DEBUG sync  Deactivating peer 78
2020-02-15 07:33:12 UTC IO Worker #3 DEBUG sync  Deactivating peer 72
2020-02-15 07:33:12 UTC IO Worker #2 DEBUG sync  Deactivating peer 67
2020-02-15 07:33:13 UTC IO Worker #3 DEBUG sync  Deactivating peer 65
2020-02-15 07:33:13 UTC IO Worker #1 DEBUG sync  Connected 90:Geth/v1.8.21-stable-ca0ce3e9/linux-amd64/go1.11.2
2020-02-15 07:33:13 UTC IO Worker #0 DEBUG sync  Disconnected 90
2020-02-15 07:33:13 UTC IO Worker #2 DEBUG sync  Connected 69:Geth/v1.8.21-stable-ca0ce3e9/linux-amd64/go1.11.2
2020-02-15 07:33:13 UTC IO Worker #0 DEBUG sync  Disconnected 69
2020-02-15 07:33:14 UTC IO Worker #2 DEBUG sync  Deactivating peer 59
2020-02-15 07:33:15 UTC IO Worker #2 DEBUG sync  Deactivating peer 69
2020-02-15 07:33:15 UTC IO Worker #2 DEBUG sync  Connected 67:Geth/v1.9.8-unstable-8c1e8de8-20191125/linux-amd64/go1.13
2020-02-15 07:33:19 UTC IO Worker #0 INFO import  Syncing #9485537 0xbdd8…3dee     0.04 blk/s    1.0 tx/s    0.0 Mgas/s      0+  597 Qed  #9486145   24/25 peers     44 MiB chain   24 GiB db   50 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  248 µs
2020-02-15 07:33:19 UTC IO Worker #3 INFO import  Syncing #9485538 0xcbe1…715e     0.04 blk/s    1.0 tx/s    0.0 Mgas/s      0+  597 Qed  #9486145   23/25 peers     46 MiB chain   24 GiB db   50 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  248 µs
2020-02-15 07:33:19 UTC IO Worker #2 INFO import  Syncing #9485538 0xcbe1…715e     0.04 blk/s    1.0 tx/s    0.0 Mgas/s      0+  597 Qed  #9486145   24/25 peers     46 MiB chain   24 GiB db   50 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  248 µs
2020-02-15 07:33:28 UTC IO Worker #0 WARN sync  67 -> GetNodeData:   item 0/2 – slow state fetch for hash 0xb7e0794e3682be3c65cee8b3fcafd510fdebab7572b7e256ca22cf8ccfa7725d; took Instant { tv_sec: 6422723, tv_nsec: 831792813 }
2020-02-15 07:33:35 UTC IO Worker #2 DEBUG sync  Disconnected 67
2020-02-15 07:33:35 UTC IO Worker #0 DEBUG sync  Deactivating peer 6
2020-02-15 07:33:35 UTC IO Worker #2 DEBUG sync  Deactivating peer 9
2020-02-15 07:33:35 UTC IO Worker #0 DEBUG sync  Deactivating peer 29
2020-02-15 07:33:36 UTC IO Worker #0 DEBUG sync  Deactivating peer 68
2020-02-15 07:33:36 UTC IO Worker #0 DEBUG sync  Deactivating peer 26
2020-02-15 07:33:36 UTC IO Worker #3 WARN sync  67 -> GetNodeData:   item 0/2 – slow state fetch for hash 0x77a74e359f735aa00acd12839ec0eacba67d51eb7d932294bdf41f998f33ae23; took Instant { tv_sec: 6422728, tv_nsec: 186277804 }
2020-02-15 07:33:36 UTC IO Worker #3 DEBUG sync  "Error sending nodes: Expired"
2020-02-15 07:33:36 UTC IO Worker #3 DEBUG sync  Deactivating peer 61
2020-02-15 07:33:36 UTC IO Worker #3 DEBUG sync  Deactivating peer 62
2020-02-15 07:33:38 UTC IO Worker #1 DEBUG sync  Deactivating peer 69
2020-02-15 07:33:39 UTC IO Worker #2 DEBUG sync  Deactivating peer 71
2020-02-15 07:33:41 UTC IO Worker #3 DEBUG sync  Connected 62:Geth/v1.9.9-stable-01744997/windows-amd64/go1.13.4
2020-02-15 07:33:42 UTC IO Worker #3 DEBUG sync  Deactivating peer 12
2020-02-15 07:33:44 UTC IO Worker #3 INFO import  Syncing #9485538 0xcbe1…715e     0.04 blk/s    0.7 tx/s    0.1 Mgas/s      0+  597 Qed  #9486145   24/25 peers     46 MiB chain   24 GiB db   50 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  248 µs
2020-02-15 07:33:44 UTC IO Worker #2 INFO import  Syncing #9485539 0xdf08…16f0     0.04 blk/s    0.7 tx/s    0.1 Mgas/s      0+  597 Qed  #9486145   23/25 peers     48 MiB chain   24 GiB db   50 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  248 µs
2020-02-15 07:33:44 UTC IO Worker #0 INFO import  Syncing #9485539 0xdf08…16f0     0.04 blk/s    0.7 tx/s    0.1 Mgas/s      0+  597 Qed  #9486145   24/25 peers     48 MiB chain   24 GiB db   50 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  248 µs
2020-02-15 07:33:46 UTC IO Worker #2 DEBUG sync  Disconnected 73
2020-02-15 07:33:48 UTC IO Worker #0 DEBUG sync  Deactivating peer 69
2020-02-15 07:33:49 UTC IO Worker #3 DEBUG sync  Deactivating peer 76
2020-02-15 07:33:49 UTC IO Worker #0 DEBUG sync  Deactivating peer 17
2020-02-15 07:33:50 UTC IO Worker #2 DEBUG sync  Deactivating peer 73
2020-02-15 07:33:50 UTC IO Worker #2 DEBUG sync  Deactivating peer 22
2020-02-15 07:33:50 UTC IO Worker #3 DEBUG sync  Deactivating peer 76
2020-02-15 07:33:51 UTC IO Worker #1 DEBUG sync  Deactivating peer 18
2020-02-15 07:33:51 UTC IO Worker #3 DEBUG sync  Deactivating peer 83
2020-02-15 07:33:52 UTC IO Worker #0 DEBUG sync  Deactivating peer 35
2020-02-15 07:33:53 UTC IO Worker #1 DEBUG sync  Deactivating peer 75
2020-02-15 07:33:53 UTC IO Worker #1 DEBUG sync  Deactivating peer 73
2020-02-15 07:33:53 UTC IO Worker #2 DEBUG sync  Connected 79:Geth/v1.9.10-stable-58cf5686/windows-amd64/go1.13.6
2020-02-15 07:34:08 UTC IO Worker #2 INFO import  Syncing #9485539 0xdf08…16f0     0.04 blk/s    2.4 tx/s    0.2 Mgas/s      0+  597 Qed  #9486145   23/25 peers     48 MiB chain   24 GiB db   50 MiB queue  901 KiB sync  RPC:  0 conn,    0 req/s,  248 µs
2020-02-15 07:34:08 UTC IO Worker #3 INFO import  Syncing #9485540 0xdd5b…a8ff     0.04 blk/s    2.4 tx/s    0.2 Mgas/s      0+  597 Qed  #9486145   24/25 peers     48 MiB chain   24 GiB db   50 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  248 µs
2020-02-15 07:34:08 UTC IO Worker #0 INFO import  Syncing #9485540 0xdd5b…a8ff     0.04 blk/s    2.4 tx/s    0.2 Mgas/s      0+  597 Qed  #9486145   24/25 peers     48 MiB chain   24 GiB db   50 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  248 µs
fish-sammy commented 4 years ago

I started another node to re-sync the entire chain with the exact same pod spec, Docker image, and configuration file, and, weirdly, it was initially able to utilize more CPUs and sync:

eth-parity-2       5246m        2882Mi

and

2020-02-15 08:00:14 UTC IO Worker #0 INFO import  Syncing  #374270 0x1e46…4939  1343.40 blk/s 1595.6 tx/s   43.9 Mgas/s   9527+11007 Qed   #394791   23/25 peers    215 MiB chain    2 GiB db   50 MiB queue   50 MiB sync  RPC:  0 conn,    0 req/s,  112 µs

However, it ran into the same issue after a while, and it now looks the same as my original nodes:

eth-parity-2       1059m        24888Mi

and

2020-02-15 14:41:09 UTC Syncing #1140796 0xd203…be2c     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+24181 Qed  #1164977   22/25 peers    582 MiB chain   12 GiB db  121 MiB queue   69 MiB sync  RPC:  0 conn,    0 req/s,  107 µs
2020-02-15 14:41:29 UTC Syncing #1140796 0xd203…be2c     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+24181 Qed  #1164977   22/25 peers    582 MiB chain   12 GiB db  121 MiB queue   69 MiB sync  RPC:  0 conn,    0 req/s,  107 µs
2020-02-15 14:41:29 UTC Syncing #1140796 0xd203…be2c     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+24181 Qed  #1164977   22/25 peers    582 MiB chain   12 GiB db  121 MiB queue   69 MiB sync  RPC:  0 conn,    0 req/s,  107 µs
2020-02-15 14:41:29 UTC Syncing #1140796 0xd203…be2c     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+24181 Qed  #1164977   22/25 peers    582 MiB chain   12 GiB db  121 MiB queue   69 MiB sync  RPC:  0 conn,    0 req/s,  107 µs
2020-02-15 14:41:29 UTC Syncing #1140796 0xd203…be2c     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+24181 Qed  #1164977   22/25 peers    582 MiB chain   12 GiB db  121 MiB queue   69 MiB sync  RPC:  0 conn,    0 req/s,  107 µs
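One detail worth noting in those status lines: the verification queue holds exactly the distance to the network head, which suggests block download is keeping up and import is the bottleneck. Checking the numbers from the log above:

```python
# Figures taken from the status line above.
best_block = 1_140_796    # last imported block
net_head   = 1_164_977    # highest block seen on the network
queued     = 24_181       # blocks waiting in the verification queue ("Qed")

# The queue holds the entire gap: download keeps up, import is the bottleneck.
assert net_head - best_block == queued
```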

Not sure if it is relevant at all, but I use a custom Dockerfile which just downloads the executable, as follows:

FROM debian:buster-slim as runner

ENV VERSION 2.7.2
ENV SHA256SUM fe992f0c9b229a4406e82b9ff6d388f4acb4d6ce2782cb79b7bc379e7965ae34
# show backtraces
ENV RUST_BACKTRACE 1

RUN set -ex \
  && apt-get update \
  && apt-get install -qq --no-install-recommends ca-certificates wget curl \
  && rm -rf /var/lib/apt/lists/*

RUN set -ex \
  && wget -q "https://releases.parity.io/ethereum/v${VERSION}/x86_64-unknown-linux-gnu/parity" \
  && echo "${SHA256SUM} parity" | sha256sum -c \
  && chmod u+x parity \
  && mv parity /usr/local/bin/ \
  && apt-get remove -qq wget

COPY ./config.testnet.toml /root/config.testnet.toml
COPY ./config.toml /root/config.toml
COPY ./docker-entrypoint.sh /root/entrypoint.sh
COPY ./check-readiness.sh /check-readiness.sh
RUN chmod +x /check-readiness.sh
COPY ./check-liveness.sh /check-liveness.sh
RUN chmod +x /check-liveness.sh

EXPOSE 9091 9092
ENTRYPOINT ["/root/entrypoint.sh"]
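The check-readiness.sh script referenced above isn't shown in the thread. As a hypothetical sketch only, a readiness probe for a syncing node could inspect the eth_syncing JSON-RPC response, which returns the literal false once the node is fully synced; the helper name is an assumption:

```python
import json

def is_synced(raw: str) -> bool:
    """True when an eth_syncing JSON-RPC response reports a fully synced node."""
    # eth_syncing returns the literal `false` once syncing has finished,
    # and a progress object while the node is still catching up.
    return json.loads(raw).get("result") is False

# Example response shapes (values illustrative, not from a live node):
synced  = '{"jsonrpc":"2.0","id":1,"result":false}'
syncing = ('{"jsonrpc":"2.0","id":1,"result":{"startingBlock":"0x0",'
           '"currentBlock":"0x90c1a3","highestBlock":"0x90d6c1"}}')

assert is_synced(synced)
assert not is_synced(syncing)
```

A real probe would POST the eth_syncing request to the node's HTTP JSON-RPC port (9091 in the config above) and exit nonzero until is_synced returns True.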
dvdplm commented 4 years ago

The GetNodeData entries in the logs are likely Geth nodes fast-syncing off your node (it could also be light clients, but that seems much less likely), and those database fetches are really very, very slow; never seen anything that bad, tbh. I would definitely try a shorter pruning history: 20 000 blocks is very high. Are you sure you need that much state? Even if you do need state for that many old blocks, as an experiment I would dial it down to, say, 100, and also remove the increased verifier threads and the extra cache settings; we have seen reports of memory leaks in some configurations. In other words, try a config as close to the defaults as possible and let us know if the node behaves better. If that helps, we can turn your settings back on one at a time. Thanks!

fish-sammy commented 4 years ago

@dvdplm Thanks for replying! Unfortunately, due to the nature of the project I am working on, we need to keep 3 days' worth of state data to be able to replay transactions. We have been running previous versions of Parity with the same pruning history for almost 2 years and never had any problem with it. That said, I will try reducing the pruning history and report back here. I am just wondering: if I reduce the pruning history now and increase it later, will the node be able to recover that older state retroactively, or will it only start retaining more state from the point when I turn it back up?
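For reference, the arithmetic behind keeping 3 days of state, assuming mainnet's ~13 s average block time (an assumption, not stated in the thread), lines up with the pruning_history = 20000 setting:

```python
# How many blocks cover 3 days of history, assuming ~13 s per mainnet block?
seconds_needed = 3 * 24 * 3600   # three days in seconds
block_time = 13                  # assumed average block time (seconds)

blocks_needed = seconds_needed // block_time
print(blocks_needed)  # prints 19938, roughly the pruning_history = 20000 in use
```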

fish-sammy commented 4 years ago

@dvdplm I reduced my configuration to the following, as you suggested:

# This config should be placed in following path:
#   ~/.local/share/io.parity.ethereum/config.toml

[parity]
# Ethereum Mainnet Network
chain = "mainnet"
# Parity continuously syncs the chain
mode = "active"
# No updates will be auto-installed
auto_update = "none"
# Disables auto downloading of new releases. Not recommended.
no_download = true
# Blockchain and settings will be stored in /data.
base_path = "/data"

[rpc]
# Allows Cross-Origin Requests from domain '*'.
cors = ["*"]
#  JSON-RPC will be listening for connections on IP all.
interface = "all"
# Only selected APIs will be exposed over this interface.
apis = ["web3", "eth", "net", "traces"]
# Allow connections only using specified addresses.
hosts = ["all"]
# JSON-RPC over HTTP will be accessible on port 9091.
port = 9091

[websockets]
#  JSON-RPC will be listening for connections on IP all.
interface = "all"
# Allows connecting from Origin 'all'.
origins = ["all"]
# Only selected APIs will be exposed over this interface.
apis = ["pubsub"]
# JSON-RPC over WebSockets will be accessible on port 9092.
port = 9092

[ipc]
# You won't be able to use IPC to interact with Parity.
disable = true

[footprint]
# Compute and Store tracing data. (Enables trace_* APIs).
tracing = "on"
# Prune old state data. Maintains journal overlay - fast but extra 50MB of memory used.
pruning = "fast"
# Will keep up to 100 old state entries.
pruning_history = 100

The node is now not syncing at all:

2020-02-16 04:17:19 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   22/25 peers      8 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:17:37 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   22/25 peers      5 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:17:46 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   22/25 peers      7 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:17:46 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   22/25 peers      8 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:17:54 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   22/25 peers      7 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:18:03 UTC 82 -> GetNodeData:   item 0/2 – slow state fetch for hash 0x6886e89b660f08cf1192ea5b1d524bb678ad82c96dcff9149aa5fd0c9dfb7b1d; took Instant { tv_sec: 27684, tv_nsec: 539620713 }
2020-02-16 04:18:03 UTC 82 -> GetNodeData:   item 0/178 – slow state fetch for hash 0x424d087e449e866929ffaff1ba29ca6938e652626309900478977dad4d6b5622; took Instant { tv_sec: 27678, tv_nsec: 576652674 }
2020-02-16 04:18:12 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   21/25 peers      5 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:18:12 UTC 82 -> GetNodeData:   item 0/2 – slow state fetch for hash 0x6402f32704242f3a5e9fa5fc997ba4eeb42cdb37d22712bfce1770fa60751c40; took Instant { tv_sec: 27688, tv_nsec: 384819723 }
2020-02-16 04:18:21 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   22/25 peers      6 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:18:21 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   21/25 peers      9 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:18:30 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   21/25 peers      6 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:18:48 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   21/25 peers      6 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:18:57 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   21/25 peers      7 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:18:57 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   21/25 peers      7 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:19:06 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   21/25 peers      8 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:19:24 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   21/25 peers      4 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:19:32 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   21/25 peers      5 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:19:32 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   21/25 peers      5 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:19:41 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   21/25 peers      7 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:19:58 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   21/25 peers      7 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:20:07 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   21/25 peers      5 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:20:07 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   21/25 peers      6 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:20:16 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   21/25 peers      7 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:20:34 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   21/25 peers      8 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs

Is it possible that my database was corrupted during the upgrade from 2.5.13 to 2.7.2, in a way that does not crash Parity but only slows it down significantly? If that is the case, this might be reproducible, as both of my nodes are having the same issue.

EthanHee commented 4 years ago

I have the same problem when I use the Docker image "btccom/parity:v2.5.13-btcpool". Here are my config.toml and logs:

# Parity Config Generator
# https://paritytech.github.io/parity-config-generator/
#
# This config should be placed in following path:
#   ~/.local/share/io.parity.ethereum/config.toml

[parity]
# Ethereum Main Network (ETH)
chain = "foundation"
# Parity continuously syncs the chain
mode = "active"
# Disables auto downloading of new releases. Not recommended.
no_download = true

[ipc]
# You won't be able to use IPC to interact with Parity.
disable = true

#[dapps]
# You won't be able to access any web Dapps.
#disable = true

[rpc]
#  JSON-RPC will be listening for connections on IP 0.0.0.0.
interface = "0.0.0.0"
# Allows Cross-Origin Requests from domain '*'.
cors = ["*"]

[mining]
# Account address to receive reward when block is mined.
author = "0xaa185ca60063d8418e0f623c5d39713bcc13041a"
# Blocks that you mine will have this text in extra data field.
# Set a different node-id for each node to ensure results of getwork are unique.
extra_data = "Easy to Mine"
# Prepare a block to seal even when there are no miners connected.
force_sealing = true
# Force the node to author new blocks when a new uncle block is imported.
reseal_on_uncle = true
# New pending block will be created for all transactions (both local and external).
reseal_on_txs = "all"
# New pending block will be created only once per 4000 milliseconds.
reseal_min_period = 4000
# Parity will keep/relay at most 8192 transactions in queue.
tx_queue_size = 8192
tx_queue_per_sender = 128

[network]
# Parity will sync by downloading the latest state first. The node will be operational in a couple of minutes.
warp = true
# Specify a path to a file with peers' enodes to be always connected to.
reserved_peers = "/home/parity/.local/share/io.parity.ethereum/peer.list"
# Parity will try to maintain connections to at least 100 peers.
min_peers = 100
# Parity will maintain at most 200 peers.
max_peers = 200

[misc]
logging = "own_tx=trace,sync=info,chain=info,network=info,miner=trace"
log_file = "/home/parity/.local/share/io.parity.ethereum/parity.log"

[footprint]
# Prune old state data. Maintains journal overlay - fast but extra 50MB of memory used.
pruning = "fast"
# If defined, will never use more than 1024 MB for all caches. (Overrides other cache settings.)
cache_size = 1024

[snapshots]
# Disables automatic periodic snapshots.
disable_periodic = true

2020-02-16 08:31:03.882859050 UTC IO Worker #1 INFO import  Syncing #9465825 0xa517…b340     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 167 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    3 req/s,   14 µs
2020-02-16 08:31:08.862218367 UTC IO Worker #1 INFO import  Syncing #9465825 0xa517…b340     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 167 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    1 req/s,   14 µs
2020-02-16 08:31:13.867002799 UTC IO Worker #1 INFO import  Syncing #9465826 0x54b9…9d14     0.20 blk/s   35.8 tx/s    2.0 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 167 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    2 req/s,   14 µs
2020-02-16 08:31:18.863398454 UTC IO Worker #1 INFO import  Syncing #9465826 0x54b9…9d14     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 167 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    1 req/s,   14 µs
2020-02-16 08:31:23.862675134 UTC IO Worker #1 INFO import  Syncing #9465826 0x54b9…9d14     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 167 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    2 req/s,   14 µs
2020-02-16 08:31:28.864881371 UTC IO Worker #2 INFO import  Syncing #9465827 0x3bab…263e     0.20 blk/s   49.4 tx/s    2.0 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 167 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    2 req/s,   13 µs
2020-02-16 08:31:33.864014689 UTC IO Worker #2 INFO import  Syncing #9465827 0x3bab…263e     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 167 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    1 req/s,   14 µs
2020-02-16 08:31:38.869048210 UTC IO Worker #1 INFO import  Syncing #9465827 0x3bab…263e     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 167 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    1 req/s,   14 µs
2020-02-16 08:31:43.872461997 UTC IO Worker #1 INFO import  Syncing #9465829 0x178e…48e3     0.40 blk/s   50.0 tx/s    2.1 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 166 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    1 req/s,   14 µs
2020-02-16 08:31:48.880003171 UTC IO Worker #1 INFO import  Syncing #9465829 0x178e…48e3     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 166 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    1 req/s,   14 µs
2020-02-16 08:31:53.865408752 UTC IO Worker #1 INFO import  Syncing #9465829 0x178e…48e3     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 2367 Qed  #9468204   97/100 peers   69 MiB chain 166 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    1 req/s,   14 µs
2020-02-16 08:31:58.866663314 UTC IO Worker #2 INFO import  Syncing #9465830 0x932f…196f     0.20 blk/s   25.6 tx/s    2.0 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 166 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    2 req/s,   14 µs
2020-02-16 08:32:03.269516973 UTC http.worker00 WARN rpc  eth_accounts is deprecated and will be removed in future versions: Account management is being phased out see #9997 for alternatives.
2020-02-16 08:32:03.875034170 UTC IO Worker #2 INFO import  Syncing #9465830 0x932f…196f     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 166 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    3 req/s,   14 µs
2020-02-16 08:32:08.870466519 UTC IO Worker #2 INFO import  Syncing #9465830 0x932f…196f     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 166 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    2 req/s,   14 µs
2020-02-16 08:32:13.879979218 UTC IO Worker #2 INFO import  Syncing #9465831 0x645e…de83     0.20 blk/s   42.6 tx/s    2.0 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 167 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    1 req/s,   14 µs
2020-02-16 08:32:18.878494405 UTC IO Worker #1 INFO import  Syncing #9465832 0x735e…235f     0.20 blk/s   26.8 tx/s    2.0 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 167 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    2 req/s,   14 µs
2020-02-16 08:32:23.886999404 UTC IO Worker #1 INFO import  Syncing #9465832 0x735e…235f     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 167 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    1 req/s,   14 µs
2020-02-16 08:32:28.866493074 UTC IO Worker #1 INFO import  Syncing #9465833 0xff0a…abf0     0.20 blk/s   25.6 tx/s    2.0 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 167 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    1 req/s,   14 µs
2020-02-16 08:32:33.870349018 UTC IO Worker #1 INFO import  Syncing #9465833 0xff0a…abf0     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 167 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    1 req/s,   14 µs

The sync speed cannot even keep up with mainnet.

EthanHee commented 4 years ago

What can I do about this?

dvdplm commented 4 years ago

I am just wondering if I reduce the pruning history right now and increase it later, will the node be able to get back to those state data retroactively? Or will it just then continue storing more since the point when I turn it back?

No, once pruned there is no way to get the pruned state back; it will, as you say, continue storing more from the point you increase it again.

We have upgraded all of our bootnodes from 2.5 to 2.7 and have not seen these issues, so whatever this is, I don't think it affects everyone (making it trickier to debug ofc).

@sammy1991106 Do you think you could share your RocksDB LOG file with us? You can find it in <your base path>/chains/ethereum/db/<your account>/overlayrecent/db/LOG. It's a long shot, but maybe you're hitting some kind of pathological scenario with respect to write amplification. :/

fish-sammy commented 4 years ago

@dvdplm Yes, absolutely! I attached the LOGs from both nodes. Let me know if you find anything.

LOG-0.txt LOG-1.txt

fish-sammy commented 4 years ago

@dvdplm In the meantime, is it doable to warp sync to a specific block X and then turn on tracing and full sync from there?

EthanHee commented 4 years ago

I run another parity node as follows:

ExecStart=/usr/bin/parity --author "***" --mode active --pruning fast --db-compaction ssd --cache-size 1024 --config /etc/parity/config.toml --log-file /root/parity.log

and here is my config.toml

# This config should be placed in following path:
#   %AppData%\Parity\Ethereum\config.toml

[rpc]
#  JSON-RPC will be listening for connections on IP 0.0.0.0.
interface = "0.0.0.0"
# Only selected APIs will be exposed over this interface.
apis = ["web3", "eth", "pubsub", "net", "parity", "parity_pubsub", "traces", "shh", "shh_pubsub", "personal", "parity_accounts", "secretstore", "rpc", "private"]

The log is as follows:

2020-02-17 14:30:22  Syncing #9469862 0x3e34…fdce     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 1493 Qed  #9471364   35/50 peers      9 MiB chain   90 MiB db  140 MiB queue   13 MiB sync  RPC:  0 conn,    3 req/s,   14 µs
2020-02-17 14:30:28  Syncing #9469863 0xce69…1a9b     0.20 blk/s   29.6 tx/s    2.0 Mgas/s      0+ 1493 Qed  #9471364   35/50 peers      9 MiB chain   90 MiB db  140 MiB queue   13 MiB sync  RPC:  0 conn,    1 req/s,   14 µs
2020-02-17 14:30:33  Syncing #9469864 0x5b65…6aae     0.20 blk/s   11.4 tx/s    0.9 Mgas/s      0+ 1493 Qed  #9471364   35/50 peers      9 MiB chain   89 MiB db  140 MiB queue   13 MiB sync  RPC:  0 conn,    2 req/s,   14 µs
2020-02-17 14:30:37  Syncing #9469864 0x5b65…6aae     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 1493 Qed  #9471364   35/50 peers      9 MiB chain   89 MiB db  140 MiB queue   13 MiB sync  RPC:  0 conn,    2 req/s,   14 µs
2020-02-17 14:30:43  Syncing #9469864 0x5b65…6aae     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 1493 Qed  #9471364   35/50 peers      9 MiB chain   89 MiB db  140 MiB queue   13 MiB sync  RPC:  0 conn,    3 req/s,   14 µs
2020-02-17 14:30:47  Syncing #9469865 0x9d00…5ead     0.20 blk/s   29.2 tx/s    2.0 Mgas/s      0+ 1493 Qed  #9471364   35/50 peers      9 MiB chain   90 MiB db  140 MiB queue   13 MiB sync  RPC:  0 conn,    1 req/s,   14 µs
2020-02-17 14:30:53  Syncing #9469866 0x0dfa…0905     0.20 blk/s    9.6 tx/s    2.0 Mgas/s      0+ 1493 Qed  #9471364   35/50 peers      9 MiB chain   90 MiB db  140 MiB queue   13 MiB sync  RPC:  0 conn,    3 req/s,   14 µs
2020-02-17 14:30:55  eth_accounts is deprecated and will be removed in future versions: Account management is being phased out see #9997 for alternatives.
2020-02-17 14:30:58  Syncing #9469867 0x6ef2…f0a9     0.20 blk/s    6.2 tx/s    2.0 Mgas/s      0+ 1493 Qed  #9471364   35/50 peers      9 MiB chain   89 MiB db  140 MiB queue   13 MiB sync  RPC:  0 conn,    3 req/s,   14 µs
2020-02-17 14:31:03  Syncing #9469867 0x6ef2…f0a9     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 1493 Qed  #9471364   35/50 peers      9 MiB chain   89 MiB db  140 MiB queue   13 MiB sync  RPC:  0 conn,    1 req/s,   14 µs
2020-02-17 14:31:07  Syncing #9469867 0x6ef2…f0a9     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 1493 Qed  #9471364   35/50 peers      9 MiB chain   89 MiB db  140 MiB queue   13 MiB sync  RPC:  0 conn,    2 req/s,   14 µs
2020-02-17 14:31:13  Syncing #9469868 0xf944…3f57     0.20 blk/s   29.4 tx/s    2.0 Mgas/s      0+ 1493 Qed  #9471364   35/50 peers      9 MiB chain   90 MiB db  140 MiB queue   13 MiB sync  RPC:  0 conn,    1 req/s,   14 µs
2020-02-17 14:31:17  Syncing #9469869 0xe821…9f5a     0.20 blk/s    8.2 tx/s    2.0 Mgas/s      0+ 1493 Qed  #9471364   35/50 peers      9 MiB chain   91 MiB db  140 MiB queue   13 MiB sync  RPC:  0 conn,    1 req/s,   14 µs
2020-02-17 14:31:22  Syncing #9469869 0xe821…9f5a     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 1493 Qed  #9471364   35/50 peers      9 MiB chain   91 MiB db  140 MiB queue   13 MiB sync  RPC:  0 conn,    1 req/s,   14 µs
2020-02-17 14:31:28  Syncing #9469870 0xd581…2243     0.20 blk/s   20.2 tx/s    1.5 Mgas/s      0+ 1493 Qed  #9471364   35/50 peers      9 MiB chain   90 MiB db  140 MiB queue   13 MiB sync  RPC:  0 conn,    1 req/s,   14 µs
2020-02-17 14:31:32  Syncing #9469871 0x3a49…797e     0.20 blk/s   11.4 tx/s    2.0 Mgas/s      0+ 1481 Qed  #9471364   35/50 peers      9 MiB chain   91 MiB db  139 MiB queue   13 MiB sync  RPC:  0 conn,    1 req/s,   14 µs

the parity version is:

root@iZj6c20llck3brqzgtwgh7Z:~# parity --version
Parity Ethereum Client.
  version Parity-Ethereum/v2.7.2-stable-2662d19-20200206/x86_64-unknown-linux-gnu/rustc1.41.0
Copyright 2015-2020 Parity Technologies (UK) Ltd.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

By Wood/Paronyan/Kotewicz/Drwięga/Volf/Greeff
   Habermeier/Czaban/Gotchac/Redman/Nikolsky
   Schoedon/Tang/Adolfsson/Silva/Palm/Hirsz et al.

The result is the same: it syncs very, very slowly.

Hoping for a response, thank you.

dvdplm commented 4 years ago

warp sync to a specific block X, and then turn on tracing and full sync from there?

I'm afraid that is not currently possible. I don't know of a deep technical reason for it, though, so I think it could be done. But it's not a solution to your current problem.

dvdplm commented 4 years ago

@EthanHee so your node is not a tracing node then? And using 2.5 it syncs significantly quicker?

fish-sammy commented 4 years ago

@dvdplm I have re-synced two 2.5.13 nodes and two 2.7.2 nodes, with warp sync turned on and the warp barrier targeting block 9450000. I realized that we only use trace_replayTransaction, which is part of the ad-hoc tracing API and does not require the trace DB. Here is what I observed.

  1. 2.5.13 syncs to the tip of the chain much faster than 2.7.2. I can't say exactly how much faster because I ended up killing the 2.7.2 nodes before they caught up; by the time the 2.5.13 nodes were caught up, the 2.7.2 ones were still > 20000 blocks away from the tip.
  2. After both 2.5.13 and 2.7.2 finished the initial state download and were running full sync, I redirected our system to run against them to see how trace_replayTransaction performs. What we do is simply replay all transactions in blocks and ask for stateDiff only. The 2.5.13 nodes were much faster: I didn't do the math, but visually we were getting all state diffs in ~15 seconds per block with 2.7.2 vs. less than 10 seconds per block with 2.5.13. (We ran both for about 2 hours, so the difference should not just be uneven transaction counts or gas cost across blocks.)
  3. When the nodes were flooded with trace_replayTransaction calls, 2.5.13 was able to use all the CPUs available to it while 2.7.2 used only 3 or 4 out of 16 cores (we set the RPC server_threads to match the number of CPUs on the machine).

Let me know if you find anything. We would love to provide logs, debug, and submit PRs to help solve the issue.
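
For reference, the warp-barrier setup described above corresponds to a config fragment along these lines. This is a sketch: the `warp_barrier` key lives under `[network]` in recent Parity versions, and 9450000 is the block mentioned in this comment; verify the key name against `parity --help` for your build.

```toml
[network]
# Warp sync, but only accept snapshots taken at or after this block.
warp = true
warp_barrier = 9450000
```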

dvdplm commented 4 years ago

2. After both 2.5.13 and 2.7.2 finished the initial state download and were running full sync, I redirected our system to run against them to see how is the performance of trace_replayTransaction. What we basically do is simply replaying all transactions in blocks and asking for stateDiff only. The 2.5.13 nodes were way faster. I didn't do the math, but visually we were getting all state diffs like ~15 seconds per block with 2.7.2 v.s. less than 10 seconds per block with 2.5.13. (we experimented with both for like 2 hours so it should not really be that just transaction counts or gas cost in blocks are uneven)

Do I understand you correctly that here the two nodes are not archive nodes and not tracing, just plain old vanilla full nodes? I have benchmarked an archive+tracing node at both 2.5 and 2.7 today and have not found any significant performance difference, but my node is not fully synced; I have ~2.5 million blocks in the DB. Here are some numbers running trace_replayBlockTransactions for block 0x1a3d52, which returns a big trace (more than 50 MB) on both versions:

2.7, single big trace, single thread
------------------------------------
    $ wrk --latency -c 1 -d 1m -t 1 --timeout 5 -s ./trace.lua http://127.0.0.1:8545
    Running 1m test @ http://127.0.0.1:8545
      1 threads and 1 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency     1.08s   111.18ms   1.42s    74.55%
        Req/Sec     0.22      0.42     1.00     78.18%
      Latency Distribution
         50%    1.06s
         75%    1.16s
         90%    1.25s
         99%    1.42s
      55 requests in 1.00m, 2.72GB read
    Requests/sec:      0.92
    Transfer/sec:     46.39MB

2.5, single big trace, single thread
------------------------------------
    $ wrk --latency -c 1 -d 1m -t 1 --timeout 5 -s ./trace.lua http://127.0.0.1:8545
    Running 1m test @ http://127.0.0.1:8545
      1 threads and 1 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency     1.15s   104.33ms   1.43s    65.38%
        Req/Sec     0.02      0.14     1.00     98.08%
      Latency Distribution
         50%    1.14s
         75%    1.23s
         90%    1.29s
         99%    1.43s
      52 requests in 1.00m, 2.57GB read
    Requests/sec:      0.87
    Transfer/sec:     43.83MB

Do you think you could run time curl -H 'Content-type: application/json' -X POST -d '{"method":"trace_replayBlockTransactions","params":["0x1a3d52", ["stateDiff", "vmTrace","trace"] ],"id":1,"jsonrpc":"2.0"}' http://localhost:8545 > out-2.7.json on your 2.5 and 2.7 nodes and report back what timings you get? It's the same block I used with wrk above and here I see response times between 1.3 and 1.8 seconds on both versions, consistent with the results of wrk above. Thanks!

EDIT: I ran my benchmark with --mode offline to reduce the noise on the node.

richardpringle commented 4 years ago

Hey @dvdplm, I'm working with @sammy1991106 on this. While we wait for him to get you those numbers, I will say that one of the main concerns is performance under load. It seems as though 2.5 was better able to manage a flurry of trace_replayTransaction requests while it was syncing, and that 2.5 is able to parallelize more work than 2.7. I'll have to take another look at the changes between 2.5 and 2.7 and see if I have any suspicions about what could have affected this.

In other words, I suspect that a single request on a single CPU while in offline mode will not yield a significant difference. @sammy1991106, I'll let you prove or disprove my hypothesis.
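
The load scenario being discussed can be reproduced with a minimal sketch like the one below. Port 9091 and block 0x9146a2 come from this thread; the `build_payload` helper, the request count of 16, and the host are illustrative assumptions, not part of anyone's actual setup.

```shell
#!/bin/sh
# Sketch of a parallel trace load, to observe CPU utilisation under
# concurrent trace_replayBlockTransactions requests. Port 9091 and block
# 0x9146a2 are taken from this thread; everything else is illustrative.

build_payload() {
    # $1 = block number (hex), $2 = JSON-RPC request id
    printf '{"method":"trace_replayBlockTransactions","params":["%s",["stateDiff","vmTrace","trace"]],"id":%s,"jsonrpc":"2.0"}' "$1" "$2"
}

run_probe() {
    # Fire 16 requests in parallel and wait for all of them to finish.
    for i in $(seq 16); do
        curl -s -o /dev/null -H 'Content-type: application/json' \
            -d "$(build_payload 0x9146a2 "$i")" http://127.0.0.1:9091 &
    done
    wait
}

# run_probe   # uncomment against a test node
```

Watching `top` while this runs against each version should make any parallelism difference visible.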

richardpringle commented 4 years ago

Possible related issues: https://github.com/paritytech/parity-ethereum/issues/11502 https://github.com/paritytech/parity-ethereum/issues/11503

dvdplm commented 4 years ago

I'll have to take another look at the changes between 2.5 and 2.7 and see if I have any suspicions about what could have affected this.

Lovely, this definitely could use more eyes. One place to look is the switch to jsonrpc v14 and the changes related to threading, e.g. this. During our testing we did not see any regressions but we did not test on tracing nodes.

In other words, I suspect that a single request on a single cpu while in offline mode will not yield a significant difference.

I agree; what I was after is a confirmation that what I'm seeing on my system matches up with what you guys are seeing. If otoh we have a severe regression on a single query on a single thread I'd be more inclined to look into the database layer.

richardpringle commented 4 years ago

I have to re-sync to get block 0x1a3d52, but in the meantime I'll post some "benchmarks" I ran with the state we had.

Block: 0x9146a2

$ time curl -H 'Content-type: application/json' -X POST -d '{"method":"trace_replayBlockTransactions","params":["'$BLOCK'", ["stateDiff", "vmTrace","trace"] ],"id":1,"jsonrpc":"2.0"}' http://localhost:9091 -o $VERSION.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 27.2M  100 27.2M  100   122  13.3M     59  0:00:02  0:00:02 --:--:-- 13.3M
real    0m2.130s
user    0m0.005s
sys 0m0.035s

$ echo $BLOCK $IMAGE
0x9146a2 parity-2.5.13
$ time curl -H 'Content-type: application/json' -X POST -d '{"method":"trace_replayBlockTransactions","params":["'$BLOCK'", ["stateDiff", "vmTrace","trace"] ],"id":1,"jsonrpc":"2.0"}' http://localhost:9091 -o $VERSION.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 27.2M  100 27.2M  100   122  4716k     20  0:00:06  0:00:05  0:00:01 7521k
real    0m5.994s
user    0m0.008s
sys 0m0.037s

$ echo $BLOCK $IMAGE
0x9146a2 parity-2.6.8
$ time curl -H 'Content-type: application/json' -X POST -d '{"method":"trace_replayBlockTransactions","params":["'$BLOCK'", ["stateDiff", "vmTrace","trace"] ],"id":1,"jsonrpc":"2.0"}' http://localhost:9091 -o $VERSION.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 27.2M  100 27.2M  100   122  11.0M     49  0:00:02  0:00:02 --:--:-- 11.0M
real    0m2.557s
user    0m0.004s
sys 0m0.039s

$ echo $BLOCK $IMAGE
0x9146a2 parity-2.7.2

and block 0x914699

$ time curl -H 'Content-type: application/json' -X POST -d '{"method":"trace_replayBlockTransactions","params":["'$BLOCK'", ["stateDiff", "vmTrace","trace"] ],"id":1,"jsonrpc":"2.0"}' http://localhost:9091 -o $VERSION.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 17.9M  100 17.9M  100   122  4248k     28  0:00:04  0:00:04 --:--:-- 4248k
real    0m4.396s
user    0m0.006s
sys 0m0.026s

$ echo $BLOCK $IMAGE
0x914699 parity-2.5.13
$ time curl -H 'Content-type: application/json' -X POST -d '{"method":"trace_replayBlockTransactions","params":["'$BLOCK'", ["stateDiff", "vmTrace","trace"] ],"id":1,"jsonrpc":"2.0"}' http://localhost:9091 -o $VERSION.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 17.9M  100 17.9M  100   122  4360k     28  0:00:04  0:00:04 --:--:-- 4360k
real    0m4.274s
user    0m0.006s
sys 0m0.023s

$ echo $BLOCK $IMAGE
0x914699 parity-2.6.8
$ time curl -H 'Content-type: application/json' -X POST -d '{"method":"trace_replayBlockTransactions","params":["'$BLOCK'", ["stateDiff", "vmTrace","trace"] ],"id":1,"jsonrpc":"2.0"}' http://localhost:9091 -o $VERSION.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 17.9M  100 17.9M  100   122  5255k     34  0:00:03  0:00:03 --:--:-- 5255k
real    0m3.576s
user    0m0.010s
sys 0m0.023s

$ echo $BLOCK $IMAGE
0x914699 parity-2.7.2

I noticed that if I run the query twice, I get a significant speed-up, so there's obviously some caching going on; I'm not sure a proper load test really gives useful results beyond the 99th percentile.

@dvdplm, I'll do the same thing for the block you were using too.

richardpringle commented 4 years ago

Here are the "benchmarks" with block 0x1a3d52:

$ time curl -H 'Content-type: application/json' -X POST -d '{"method":"trace_replayBlockTransactions","params":["'$BLOCK'", ["stateDiff", "vmTrace","trace"] ],"id":1,"jsonrpc":"2.0"}' http://localhost:9091 -o $VERSION-$BLOCK.json
Warning: The file name argument '-0x1a3d52.json' looks like a flag.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 50.6M  100 50.6M  100   122  15.4M     37  0:00:03  0:00:03 --:--:-- 15.4M
real    0m3.279s
user    0m0.008s
sys 0m0.066s

$ echo $BLOCK $IMAGE
0x1a3d52 parity-2.5.13
$ time curl -H 'Content-type: application/json' -X POST -d '{"method":"trace_replayBlockTransactions","params":["'$BLOCK'", ["stateDiff", "vmTrace","trace"] ],"id":1,"jsonrpc":"2.0"}' http://localhost:9091 -o $VERSION-$BLOCK.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 50.6M  100 50.6M  100   122   9.9M     24  0:00:05  0:00:05 --:--:-- 13.0M
real    0m5.090s
user    0m0.008s
sys 0m0.066s

$ echo $BLOCK $VERSION
0x1a3d52 parity-2.6.8
$ time curl -H 'Content-type: application/json' -X POST -d '{"method":"trace_replayBlockTransactions","params":["'$BLOCK'", ["stateDiff", "vmTrace","trace"] ],"id":1,"jsonrpc":"2.0"}' http://localhost:9091 -o $VERSION-$BLOCK.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 50.6M  100 50.6M  100   122  16.8M     40  0:00:03  0:00:02  0:00:01 16.8M
real    0m3.004s
user    0m0.008s
sys 0m0.068s

$ echo $BLOCK $VERSION
0x1a3d52 parity-2.7.2

Interestingly enough, I decided to run that last query a few more times to see if there was any speed-up, and it didn't improve by much; below is the fastest attempt:

$ time curl -H 'Content-type: application/json' -X POST -d '{"method":"trace_replayBlockTransactions","params":["'$BLOCK'", ["stateDiff", "vmTrace","trace"] ],"id":1,"jsonrpc":"2.0"}' http://localhost:9091 -o $VERSION-$BLOCK.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 50.6M  100 50.6M  100   122  20.1M     48  0:00:02  0:00:02 --:--:-- 20.1M
real    0m2.662s
user    0m0.012s
sys 0m0.071s

$ echo $BLOCK $VERSION
0x1a3d52 parity-2.7.2

Since the original posting of this issue, we have found optimizations related to our trace_replayTransaction strategy; however, that does not help us if the node eventually stops syncing. As we noticed issues related to concurrency, and there have been some changes in thread management, it almost seems like our nodes are deadlocking during the syncing process (2.7.2 nodes always tend to stop syncing altogether). Is this possible?

dvdplm commented 4 years ago

While it's possible you're seeing a deadlock, I would be more inclined to suspect thread starvation, where too small a pool of threads is continuously busy answering RPC requests rather than making progress on sync. That's a guess at this stage, though.

richardpringle commented 4 years ago

We actually saw the syncing issue while the RPC server was off.

kryvel commented 4 years ago

Same issue here. Parity Ethereum Client, version Parity-Ethereum/v2.7.2-stable-2662d1925-20200206/x86_64-unknown-linux-gnu/rustc1.39.0.

RPC server is on, but not used. The initial download of the blockchain took more than 2 weeks... Now I'm on block 8455500 at 2 blk/s.

kcanty commented 4 years ago

Also have this issue. Is there any way to revert the DB change and go back to 2.5.13 with the data? Early testing did not show this bug, but now, due to a stuck sync and falling behind, it has become a major issue, as 2.7.2 cannot catch back up. @dvdplm or anyone?

Any help or suggestions are greatly appreciated.

ordian commented 4 years ago

@kcanty we're investigating the problem at the moment.

In the meantime you can use this tool to downgrade the database from 2.7 to 2.5.

or2008 commented 4 years ago

Any updates on this one?

ianhe8x commented 4 years ago

I got the same issue. I have two nodes running on AWS: one is an m5.large VM and the other runs in a Kubernetes cluster. The VM one syncs fine, and I used its disk snapshot to create the other node in Kubernetes. The container node, when using v2.7.2, can never catch up with the latest block, mostly at 0.00 blk/s (with a lot of GetNodeData slow warnings). When I downgraded its DB to 2.5 and ran it with v2.5.13, it reached around 0.6 to 1 blk/s.

jiangxjcn commented 4 years ago

I got the same question with parity 2.7.2. After resync with parity 2.5.3(I start a new parity client at the same machine) , everything seems to be good.

ianhe8x commented 4 years ago

update from my side

I got same issue. I have two nodes running on aws, one is a m5large vm and the other is run in a kubernetes cluster. The vm one syncs fine, and i used its disk snapshot to create the other node in kubernetes. The container node when using v2.7.2, can never catch up with latest, with mostly 0.00 blk/s. (having a lot of GetNodeData slow warnings) Then I downgrade its db to 2.5 and run it with v2.5.13, it became around 0.6 blk/s to 1blk/s.

Update from my side: my problem was I/O. A disk restored from a snapshot needs to warm up to regain its full IOPS; after warming up, my node has synced to the latest block. Anyway, I think the 2.7.2 version is definitely slower than 2.5.x.
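
For anyone hitting the same snapshot warm-up issue: a volume restored from a snapshot regains full IOPS once every block has been read at least once, which can be forced up front. The device path below is a placeholder, not from ianhe8x's setup; check `lsblk` for yours.

```shell
#!/bin/sh
# Sketch of pre-warming a volume restored from a snapshot by reading
# every block once. /dev/xvdf is a placeholder device name.

warm_cmd() {
    # Build the read-through command for a given block device.
    printf 'dd if=%s of=/dev/null bs=1M' "$1"
}

# Example (run as root; takes a long time on large volumes):
# sh -c "$(warm_cmd /dev/xvdf)"
```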

CherryDT commented 4 years ago

I have the same issue with Parity 2.7.2. It started misbehaving all of a sudden, and it syncs extremely slowly and has "slow fetch" messages as well.

I'm a bit afraid of downgrading, though, because I don't want to make the problem worse; I need this node working as soon as possible. I'm also worried that get.parity.io seems to no longer exist; Parity was installed on this machine using bash <(curl https://get.parity.io -Lk) -r stable, so if I mess up I may not even be able to easily fix the mess afterwards. It's on Ubuntu.

What is the general advice in this case? Is the downgrade "safe"?

kfo2010 commented 4 years ago

2.7.2 version... was synced and running fine for weeks... then all of a sudden for no apparent reason... excessive...

2020-04-30 08:25:49 UTC IO Worker #1 WARN sync 9 -> GetNodeData: item 69/384 – slow state fetch for hash 0x60ffe6e056a92b05df4a3de978857e5bdf80c8110165a260df9d092aadeec00d; took Instant { tv_sec: 911, tv_nsec: 784678089 }
2020-04-30 08:25:55 UTC IO Worker #3 WARN sync 9 -> GetNodeData: item 159/384 – slow state fetch for hash 0xa429e294c3388463f448a8ac1c45ba461fbaa0fa3f1151d67b17b9f98f7bcf76; took Instant { tv_sec: 917, tv_nsec: 98440592 }
2020-04-30 08:25:55 UTC IO Worker #3 WARN sync 9 -> GetNodeData: item 54/384 – slow state fetch for hash 0x048003563c0c3a3ae64ffd0261affa50d2c1cde1a251fd8ee025e49d370ead56; took Instant { tv_sec: 917, tv_nsec: 438773547 }
2020-04-30 08:25:57 UTC IO Worker #3 WARN sync 9 -> GetNodeData: item 5/384 – slow state fetch for hash 0xdf6a6790a8d7bf0fda2b38aaa5b48aa870e02438e32874fb1ddd40e6841ead4a; took Instant { tv_sec: 919, tv_nsec: 108621033 }
2020-04-30 08:25:57 UTC IO Worker #0 WARN sync 9 -> GetNodeData: item 11/384 – slow state fetch for hash 0x85a0e528b093e7f7dd027c0f743f6dd65d74d2d7129a7d659963cc559c541539; took Instant { tv_sec: 919, tv_nsec: 544842060 }
2020-04-30 08:25:59 UTC IO Worker #1 WARN sync 9 -> GetNodeData: item 218/384 – slow state fetch for hash 0x6796bd131c3245f4dcf2d3ffa589a0aa2399c822a054ccb9751ad4a605be1b53; took Instant { tv_sec: 920, tv_nsec: 869450369 }

fish-sammy commented 4 years ago

Just an update: we are still having the same issue with OpenEthereum 3.0.0, with the following configuration

# This config should be placed in following path:
#   ~/.local/share/io.parity.ethereum/config.toml

[parity]
# Ethereum Mainnet Network
chain = "mainnet"
# Parity continously syncs the chain
mode = "active"
# No updates will be auto-installed
auto_update = "none"
# Disables auto downloading of new releases. Not recommended.
no_download = true
# Blockchain and settings will be stored in /data.
base_path = "/data"

[rpc]
# Allows Cross-Origin Requests from domain '*'.
cors = ["*"]
#  JSON-RPC will be listening for connections on IP all.
interface = "all"
# Only selected APIs will be exposed over this interface.
apis = ["web3", "eth", "net", "traces"]
# Allow connections only using specified addresses.
hosts = ["all"]
# JSON-RPC over HTTP will be accessible on port 9091.
port = 9091
# Threads for handling incoming connections for HTTP JSON-RPC server.
server_threads = 16

[websockets]
#  JSON-RPC will be listening for connections on IP all.
interface = "all"
# Allows connecting from Origin 'all'.
origins = ["all"]
# Only selected APIs will be exposed over this interface.
apis = ["pubsub"]
# JSON-RPC over WebSockets will be accessible on port 9092.
port = 9092

[ipc]
# You won't be able to use IPC to interact with Parity.
disable = true

[network]
# Parity will sync by downloading latest state first. Node will be operational in couple minutes.
warp = true
# Parity will maintain additional 25 peers for snapshot sync.
snapshot_peers = 25
# Disable serving light peers.
no_serve_light = true

[footprint]
# Compute and Store tracing data. (Enables trace_* APIs).
tracing = "off"
# Prune old state data. Maintains journal overlay - fast but extra 50MB of memory used.
pruning = "fast"
# Will keep up to 20000 old state entries.
pruning_history = 20000
# If defined, will never use more than 4096MB for all caches. (Overrides other cache settings).
cache_size = 4096

[mining]
# Maximum number of transactions per sender in the queue. By default it's 1% of the entire queue, but not less than 16.
tx_queue_per_sender = 256
# Maximum amount of memory that can be used by the transaction queue. Setting this parameter to 0 disables limiting.
tx_queue_mem_limit = 32
# Parity will keep/relay at most 8196 transactions in queue.
tx_queue_size = 8196

jiangxjcn commented 4 years ago

@sammy1991106 To be honest, I think 4096MB for all caches is far from enough for pruning_history = 20000. Maybe 32768MB is not enough either. I give Parity 64000MB and it will use up to 128000MB of memory.

fish-sammy commented 4 years ago

@jiangxjcn I would agree if that cache limit also applied to pruning_history, which I am not sure it does. The same configuration works fine for 2.5.13, though. I only see this issue with 2.7.2 and 3.0.0.

meowsbits commented 4 years ago

Redacted comment since it was rather large and later understood as off topic. See https://github.com/openethereum/openethereum/issues/11494#issuecomment-636523424.

richardpringle commented 4 years ago

@sammy1991106 didn't you notice the opposite (it synced with no problem) on the ethereum-classic mainnet?

meowsbits commented 4 years ago

For Ethereum Classic, probably related: https://github.com/openethereum/openethereum/issues/11104

fish-sammy commented 4 years ago

@richardpringle No, I didn't. Both 2.7.2 and 3.0.0 work fine for Ethereum Classic's mainnet and Kotti, as well as Ethereum's Ropsten.

webcaetano commented 4 years ago

same issue here

or2008 commented 4 years ago

FYI, I just upgraded my 2.6.8 to 3.0.1 and, from 1 miserable peer, it's currently syncing with 17/25 peers.

phiresky commented 4 years ago

For me, 2.7.2 just completely stops syncing every few days. The server still runs and RPC calls are still answered, but it just doesn't get any new blocks, even though it shows 10+ connections. I have full debug logs of a few instances, but they don't seem to say anything interesting.

I think it got better when I enabled --no-periodic-snapshot (it still happens, but now only about once per week). I wrote a script that just kills and restarts the parity node when it's out of date by more than 10 minutes; that works.

Edit: probably more relevant to #11737
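
phiresky's actual script isn't posted here; below is a minimal sketch of the same idea. The RPC port (8545), the systemd unit name ("parity"), and the helper names are all assumptions introduced for illustration.

```shell
#!/bin/sh
# Watchdog sketch: restart the node when its latest block is older than
# 10 minutes. Port 8545 and the "parity" service name are assumptions,
# not taken from phiresky's script.

MAX_AGE=600  # seconds

is_stale() {
    # $1 = latest block's unix timestamp, $2 = current unix time
    [ $(($2 - $1)) -gt "$MAX_AGE" ]
}

latest_block_timestamp() {
    # eth_getBlockByNumber("latest") returns the timestamp as a hex quantity.
    hex=$(curl -s -H 'Content-type: application/json' \
        -d '{"method":"eth_getBlockByNumber","params":["latest",false],"id":1,"jsonrpc":"2.0"}' \
        http://127.0.0.1:8545 |
        sed -n 's/.*"timestamp":"0x\([0-9a-fA-F]*\)".*/\1/p')
    [ -n "$hex" ] && echo $((0x$hex))
}

main() {
    ts=$(latest_block_timestamp) || exit 1
    if is_stale "$ts" "$(date +%s)"; then
        systemctl restart parity
    fi
}

# main   # invoke from cron or a systemd timer
```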

avolkov-dev commented 4 years ago

I see these errors on OpenEthereum 3.0.1. Any ideas how to resolve this?

webcaetano commented 4 years ago

@avolkov-dev this helps: https://github.com/openethereum/openethereum/issues/11494#issuecomment-639457437

avolkov-dev commented 4 years ago

Will try, thanks!

kcanty commented 4 years ago

@avolkov-dev Yes, the script from @phiresky works well for us. In production we ensure the script is running with monit and have the systemd service file for parity set to restart always.