mimblewimble / grin

Minimal implementation of the Mimblewimble protocol.
https://grin.mw/
Apache License 2.0

Aborting PIBD error. restart fast sync v5.2.0-beta.3 #3767

Open noobvie opened 1 year ago

noobvie commented 1 year ago

Describe the bug I tried to rebuild some grin nodes with the new version v5.2.0-beta.3. After cleaning up chain_data, the node synced from scratch and usually hit an error at the PIBD step or got stuck in the middle.

To Reproduce Steps to reproduce the behavior:

  1. Run: grin clean
  2. Resync from scratch: grin

Relevant Information Here is an example log:

20230812 00:16:49.034 INFO grin - Using configuration file at /root/grin/grin-server.toml
20230812 00:16:49.034 INFO grin - This is Grin version 5.2.0-beta.3 (git v5.2.0-beta.3), built for x86_64-unknown-linux-gnu by rustc 1.71.0 (8ede3aae2 2023-07-12).
20230812 00:16:49.035 INFO grin - Chain: Mainnet
20230812 00:16:49.035 INFO grin - Accept Fee Base: 500000
20230812 00:16:49.035 INFO grin - Future Time Limit: 300
20230812 00:16:49.035 INFO grin - Feature: NRD kernel enabled: false
20230812 00:16:49.035 WARN grin::cmd::server - Starting GRIN in UI mode...
20230812 00:16:49.091 INFO grin_servers::grin::server - Starting server, genesis block: 40adad0aec27
20230812 00:16:51.303 INFO grin_servers::grin::server - Starting rest apis at: 127.0.0.1:3413
20230812 00:16:51.310 WARN grin_api::handlers - Starting HTTP Node APIs server at 127.0.0.1:3413.
20230812 00:16:51.312 WARN grin_api::handlers - HTTP Node listener started.
20230812 00:16:51.312 INFO grin_servers::grin::server - Starting dandelion monitor: 127.0.0.1:3413
20230812 00:16:51.312 WARN grin_servers::grin::server - Grin server started.
20230812 00:16:51.388 INFO grin_servers::common::types - DandelionEpoch: next_epoch: is_stem: false (90%), relay: None
20230812 00:16:52.502 INFO grin_servers::common::adapters - Received 32 block headers from 107.174.176.161:3414
20230812 00:16:52.901 INFO grin_servers::common::adapters - Received 32 block headers from 107.174.176.161:3414
20230812 00:16:53.529 INFO grin_servers::common::adapters - Received 32 block headers from 107.174.176.161:3414
20230812 00:16:53.718 INFO grin_servers::common::adapters - Received 32 block headers from 107.174.176.161:3414
20230812 00:16:53.920 INFO grin_servers::common::adapters - Received 32 block headers from 107.174.176.161:3414
20230812 00:16:54.124 INFO grin_servers::common::adapters - Received 32 block headers from 107.174.176.161:3414
20230812 00:16:54.298 INFO grin_servers::common::adapters - Received 14 block headers from 107.174.176.161:3414
20230812 00:25:11.707 ERROR grin_servers::common::adapters - send_block_request_to_peer: failed: Send("try_send disconnected")
20230812 00:26:52.111 INFO grin_servers::common::types - DandelionEpoch: next_epoch: is_stem: true (90%), relay: Some(PeerAddr(103.45.234.223:3414))
20230812 00:29:56.415 INFO grin_servers::grin::sync::state_sync - No PIBD-enabled max-difficulty peers for the past 60 seconds - Aborting PIBD and falling back to TxHashset.zip download
20230812 00:29:56.504 ERROR grin_servers::grin::sync::state_sync - state_sync: error = Aborting PIBD error. restart fast sync
20230812 00:30:27.229 ERROR grin_servers::grin::sync::state_sync - state_sync: send_txhashset_request err! Send("try_send disconnected")
20230812 00:30:27.240 ERROR grin_servers::grin::sync::state_sync - state_sync: error = Sync error. restart fast sync
20230812 00:33:14.721 INFO grin_util::zip - extract_files: "kernel/pmmr_data.bin" -> "/root/grin/tmp/txhashset/kernel/pmmr_data.bin"
20230812 00:33:21.520 INFO grin_util::zip - extract_files: "kernel/pmmr_hash.bin" -> "/root/grin/tmp/txhashset/kernel/pmmr_hash.bin"
20230812 00:33:21.826 INFO grin_util::zip - extract_files: "output/pmmr_data.bin" -> "/root/grin/tmp/txhashset/output/pmmr_data.bin"
20230812 00:33:23.534 INFO grin_util::zip - extract_files: "output/pmmr_hash.bin" -> "/root/grin/tmp/txhashset/output/pmmr_hash.bin"
20230812 00:33:23.620 INFO grin_util::zip - extract_files: "output/pmmr_prun.bin" -> "/root/grin/tmp/txhashset/output/pmmr_prun.bin"
20230812 00:33:32.620 INFO grin_util::zip - extract_files: "rangeproof/pmmr_data.bin" -> "/root/grin/tmp/txhashset/rangeproof/pmmr_data.bin"
20230812 00:33:34.323 INFO grin_util::zip - extract_files: "rangeproof/pmmr_hash.bin" -> "/root/grin/tmp/txhashset/rangeproof/pmmr_hash.bin"
20230812 00:33:34.329 INFO grin_util::zip - extract_files: "rangeproof/pmmr_prun.bin" -> "/root/grin/tmp/txhashset/rangeproof/pmmr_prun.bin"
20230812 00:33:34.332 INFO grin_util::zip - extract_files: "output/pmmr_leaf.bin.00038e6c2509" -> "/root/grin/tmp/txhashset/output/pmmr_leaf.bin.00038e6c2509"
20230812 00:33:34.334 INFO grin_util::zip - extract_files: "rangeproof/pmmr_leaf.bin.00038e6c2509" -> "/root/grin/tmp/txhashset/rangeproof/pmmr_leaf.bin.00038e6c2509"
20230812 00:36:53.044 INFO grin_servers::common::types - DandelionEpoch: next_epoch: is_stem: true (90%), relay: Some(PeerAddr(66.42.124.255:3414))
20230812 00:47:04.172 INFO grin_servers::common::types - DandelionEpoch: next_epoch: is_stem: true (90%), relay: Some(PeerAddr(107.174.186.153:3414))
20230812 00:57:05.297 INFO grin_servers::common::types - DandelionEpoch: next_epoch: is_stem: false (90%), relay: Some(PeerAddr(162.55.75.124:3414))
20230812 01:07:15.974 INFO grin_servers::common::types - DandelionEpoch: next_epoch: is_stem: true (90%), relay: Some(PeerAddr(23.94.107.192:3414))
20230812 01:17:16.608 INFO grin_servers::common::types - DandelionEpoch: next_epoch: is_stem: true (90%), relay: Some(PeerAddr(65.21.40.28:3414))
20230812 01:27:17.597 INFO grin_servers::common::types - DandelionEpoch: next_epoch: is_stem: true (90%), relay: Some(PeerAddr(107.174.93.160:3414))
20230812 01:37:18.311 INFO grin_servers::common::types - DandelionEpoch: next_epoch: is_stem: true (90%), relay: Some(PeerAddr(65.109.21.208:3414))
20230812 01:44:56.473 WARN grin::tui::ui - Shutdown in progress, please wait
20230812 01:44:56.496 INFO grin_servers::grin::server - connect_and_monitor thread stopped
20230812 01:44:56.496 INFO grin_servers::grin::server - sync thread stopped
20230812 01:44:56.496 INFO grin_api::rest - API server has been stopped
20230812 01:44:57.085 INFO grin_servers::grin::server - dandelion_monitor thread stopped

OS version:

Linux grindallastx 5.15.0-78-generic #85-Ubuntu SMP Fri Jul 7 15:25:09 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
root@grindallastx:~/grin# cat /etc/*release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.3 LTS"
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian

ardocrat commented 1 year ago

From the logs, you have no PIBD peers, so the node tried to switch to the non-PIBD TxHashsetDownload and got an error.
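To pull the relevant entries out of a long log, something like the following could help. It is only a rough sketch: the line layout (`YYYYMMDD HH:MM:SS.mmm LEVEL target - message`) is inferred from the log pasted above, and `parse_line` and `sync_problems` are hypothetical helper names, not part of Grin.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the pasted log:
# "YYYYMMDD HH:MM:SS.mmm LEVEL target - message"
LINE_RE = re.compile(
    r"^(\d{8} \d{2}:\d{2}:\d{2}\.\d{3}) (\w+) (\S+) - (.*)$"
)


def parse_line(line):
    """Split one log line into (timestamp, level, target, message).

    Returns None for lines that do not match the assumed layout.
    """
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    stamp, level, target, message = match.groups()
    when = datetime.strptime(stamp, "%Y%m%d %H:%M:%S.%f")
    return when, level, target, message


def sync_problems(lines):
    """Keep ERROR entries plus any state_sync entry that mentions PIBD."""
    hits = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        _, level, target, message = parsed
        is_pibd_note = target.endswith("state_sync") and "PIBD" in message
        if level == "ERROR" or is_pibd_note:
            hits.append(parsed)
    return hits
```

Feeding the lines of the node's log file to `sync_problems` should surface the "No PIBD-enabled max-difficulty peers" notice together with the errors that follow it, which makes the sequence of events easier to see.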

ardocrat commented 1 year ago

I found a weird issue with peers. I restarted my node frequently from a single IP address, and now, after more than 24 hours, I can find only 2-3 peers, always with the same IPs. Moreover, after cleaning and restarting the node I am getting a segmentation fault on launch.

noobvie commented 1 year ago

How to switch to non-PIBD TxHashsetDownload? Is there a parameter to trigger it? I tried at least 5 VPSes on 3 continents (EU, US, Asia) and all of them had issues with PIBD. It took more than one day to reach a full sync, after restarting the grin node multiple times.

ardocrat commented 1 year ago

How to switch to non-PIBD TxHashsetDownload? Is there a parameter to trigger it?

It switches automatically after a timeout when you have no peers with PIBD capability. I am having an issue with peers now as well, so it may be related to this problem.
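As an illustration of that behaviour, here is a minimal sketch of a timeout-based fallback. It is not Grin's actual state_sync code: `SyncMethodChooser` and its method names are hypothetical, the 60-second value is taken from the "for the past 60 seconds" log message above, and the assumption that the node stays on the zip download once it has fallen back is mine.

```python
# Taken from the log message "No PIBD-enabled max-difficulty peers for the
# past 60 seconds"; the real value in Grin may be defined differently.
PIBD_PEER_TIMEOUT_SECS = 60.0


class SyncMethodChooser:
    """Hypothetical helper that decides between PIBD and the zip download."""

    def __init__(self, started_at, timeout_secs=PIBD_PEER_TIMEOUT_SECS):
        self.timeout_secs = timeout_secs
        # Time at which a PIBD-capable peer was last available.
        self.last_pibd_peer_at = started_at
        # Assumption: once the fallback happens, it is not reverted.
        self.fell_back = False

    def tick(self, now, pibd_peer_count):
        """Return "pibd" or "txhashset_zip" for the current moment."""
        if self.fell_back:
            return "txhashset_zip"
        if pibd_peer_count > 0:
            self.last_pibd_peer_at = now
            return "pibd"
        if now - self.last_pibd_peer_at >= self.timeout_secs:
            self.fell_back = True
            return "txhashset_zip"
        return "pibd"
```

In this sketch there is nothing for the user to trigger: the switch depends only on how long the node has gone without a PIBD-capable peer.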

noobvie commented 1 year ago

I also see that CPU usage seems to be high during steps 3-4/7. Maybe it is verifying the data or the package.

ardocrat commented 1 year ago

I also see that CPU usage seems to be high

It's OK.