mimblewimble / grin

Minimal implementation of the Mimblewimble protocol.
https://grin.mw/
Apache License 2.0

TxHashsetDownload is starting in parallel with TxHashsetPibd #3754

Open ardocrat opened 1 year ago

ardocrat commented 1 year ago

Describe the bug: After some time in the TxHashsetPibd step, synchronization switches to the non-PIBD TxHashsetDownload step in parallel and then comes back to TxHashsetPibd. Sometimes this causes synchronization to hang.

Screenshots Screenshot_20230615_144833

In the logs I am seeing: Screenshot_20230615_144709


Anynomouss commented 1 year ago

For clarity, this issue is with the git v5.2.0-beta.1 build. My guess is that this happens when a new horizon header is defined. The horizon header is 48 hours back in time but is updated every 12 hours, so it would be important to look at the time; it probably coincides with this 12-hourly horizon update.

Just a wild guess, but it happens at block header height 2414175 according to the log.

2414175/(12*60) = 3353.02083333, so 0.02*12*60 = 14.4. Your screenshot is from about 14 minutes after a horizon update. It could be a coincidence, but most likely it is not. Probably the node gets into a conflicting state: it starts to sync using TxHashsetDownload because it has already synced all blocks except the last 48 hours using PIBD. Then, before the node has finished downloading the next 12 hours using the TxHashsetDownload method, the horizon is updated, resulting in a switch back to PIBD to download the remaining blocks up to the new horizon header.
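The arithmetic in this comment can be checked with a short sketch. It assumes, as the comment does, one block per minute (so 12 hours is 12*60 = 720 blocks) and that horizon updates fall on heights that are multiples of 720. The function name is hypothetical and is not part of Grin.

```python
BLOCKS_PER_HOUR = 60  # assumption: one block per minute
HORIZON_UPDATE_INTERVAL = 12 * BLOCKS_PER_HOUR  # 720 blocks, per the guess above


def blocks_since_horizon_update(height: int) -> int:
    """Blocks elapsed since the last multiple of the assumed update interval."""
    return height % HORIZON_UPDATE_INTERVAL


height = 2414175  # header height reported in the log
offset = blocks_since_horizon_update(height)
print(f"height {height}: {offset} blocks past the assumed update boundary")
```

Under these assumptions the exact remainder is 15 blocks, or about 15 minutes; the 14.4 figure comes from rounding the fractional part 0.0208 down to 0.02.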

yeastplume commented 1 year ago

I'm looking into this as well. It would be good to narrow down how to reproduce it, so let me know if you determine anything further.

cekickafa commented 1 year ago

Desktop OS: Windows 10
Version: 5.2.0-beta

pibd2

20230616 00:30:40.927 INFO grin_servers::grin::sync::state_sync - No PIBD-enabled max-difficulty peers for the past 60 seconds - Aborting PIBD and falling back to TxHashset.zip download
20230616 00:30:40.938 ERROR grin_servers::grin::sync::state_sync - state_sync: error = Aborting PIBD error. restart fast sync
20230616 00:30:48.719 INFO grin_servers::grin::sync::state_sync - No PIBD-enabled max-difficulty peers for the past 60 seconds - Aborting PIBD and falling back to TxHashset.zip download
20230616 00:30:48.731 ERROR grin_servers::grin::sync::state_sync - state_sync: error = Aborting PIBD error. restart fast sync
20230616 00:30:56.449 INFO grin_servers::grin::sync::state_sync - No PIBD-enabled max-difficulty peers for the past 60 seconds - Aborting PIBD and falling back to TxHashset.zip download
20230616 00:30:56.465 ERROR grin_servers::grin::sync::state_sync - state_sync: error = Aborting PIBD error. restart fast sync
20230616 00:31:04.387 INFO grin_servers::grin::sync::state_sync - No PIBD-enabled max-difficulty peers for the past 60 seconds - Aborting PIBD and falling back to TxHashset.zip download
20230616 00:31:04.399 ERROR grin_servers::grin::sync::state_sync - state_sync: error = Aborting PIBD error. restart fast sync
20230616 00:31:12.606 INFO grin_servers::grin::sync::state_sync - No PIBD-enabled max-difficulty peers for the past 60 seconds - Aborting PIBD and falling back to TxHashset.zip download
20230616 00:31:12.617 ERROR grin_servers::grin::sync::state_sync - state_sync: error = Aborting PIBD error. restart fast sync
20230616 00:31:20.554 INFO grin_servers::grin::sync::state_sync - No PIBD-enabled max-difficulty peers for the past 60 seconds - Aborting PIBD and falling back to TxHashset.zip download
20230616 00:31:20.574 ERROR grin_servers::grin::sync::state_sync - state_sync: error = Aborting PIBD error. restart fast sync
20230616 00:31:30.961 ERROR grin_p2p::protocol - handle_payload: txhashset archive received but SyncStatus not on TxHashsetDownload
20230616 00:36:51.112 INFO grin - This is Grin version 5.2.0-alpha.2 (git v5.2.0-beta.1), built for x86_64-pc-windows-msvc by rustc 1.70.0 (90c541806 2023-05-31).
20230616 00:36:51.112 INFO grin - Chain: Mainnet
20230616 00:36:51.112 INFO grin - Accept Fee Base: 500000
20230616 00:36:51.112 INFO grin - Future Time Limit: 300
20230616 00:36:51.112 INFO grin - Feature: NRD kernel enabled: false
20230616 00:36:51.112 WARN grin::cmd::server - Starting GRIN in UI mode...
20230616 00:36:51.113 INFO grin_servers::grin::server - Starting server, genesis block: 40adad0aec27
20230616 00:36:54.202 INFO grin_servers::grin::server - Starting rest apis at: 127.0.0.1:3413
20230616 00:36:54.203 WARN grin_api::handlers - Starting HTTP Node APIs server at 127.0.0.1:3413.
20230616 00:36:54.203 WARN grin_api::handlers - HTTP Node listener started.
20230616 00:36:54.203 INFO grin_servers::grin::server - Starting dandelion monitor: 127.0.0.1:3413
20230616 00:36:54.203 WARN grin_servers::grin::server - Grin server started.
20230616 00:36:54.203 INFO grin_servers::common::types - DandelionEpoch: next_epoch: is_stem: true (90%), relay: None
20230616 00:39:05.610 WARN grin::tui::ui - Shutdown in progress, please wait
20230616 00:39:05.706 INFO grin_api::rest - API server has been stopped
20230616 00:39:06.243 INFO grin_servers::grin::server - connect_and_monitor thread stopped
20230616 00:39:06.817 INFO grin_servers::grin::server - sync thread stopped
20230616 00:39:06.817 INFO grin_servers::grin::server - dandelion_monitor thread stopped
20230616 00:39:09.038 ERROR grin_p2p::peers - connected_peers: failed to get peers lock
20230616 00:39:09.038 WARN grin_servers::grin::server - Shutdown complete
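For anyone trying to reproduce this, the spacing between the repeated PIBD-abort messages in a log like the one above can be measured with a small script. This is a minimal sketch, not part of Grin: the regular expression and function name are assumptions based only on the message text and timestamp format quoted here.

```python
import re
from datetime import datetime

# Captures the timestamp of each PIBD-abort INFO message, as quoted in the log above.
ABORT_RE = re.compile(
    r"(\d{8} \d{2}:\d{2}:\d{2}\.\d{3}) INFO \S+ - No PIBD-enabled max-difficulty peers"
)


def abort_intervals(log_text):
    """Return the seconds between consecutive PIBD-abort messages in log_text."""
    times = [
        datetime.strptime(m.group(1), "%Y%m%d %H:%M:%S.%f")
        for m in ABORT_RE.finditer(log_text)
    ]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


# First three abort messages from the log above, truncated after the matched prefix.
sample = (
    "20230616 00:30:40.927 INFO grin_servers::grin::sync::state_sync - "
    "No PIBD-enabled max-difficulty peers for the past 60 seconds\n"
    "20230616 00:30:48.719 INFO grin_servers::grin::sync::state_sync - "
    "No PIBD-enabled max-difficulty peers for the past 60 seconds\n"
    "20230616 00:30:56.449 INFO grin_servers::grin::sync::state_sync - "
    "No PIBD-enabled max-difficulty peers for the past 60 seconds\n"
)
print([round(s, 3) for s in abort_intervals(sample)])
```

Because the pattern is applied to the whole text rather than line by line, it also works when the log has been pasted as a single run of text. In the log above the aborts recur roughly every 8 seconds, which may indicate that the fallback is being retried in a loop.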

yeastplume commented 1 year ago

https://github.com/mimblewimble/grin/pull/3757 probably fixes this. I would appreciate it if anyone seeing this issue could build from that PR and check whether the same error occurs.