mimblewimble / grin

Minimal implementation of the Mimblewimble protocol.
https://grin.mw/
Apache License 2.0

Node will freeze sporadically after a few days to a week of running #3725

Open jaw709 opened 2 years ago

jaw709 commented 2 years ago

Description:

After a successful installation and sync, the grin node runs without incident for a few days to a week, then consistently (across multiple fresh installs) begins to "freeze".

Install guide:

https://github.com/mimblewimble/grin/blob/master/doc/build.md

Hardware:

Raspberry Pi 4B, 4GB RAM. Grin is installed on the SD card; the OS runs separately from USB.

To Reproduce

Steps to reproduce the behavior:

  1. Run ./grin
  2. Expect: after a few days to a week, find the terminal hung
  3. See error: logs attached

Relevant Information

Replacing the "Main" folder with a backup taken after the first sync resolves the problem for about a week.


Additional context

I have tried running the node on multiple storage media and file systems: storing data on USB versus SD card, and formatted as FAT32, ext4, btrfs, etc. The same problem occurs consistently. While observing the node running, I have not seen any RAM or CPU resource problems. After replacing the main folder, or sometimes just restarting, it resumes as normal. Thank you.

grin-server-backup625.log

jaw709 commented 2 years ago

I believe I'm getting closer to isolating the relevant error. I realized that the logs print in the OS's local time, while the node displays UTC. When I searched for the correct timestamp, it seemed the freeze could be related to hashset compaction. Please see attached:

PXL_20220709_044929021 MP
PXL_20220709_044138984 MP
grin-server.log.2.gz
grin-server-logs-all.zip

jaw709 commented 2 years ago

Just posting the latest update: the node froze again within one minute of the tx hashset compaction. NOTE: the node displays UTC, which is four hours ahead; the logs print in EST.

grin-server-714.log

PXL_20220714_181840956
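Because the node shows UTC while the logs use local time, matching a freeze seen on the node to the right log lines needs a four-hour shift. A minimal sketch of that conversion, assuming a fixed UTC-4 offset (as reported above, ignoring daylight-saving changes) and a hypothetical `YYYY-MM-DD HH:MM:SS` timestamp format; this helper is not part of grin, and the real log format may differ.

```python
from datetime import datetime, timedelta, timezone

# Assumption: logs are written at a fixed UTC-4 offset (the "four hours"
# reported in this issue). This does not track daylight-saving changes.
LOG_TZ = timezone(timedelta(hours=-4))

# Hypothetical timestamp format; adjust to match the actual log lines.
LOG_FORMAT = "%Y-%m-%d %H:%M:%S"


def log_time_to_utc(stamp: str) -> datetime:
    """Interpret a local log timestamp and return the equivalent UTC time."""
    local = datetime.strptime(stamp, LOG_FORMAT).replace(tzinfo=LOG_TZ)
    return local.astimezone(timezone.utc)


def utc_to_log_time(utc_stamp: str) -> str:
    """Convert a UTC time shown by the node into the log's local time."""
    utc = datetime.strptime(utc_stamp, LOG_FORMAT).replace(tzinfo=timezone.utc)
    return utc.astimezone(LOG_TZ).strftime(LOG_FORMAT)


if __name__ == "__main__":
    # A freeze the node shows at 18:18 UTC should be searched for at 14:18 in the logs.
    print(utc_to_log_time("2022-07-14 18:18:00"))  # → 2022-07-14 14:18:00
```

Note that early-morning UTC times map to the previous calendar day in the logs, so a search by date alone can miss the relevant lines.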

jaw709 commented 2 years ago

Testing the PIBD_impl branch on mainnet initially seemed to work; however, the freezing returned, this time within one minute before compaction began. Logs attached: aug22-SSandLogs.zip