Open Mattiabe98 opened 3 years ago
I can easily reproduce the issue on two setups (a main and a failover node) simply by switching them to "validator" mode, i.e. making them use a validator's keystore. In "fullnode" mode, when the node is not validating, I don't see the huge amount of disk writes. Maybe this helps narrow down the issue?
As you can see from the picture, this is the amount of data that the RadixDLT node software wrote to disk in just 5 minutes. 8 GB in 5 minutes is 96 GB/hour and 2.3 TB/day. The writes appear to come in bursts; you can see this behavior in this GIF.
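For anyone who wants to redo the extrapolation with their own numbers, here is a minimal sketch of the arithmetic. The function name `extrapolate_writes` is made up for illustration, and it assumes the burst pattern averages out to a constant rate and that 1 TB = 1000 GB.

```python
def extrapolate_writes(gb_written: float, minutes: float) -> dict:
    """Scale a short measurement window up to hourly and daily write volume.

    Assumes the measured window is representative (constant average rate).
    """
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    gb_per_hour = gb_written * 60 / minutes
    gb_per_day = gb_per_hour * 24
    return {
        "gb_per_hour": gb_per_hour,
        "gb_per_day": gb_per_day,
        "tb_per_day": gb_per_day / 1000,  # decimal terabytes
    }


# The measurement from this report: 8 GB written in 5 minutes.
rates = extrapolate_writes(8, 5)
print(rates)
```

With the figures above this gives 96 GB/hour and about 2.3 TB/day, matching the numbers in the report.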
The writes also seem closely correlated with CPU usage (they are all writes, even though the legend says reads):
This is the SMART report of a new drive that has been used for just a week on a backup node for my Radix validator: 3% wearout in a week.
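To put the wearout figure in perspective, here is a rough sketch that projects the drive's remaining life. It assumes wear grows linearly at the observed rate, which is only an approximation (the SMART wear indicator is coarse, and the write load may change); the function name is hypothetical.

```python
def days_until_worn_out(percent_used: float, days_elapsed: float) -> float:
    """Project total days until the wear indicator reaches 100%.

    Linear extrapolation only: assumes the write load stays the same.
    """
    if percent_used <= 0:
        raise ValueError("percent_used must be positive")
    return days_elapsed * 100 / percent_used


# 3% wearout after 7 days, as in the SMART report above.
total_days = days_until_worn_out(3, 7)
print(f"Projected drive life: {total_days:.0f} days ({total_days / 30:.1f} months)")
```

Under that linear assumption, the drive would reach its rated wear limit in well under a year.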
Running fatrace -c -f W for just a few seconds shows lots of writes to the .jdb files.
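To quantify which files receive the writes, the fatrace output can be tallied by file extension. The sketch below assumes each fatrace line looks like `process(pid): EVENTS /path`, which is how I understand its default output; the sample lines are made up for illustration and are not taken from my logs. Note that it counts write events, not bytes.

```python
import re
from collections import Counter
from pathlib import PurePosixPath

# Assumed fatrace line shape: "comm(pid): TYPES /path/to/file"
LINE_RE = re.compile(r"^(?P<comm>.+)\((?P<pid>\d+)\): (?P<types>\S+) (?P<path>.+)$")


def tally_write_events(lines):
    """Count write events per file extension from fatrace-style lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None or "W" not in match.group("types"):
            continue  # skip unparsable lines and non-write events
        suffix = PurePosixPath(match.group("path")).suffix or "(none)"
        counts[suffix] += 1
    return counts


# Hypothetical sample input, for illustration only.
sample = [
    "java(4242): W /data/00000a1b.jdb",
    "java(4242): W /data/00000a1b.jdb",
    "java(4242): CW /data/00000a1c.jdb",
    "rsyslogd(311): W /var/log/syslog",
    "java(4242): R /data/00000a1b.jdb",
]
counts = tally_write_events(sample)
for suffix, n in counts.most_common():
    print(f"{suffix}: {n} write events")
```

In practice you would pipe the real fatrace output into the script (for example by passing `sys.stdin` instead of `sample`) and compare the counts between validator and fullnode mode.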
This is the status of my data folder with the Radix DB.
Following these Radix docs, I created a je.properties file (before the first sync, and never changed it afterwards).
radixSSD.zip
The attached radixSSD.zip file contains my je.properties, je.stat, je.info and je.config files for troubleshooting. The issue has happened on two different servers, one self-hosted and one on Hetzner; both are dedicated servers running Proxmox, with the Radix software inside LXC containers. Disabling the archive endpoint has no effect on disk I/O. I also tried changing the log level to debug for more information, but got nothing useful; changing it to error made no difference either.
If any other information is needed, please don't hesitate to ask. Thank you.