kaspanet / rusty-kaspa

Kaspa full-node reference implementation and related libraries in the Rust programming language
ISC License

testnet11 DbError Corruption:block checksum mismatch #507

Closed yq0810 closed 2 weeks ago

yq0810 commented 1 month ago

Describe the bug: testnet11 DbError Corruption:block checksum mismatch

To Reproduce: cargo run --bin kaspad --release -- --testnet --netsuffix=11 --utxoindex

Screenshots: (image not preserved)

Additional context: I tried deleting ~/.rust-kaspa/kaspa-testnet-11, and the same thing happened. No errors occur when running mainnet.

aspect commented 1 month ago

It sounds like you have a hardware problem that only shows up under stress. tn11 is very demanding on IO, while mainnet currently isn't. I have previously experienced this on physical hardware (a glitchy IO chipset) and on virtualization platforms like VirtualBox. Hardware problems such as RAM corruption and IO faults typically surface under high utilization.

Actual db corruptions can occur, but extremely rarely, as a consequence of an ungraceful application termination or an ungraceful system restart. No other users are reporting this, so the issue most likely lies with IO in your system (something you otherwise wouldn't see unless you heavily stress random writes).

yq0810 commented 1 month ago

This system uses an M.2 SSD, is not a virtual machine, and is newly installed. The KAS node is the only program running since installation, and it is the only program in which I encounter errors, so I am quite puzzled by it.

aspect commented 2 weeks ago

You need to try some sort of IO stress test, especially if this is a new system. I am 99% certain this is due to high IO throughput.

Going to close this if you don't mind since, as I mentioned, you are the only person experiencing this. Feel free to hop on Discord in #development to get further feedback from different devs.
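
For readers who want to try the kind of IO stress test suggested above, here is a minimal sketch in Rust using only the standard library. It is not part of rusty-kaspa; the names `stress_test`, `fnv1a`, `XorShift64`, the 4 KiB block size, and the scratch file name are all hypothetical choices for illustration. It writes pseudo-random blocks at random offsets, records a checksum for each block, then reads the file back and counts mismatches. Note that the read-back may be served from the OS page cache, so a clean result does not rule out a hardware fault; dedicated tools (for example fio for disks or MemTest86 for RAM) are more thorough.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::Path;

/// Block size for the sketch (an arbitrary choice, not taken from kaspad).
const BLOCK_SIZE: usize = 4096;

/// Small xorshift64 generator so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// FNV-1a 64-bit hash, used here as a simple per-block checksum.
fn fnv1a(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in data {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Performs `writes` random-offset block writes into a scratch file of
/// `blocks` blocks, then reads every block back and returns the number
/// of blocks whose checksum does not match what was written.
fn stress_test(path: &Path, blocks: usize, writes: usize, seed: u64) -> io::Result<usize> {
    let mut rng = XorShift64(seed.max(1)); // xorshift state must be non-zero
    let mut file = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .truncate(true)
        .open(path)?;
    file.set_len((blocks * BLOCK_SIZE) as u64)?;

    // Blocks that are never written read back as zeros.
    let mut expected = vec![fnv1a(&[0u8; BLOCK_SIZE]); blocks];
    let mut buf = vec![0u8; BLOCK_SIZE];

    for _ in 0..writes {
        let idx = (rng.next_u64() % blocks as u64) as usize;
        // Fill the buffer with pseudo-random bytes, 8 at a time.
        for chunk in buf.chunks_mut(8) {
            chunk.copy_from_slice(&rng.next_u64().to_le_bytes()[..chunk.len()]);
        }
        file.seek(SeekFrom::Start((idx * BLOCK_SIZE) as u64))?;
        file.write_all(&buf)?;
        expected[idx] = fnv1a(&buf);
    }
    file.sync_all()?; // ask the OS to flush data to the device

    // Read the whole file back sequentially and compare checksums.
    file.seek(SeekFrom::Start(0))?;
    let mut mismatches = 0;
    for sum in &expected {
        file.read_exact(&mut buf)?;
        if fnv1a(&buf) != *sum {
            mismatches += 1;
        }
    }
    Ok(mismatches)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("io_stress_sketch.bin");
    let mismatches = stress_test(&path, 1024, 10_000, 0x5eed)?;
    fs::remove_file(&path)?;
    println!("checksum mismatches: {mismatches}");
    Ok(())
}
```

To put real pressure on a drive, the file would need to be much larger than available RAM and the run much longer than this sketch's defaults; any non-zero mismatch count on healthy hardware would be unexpected.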