lbryio / lbcd

An alternative full node implementation of LBRY's blockchain written in Go (golang)
https://lbry.com/
ISC License

`getblocktemplate` corrupted claimtrie database #71

Open roylee17 opened 2 years ago

roylee17 commented 2 years ago

Failed to create new block template: in reset height: unable to restore the hash at height 1193913

Waiting for feedback on whether the database is recoverable using `reconsiderblock <blockhash>`.

BrannonKing commented 2 years ago

What version? It would be interesting to know whether this is related to the node cache. I thought I had set it up so that getblocktemplate no longer modifies the database (in the current code).

roylee17 commented 2 years ago

Database "corruption" might not be accurate. It may just be that the block was marked as invalid in the database.

It happened on v0.22.102, which is a pretty recent commit (2022-07-06).

It was recoverable with:

# block 1193914
lbcctl reconsiderblock 2c3cd68739a5f2ef4c8dda71024e608f0898a67ef1210962f3454ce011671bc4

From the log below, the computed ClaimTrie root does not match the hash in the block header. I'm not sure whether the claimtrie was fully rebuilt (in claimtrie.ResetHeight) in this case, though.

2022-07-17 05:30:28.145 [INF] SYNC: Processed 1 block in the last 1m25.01s (26 transactions, height 1193912, 2022-07-17 05:30:13 -0400 EDT)
2022-07-17 05:32:45.918 [INF] SYNC: Processed 1 block in the last 2m17.77s (197 transactions, height 1193913, 2022-07-17 05:32:32 -0400 EDT)
2022-07-17 05:36:59.997 [ERR] RPCS: Failed to create new block template: in reset height: unable to restore the hash at height 1193913
2022-07-17 05:37:21.195 [ERR] RPCS: Failed to create new block template: in reset height: unable to restore the hash at height 1193913
2022-07-17 05:37:22.017 [ERR] RPCS: Failed to create new block template: in reset height: unable to restore the hash at height 1193913
2022-07-17 05:38:24.264 [ERR] RPCS: Failed to create new block template: in reset height: unable to restore the hash at height 1193913
2022-07-17 05:39:26.103 [ERR] RPCS: Failed to create new block template: in reset height: unable to restore the hash at height 1193913
2022-07-17 05:39:55.804 [INF] MAIN: RAM: using 6.0 GB with 18.3 available, DISK: using 127.3 GB with 89.8 available
2022-07-17 05:40:28.039 [ERR] RPCS: Failed to create new block template: in reset height: unable to restore the hash at height 1193913
2022-07-17 05:41:29.656 [ERR] RPCS: Failed to create new block template: in reset height: unable to restore the hash at height 1193913
2022-07-17 05:41:58.625 [INF] SYNC: Rejected block 2c3cd68739a5f2ef4c8dda71024e608f0898a67ef1210962f3454ce011671bc4 from 147.135.15.197:9246 (outbound): height: 1193914, computed hash: 6a274f874d2ceb112fbeec50e649919edb1b10c291c83ec29b8723036bbab5e3 != header's ClaimTrie: 5152b64862af9eefd161ddccb0f0eb663b9c65d4c5a44728371ea43bf17cac26
2022-07-17 05:43:05.475 [INF] SYNC: Rejected block 9c007d3d7cc47d0dc4aac3fdce6d5b11e46050fbaac1b72f2da0a95c98246ed3 from 147.135.15.197:9246 (outbound): previous block 2c3cd68739a5f2ef4c8dda71024e608f0898a67ef1210962f3454ce011671bc4 is known to be invalid
2022-07-17 05:43:05.531 [INF] SYNC: Rejected block 9c007d3d7cc47d0dc4aac3fdce6d5b11e46050fbaac1b72f2da0a95c98246ed3 from 51.81.34.141:9246 (outbound): previous block 2c3cd68739a5f2ef4c8dda71024e608f0898a67ef1210962f3454ce011671bc4 is known to be invalid
2022-07-17 05:43:06.090 [INF] SYNC: Rejected block 9c007d3d7cc47d0dc4aac3fdce6d5b11e46050fbaac1b72f2da0a95c98246ed3 from 47.52.140.89:9246 (outbound): previous block 2c3cd68739a5f2ef4c8dda71024e608f0898a67ef1210962f3454ce011671bc4 is known to be invalid
2022-07-17 05:43:06.624 [INF] SYNC: Rejected block 9c007d3d7cc47d0dc4aac3fdce6d5b11e46050fbaac1b72f2da0a95c98246ed3 from 47.91.158.151:9246 (outbound): previous block 2c3cd68739a5f2ef4c8dda71024e608f0898a67ef1210962f3454ce011671bc4 is known to be invalid
2022-07-17 05:43:07.441 [INF] SYNC: Rejected block 9c007d3d7cc47d0dc4aac3fdce6d5b11e46050fbaac1b72f2da0a95c98246ed3 from 47.75.9.191:9246 (outbound): previous block 2c3cd68739a5f2ef4c8dda71024e608f0898a67ef1210962f3454ce011671bc4 is known to be invalid
2022-07-17 05:45:00.050 [INF] CHAN: Adding orphan block 021a6c35dc168ce004da137c0366ad96dfb42ee66d7bc5fdc2a56ff5f8460df0 with parent 9c007d3d7cc47d0dc4aac3fdce6d5b11e46050fbaac1b72f2da0a95c98246ed3
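
To find the block hash to pass to `reconsiderblock`, it can help to pull the ClaimTrie mismatch lines out of the log automatically. The following is a minimal sketch in Go, not part of lbcd: the `Mismatch` type, the `FindMismatches` function, and the regular expression are all hypothetical, and the pattern assumes the log line format is exactly what appears in the excerpt above.

```go
package main

import (
	"bufio"
	"regexp"
	"strconv"
	"strings"
)

// Mismatch is a hypothetical record for one ClaimTrie root rejection.
type Mismatch struct {
	Block    string // hash of the rejected block
	Height   int    // height reported in the log line
	Computed string // ClaimTrie root computed by the node
	Header   string // ClaimTrie root stored in the block header
}

// mismatchRE is an assumption based only on the log line quoted in this thread.
var mismatchRE = regexp.MustCompile(`Rejected block ([0-9a-f]+) from \S+ \(\w+\): ` +
	`height: (\d+), computed hash: ([0-9a-f]+) != header's ClaimTrie: ([0-9a-f]+)`)

// FindMismatches scans log text line by line and returns every ClaimTrie
// root mismatch. Rejections of the form "previous block ... is known to be
// invalid" do not match the pattern and are skipped.
func FindMismatches(log string) []Mismatch {
	var out []Mismatch
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		m := mismatchRE.FindStringSubmatch(sc.Text())
		if m == nil {
			continue
		}
		height, err := strconv.Atoi(m[2])
		if err != nil {
			continue // height does not fit in an int; ignore the line
		}
		out = append(out, Mismatch{Block: m[1], Height: height, Computed: m[3], Header: m[4]})
	}
	return out
}
```

Run against the excerpt above, this should report a single mismatch at height 1193914; its `Block` field is the hash that was passed to `lbcctl reconsiderblock` earlier in this comment.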

roylee17 commented 2 years ago

Some update:

It just happened again on block 1194911 (07/20/2022). We're restarting the node with v0.22.104 to see whether the RamTrie is fully or partially rebuilt if it happens again. If it recurs before we have a fix, we'll revert to v0.22.100-rc2 to exclude the node cache optimization and see if that helps narrow down the scope.

BrannonKing commented 2 years ago

The node-cache optimization was a substantial speedup on the getblocktemplate call. It might be worth it to test the single-threaded version (with the node cache).

roylee17 commented 2 years ago

We'll set up a small long-running node dedicated to getblocktemplate-related testing, including this case (the single-threaded version).

roylee17 commented 2 years ago

Can you rebase that change onto the current master and test it in your environment or in CI?

The following was my attempt at the rebase, but it takes twice as long to rebuild the RamTrie on my machine (6 min vs. 3 min). I may have dropped something during the merge.

https://github.com/lbryio/lbcd/tree/single_thread_cache-rebased

BrannonKing commented 2 years ago

Your rebase looks okay; the RamTrie rebuild is slower in the single-threaded case. That's expected.