lbryio / lbrycrd

The blockchain that provides the digital content namespace for the LBRY protocol
https://lbry.com
MIT License
2.57k stars 178 forks source link

Fix a crash bug on node restart when blocks are disconnected #109

Closed lbrynaut closed 6 years ago

lbrynaut commented 6 years ago

While looking at related parts, I ran into this uninitialized iterator bug. The expiration queue is empty on restart, so disconnecting a block forces a crash since we're using an uninitialized iterator in that case.

kaykurokawa commented 6 years ago

Hmm.. this shouldn't happen I think and means that there's some deeper problem. If itQueueRow != expirationQueueCache.end() as in line 1675 that means that we can't find the expiration entry for a claim which should always exist. If not in cache, it should be able to look into disk and find it CClaimTrieCache::getExpirationQueueCacheRow() will call CClaimTrie::getExpirationQueueRow() which will look on disk for entry if needed)

How would I reproduce this error, would the below step work?

1)on regtest make a claim 2) increment block 2)Exit lbrycrdd 3)Decrement the block

Thanks for finding this.

kaykurokawa commented 6 years ago

hmm couldn't reproduce this using the steps above, (used this script: https://gist.github.com/kaykurokawa/bdecc79ac36b3a8af148fb43bdaefb01 )

I wonder if this is something related to expirations not working properly...

lbrynaut commented 6 years ago

This was found/fixed using mainnet master (no modifications). It's possible it's related to the chain movement of the time, but I'm fairly certain that if I re-build master, it won't be too long before I can hit this again. If I do, I'll post the stack trace.

lbrynaut commented 6 years ago

There was a time, every startup crashed with this, across multiple rebuilds and syncs. Today, it's not showing this behaviour. I suspect it's a real bug (and the code is definitely unchecked/unsafe) without the fix, but I'm ok kicking the can on this for now.

kaykurokawa commented 6 years ago

I think adding additional tests for claim expiration in https://github.com/lbryio/lbrycrd/pull/112 may reveal something, I will work on that.

kaykurokawa commented 6 years ago

I will develop a test for this to reproduce on the block height around the time lbrynaut ran into this problem.

BrannonKing commented 6 years ago

Come to find out: this change was already checked in back in March 2018. I vote we close this PR. I think what we were seeing was just the affect of the way it's coded: it uses a remove-the-old-if-it-exists-before-inserting-the-new approach. I don't think that's a problem.

lbrynaut commented 6 years ago

Yes, the change was probably applied in one of the other branches that I built on top of. It fixes an issue, but we're not sure it alone fixes the related issue documented here (see #134 also).