lbryio / lbrycrd

The blockchain that provides the digital content namespace for the LBRY protocol
https://lbry.com
MIT License
2.57k stars 178 forks source link

Fix incorrect usage of the claim index #197

Closed lbrynaut closed 6 years ago

lbrynaut commented 6 years ago

Fix incorrect usage of the claim index so that we can properly maintain state through re-orgs.

lbrynaut commented 6 years ago

Props to @shyba, who created #195 with some code that made it easy to track this down.

bvbfan commented 6 years ago

You change behavior lake that,

addIndex
deleteIndex

will result in

deleteIndex
addIndex

Why it is needed, can explain with example, it looks like you fix it in not right direction, about me.

shyba commented 6 years ago

It's working fine so far. No claims missing and fallback yet never triggered. I will keep updating in the following days, but the way it looks now seems fixed. Thank you!

lbrynaut commented 6 years ago

Why it is needed, can explain with example, it looks like you fix it in not right direction, about me.

What I noted there in my comment of how this is applied is just that if a block is disconnected and then re-connected, deleting before adding back is correct. If you add before removing in that case, it's a bug.

@bvbfan What you pointed out is incorrect. In the case that a block is disconnected and then connected, the order is delete, add. If a block is connected and then disconnected, the order is still add, delete. An add in general never prompts a delete, so thinking of them simply being reversed is not right. Also, the actual bug has nothing to do with ordering, that was purely an implementation detail on the fix.

lbrynaut commented 6 years ago

It's working fine so far. No claims missing and fallback yet never triggered. I will keep updating in the following days, but the way it looks now seems fixed. Thank you!

@lyoshenka, the previous claim index code is definitely causing corruption. No worry of a fork, just missing claims from the index (they are still retrievable in other, slower ways). Most users won't notice this, but we should consider a release advising a re-index on the sooner side.

lbrynaut commented 6 years ago

I'll give @kaykurokawa through Monday to review and then I'm going to merge this. @shyba After that, please don't forget to PR the unit test update on top.

kaykurokawa commented 6 years ago

hmm I'm a bit confused here .. what is the issue that is being corrected here? Can you describe in pseudo-code or write a unit test for it?

I was looking at this too and I think there's some problem where we are deciding to add stuff to the claim index and removing from the claim index. Don't we need to add a claim below this line : https://github.com/lbryio/lbrycrd/blob/abd9e02c7c2411a5ecf72634ed4e0b1abf2556e5/src/claimtrie.cpp#L1371 ?

This unit test below (can be added to claimtriebranching_tests.cpp) shows the problem caused by the above line where if we make two claims to the same name, we spend it on the next block, and than decrement that block, we only get one claim back on that name.

/*
 * tests for CClaimTrie::getClaimById basic consistency checks
 */
BOOST_AUTO_TEST_CASE(get_claim_by_id_test_2)
{
    ClaimTrieChainFixture fixture;
    const std::string name = "test";
    CMutableTransaction tx1 = fixture.MakeClaim(fixture.GetCoinbase(), name, "one", 1);
    uint160 claimId = ClaimIdHash(tx1.GetHash(), 0);
    CMutableTransaction txx = fixture.MakeClaim(fixture.GetCoinbase(), name, "one", 1);
    uint160 claimIdx = ClaimIdHash(txx.GetHash(), 0);

    fixture.IncrementBlocks(1);

    CClaimValue claimValue;
    std::string claimName;
    pclaimTrie->getClaimById(claimId, claimName, claimValue);
    BOOST_CHECK(claimName == name);
    BOOST_CHECK(claimValue.claimId == claimId);

    CMutableTransaction txa = fixture.Spend(tx1);
    CMutableTransaction txb = fixture.Spend(txx);

    fixture.IncrementBlocks(1);
    BOOST_CHECK(!pclaimTrie->getClaimById(claimId, claimName, claimValue));
    BOOST_CHECK(!pclaimTrie->getClaimById(claimIdx, claimName, claimValue));

    fixture.DecrementBlocks(1);
    pclaimTrie->getClaimById(claimId, claimName, claimValue);
    BOOST_CHECK(claimName == name);
    BOOST_CHECK(claimValue.claimId == claimId);

    // this test fails
    pclaimTrie->getClaimById(claimIdx, claimName, claimValue);
    BOOST_CHECK(claimName == name);
    BOOST_CHECK(claimValue.claimId == claimIdx);
}

Also, its a bit weired because claims get added to the claim index when they are queued up (in addClaimToQueues), and also when they are inserted into the claim trie. So the claim gets added twice to the claim index, could that cause a problem?

bvbfan commented 6 years ago

@kaykurokawa i've update my branch with your unit test, it's not affected by this issue, because i'm not writing index direct to levelDB but enqueued it https://github.com/lbryio/lbrycrd/pull/175/commits/2d94bd6a2fd25ce42b9ab4edaac2af939b8a335f#diff-f8129adb47c1b30eac7d2795b17883edR517 and update with empty index https://github.com/lbryio/lbrycrd/pull/175/commits/2d94bd6a2fd25ce42b9ab4edaac2af939b8a335f#diff-f8129adb47c1b30eac7d2795b17883edR523 then check against it https://github.com/lbryio/lbrycrd/pull/175/commits/2d94bd6a2fd25ce42b9ab4edaac2af939b8a335f#diff-f8129adb47c1b30eac7d2795b17883edR529 Tests pass @shyba you can try it on https://github.com/lbryio/lbrycrd/pull/175 or to checkout branch separate_disk_claim_rebased

lyoshenka commented 6 years ago

@lbrynaut @kaykurokawa let me know what you decide once you've synced on this. we can do a release as soon as you're ready

lbrynaut commented 6 years ago

This unit test below (can be added to claimtriebranching_tests.cpp) shows the problem caused by the above line where if we make two claims to the same name, we spend it on the next block, and than decrement that block, we only get one claim back on that name.

@kaykurokawa Thanks, I'll take a look at this tomorrow, although it's unrelated to the bug/issue being fixed.

Without a mechanism to only apply the index state changes on flush, we permanently lose any removed elements from the index on every disconnect/re-org (non-flushed).

Also, its a bit weired because claims get added to the claim index when they are queued up (in addClaimToQueues), and also when they are inserted into the claim trie. So the claim gets added twice to the claim index, could that cause a problem?

I don't understand this. How is it being added twice? It shouldn't be, so maybe I've missed something. Also, there is no issue if that was the case, other than performance/wasted cycles.

lbrynaut commented 6 years ago

I don't understand this. How is it being added twice? It shouldn't be, so maybe I've missed something. Also, there is no issue if that was the case, other than performance/wasted cycles.

@kaykurokawa Ah, I see what you mean and will also look at. I think it could be simplified, but don't think it's incorrect.

lbrynaut commented 6 years ago

This unit test below (can be added to claimtriebranching_tests.cpp) shows the problem caused by the above line where if we make two claims to the same name, we spend it on the next block, and than decrement that block, we only get one claim back on that name.

@kaykurokawa This test appears to pass for me using the changes here. I'll look into further.

lbrynaut commented 6 years ago

@kaykurokawa I added the test as you provided to sanity check against travis, which also appears to pass. I'll probably back it out in case you want to modify or clean-it up a bit though. Am I missing something on this?

kaykurokawa commented 6 years ago

ah sorry, yea that's not the right test ignore that... I think I've figured out what's the problem here

So the issue here I think is that we need to make a proper logic for claim addition and removal. That I believe is the root of the problem, reorg/disconnects should not cause issues as long as we handle this.

So right now there is logic for removing/adding things from the claim index in the function insertIntoClaimTrie() and removeClaimFromTrie().

However, there are cases for this when the logic fails. For removeClaimFromTrie(), you don't want to remove claims from the claim index when a claim is moved from the trie into the claim queue (where claims are kept which have not activated due to activation delay). This happens when undoing a claim activation in a block decrement. We are not handling this properly and I think this is the reason why we are losing claims (see unit test below to reproduce this).

For insertClaimIntoTrie(), you don't want to add claims to the claim index when claims are moved from the queue into the claim trie (because we already add claims to the claim index when they are put into the claim queue). This happens for all claims when they are activated (due to activation delay).

So basically, the function removeClaimFromTrie() and insertClaimIntoTrie() should not be making decisions about whether to add things to the claim index or not. Below, I tried to describe where we should add and remove from the claim index.

When calling ConnectBlock() in main.cpp

spendClaim() for abandon txs
    calls removeClaim() which calls either removeClaimFromQueue() or removeClaimFromTrie()
        Can remove from claim index in either case
addClaim() for new claim txs
    calls addClaimToQueues()
        Can add to claim index
incrementBlock() to make queued changes  (for handling claim activation/expirations)
    calls insertIntoClaimTrie() for activated claims 
        activated claims do not require addition to claim index!
    calls removeFromClaimTrie() for expired claims
        can remove from claim index

When calling DisconnectBlock() in main.cpp:

decrementBlock() to undo queued changes (for handling claim activation/expiration)
    calls insertClaimIntoTrie() to undo expired claims
        can add to claim index
    calls removeClaimFromTrie() to undo activated claims
        undoing activated claims do not require removal from claim index! 
        because they get put back into the claim queue 
undoAddClaim() to undo new claim txs
    calls removeClaim() which calls either removeClaimFromQueue() or removeClaimFromTrie()
        Can remove from claim index in either case
undoSpendClaim() to undo abandon txs
    calls insertClaimIntoTrie() or addClaimToQueues()
        Can add to claim index in either case

Below is the unit test which shows that this PR does not handle properly the case where a claim is merely deactivated in a block decrement but is removed from the claim index.

BOOST_AUTO_TEST_CASE(get_claim_by_id_test_3)
{ 
    ClaimTrieChainFixture fixture;
    pclaimTrie->setExpirationTime(5);
    const std::string name = "test"; 
    CMutableTransaction tx1 = fixture.MakeClaim(fixture.GetCoinbase(), name, "one", 1);
    uint160 claimId = ClaimIdHash(tx1.GetHash(), 0);

    fixture.IncrementBlocks(1);

    CClaimValue claimValue;    
    std::string claimName;
    pclaimTrie->getClaimById(claimId, claimName, claimValue);
    BOOST_CHECK(claimName == name);
    BOOST_CHECK(claimValue.claimId == claimId);
    // make second claim with activation delay 1
    CMutableTransaction tx2 = fixture.MakeClaim(fixture.GetCoinbase(), name, "one", 2);
    uint160 claimId2 = ClaimIdHash(tx2.GetHash(), 0);

    fixture.IncrementBlocks(1);
    // second claim is not activated yet, but can still access by claim id
    BOOST_CHECK(is_best_claim(name,tx1));
    BOOST_CHECK(pclaimTrie->getClaimById(claimId2, claimName, claimValue));
    BOOST_CHECK(claimName == name);
    BOOST_CHECK(claimValue.claimId == claimId2);

    fixture.IncrementBlocks(1);
    // second claim has activated
    BOOST_CHECK(is_best_claim(name,tx2));
    BOOST_CHECK(pclaimTrie->getClaimById(claimId2, claimName, claimValue));
    BOOST_CHECK(claimName == name);
    BOOST_CHECK(claimValue.claimId == claimId2);

    fixture.DecrementBlocks(1);
    // second claim has been deactivated via decrement
    // should still be accesible via claim id
    BOOST_CHECK(is_best_claim(name,tx1));
    BOOST_CHECK(pclaimTrie->getClaimById(claimId2, claimName, claimValue));
    BOOST_CHECK(claimName == name);
    BOOST_CHECK(claimValue.claimId == claimId2);

    fixture.IncrementBlocks(1);
    // second claim should have been re activated via increment
    BOOST_CHECK(is_best_claim(name,tx2));
    BOOST_CHECK(pclaimTrie->getClaimById(claimId2, claimName, claimValue));
    BOOST_CHECK(claimName == name);
    BOOST_CHECK(claimValue.claimId == claimId2);
}
lbrynaut commented 6 years ago

@kaykurokawa Awesome, thanks, will review a bit more today and comment on this further.

lbrynaut commented 6 years ago

@kaykurokawa I agree with the usage above -- nice work! The original claim index usage was broken for more cases than I thought, which that test clearly illustrates via your points on where it fails to do the right thing. However, this still doesn't fully address the issue being fixed here because just following this logic loses claims as well, during say VerifyDB calls where it does 'memory only' disconnects which actually loses real claims (as opposed to a true re-org which does DisconnectTip which includes a flush after disconnect). Maybe my original wording was bad or unclear?

A combination of our two approaches solves it in testing here.

To summarize:

1) I think you have pointed out that the claim index was being used incorrectly, which we agreed on, though you caught more broken cases than me. :+1: This is more correct now ... but ...

2) This PR was trying to address incorrect usage, but mainly in order to fix losing claims (at least that was my initial motivation from speaking with @shyba). To do that, it only applied claim index changes on flush (regardless of the logic that decided those changes).

It's not obvious to find in our current test setup, because we always decrement via Invalidate, which calls DisconnectTip.

This PR is updated with the combination approach I'm proposing (as well as both test cases you've written). If you don't agree or think I've overlooked something else, please let me know.

bvbfan commented 6 years ago

@lbrynaut but can you make insertations inside insertClaimIntoTrie and removeClaimFromTrie otherwise it looks confusing, don't you think?

lbrynaut commented 6 years ago

Hrm, will look at the travis issue later. It might be running slowly, as it hasn't tried to build the latest rebased commit but reports it as failed.

lbrynaut commented 6 years ago

@lbrynaut but can you make insertations inside insertClaimIntoTrie and removeClaimFromTrie otherwise it looks confusing, don't you think?

Please read the above discussion.

lbrynaut commented 6 years ago

It looks like the claim parameter is both an in and an out parameter, but you're not passing a valid one in. It appears that way to me also because the internal removeClaim does a swap, i.e, it expects valid data to be there.

Good catch, I'll look into further. Odd that it works here, probably compiler version dependent.

lbrynaut commented 6 years ago

It looks like the claim parameter is both an in and an out parameter, but you're not passing a valid one in. It appears that way to me also because the internal removeClaim does a swap, i.e, it expects valid data to be there.

Ah, so the reason this works is because pre-C++11 std::swap is copy-constructable and assignable. With C++11, it would use move semantics (move-constructible and move-assignable) and you would be correct to see what you're seeing.

So good catch, but no immediate fix necessary. Agreed?

BrannonKing commented 6 years ago

Yes, I think it is safe. In looking at it again, the swap in CClaimTrieNode::removeClaim doesn't actually need to be there as the claim is erased right after the swap. We could just as easily do claim = *itClaims; and avoid the confusion.

lbrynaut commented 6 years ago

Yes, I think it is safe. In looking at it again, the swap in CClaimTrieNode::removeClaim doesn't actually need to be there as the claim is erased right after the swap. We could just as easily do claim = *itClaims; and avoid the confusion.

Right, that's exactly what it does in place of the swap. There are several cases that need updating, not just that one. I'll make sure to update them in the upstream merge if I haven't already.

bvbfan commented 6 years ago

The source of problem are activated claims, as @kaykurokawa pointed out, they bother API when it's performed Increment / Decrement block and it can lie to you, i mean: IncrementBlock

DecrementBlock is on opposite, this looks easy to misread. Should we make API more suggestive?

lbrynaut commented 6 years ago

Should we make API more suggestive?

Over time, probably. Re-writing the claimtrie interface is not related to this PR.

kaykurokawa commented 6 years ago

Untested but LGTM, yea, API can def be improved let's keep this in mind or start new issue. I'd like to just clean up the tests (improve comments), but I can do that in a separate PR later.

lbrynaut commented 6 years ago

Will merge pending testing from @shyba

shyba commented 6 years ago

Tested locally by adding the commit that verifies integrity as a whole and did some tests like restarting, worked. Removed the helper commit, deployed. So far so good! LGTM