stratisproject / StratisBitcoinFullNode

Bitcoin full node in C#
https://stratisplatform.com
MIT License
786 stars 314 forks source link

[NBitcoin] Fix problematic chain functions #659

Closed Aprogiena closed 6 years ago

Aprogiena commented 6 years ago

There are couple of wrongly implemented functions that we should aim to fix in NBitcoin.

1) ChainBase.Contains(ChainedBlock blockIndex)

This method is implemented as follows:

        public bool Contains(ChainedBlock blockIndex)
        {
            if(blockIndex == null)
                throw new ArgumentNullException("blockIndex");
            return GetBlock(blockIndex.Height) != null;
        }

it is thus terribly misleading to write something like this.chain.Contains(myBlock) because you would expect to get true if your block is on the best chain. Instead, you receive true if the best chain contains a block at the height of your block.

2) ChainedBlock.GetAncestor This is implemented as

        public ChainedBlock GetAncestor(int height)
        {
            if(height > Height || height < 0)
                return null;
            ChainedBlock current = this;

            while(true)
            {
                if(current.Height == height)
                    return current;
                current = current.Previous;
            }
        }

which is O(N) time complexity function. Bitcoin Core implements skip list to make it O(log N), which is the correct way to do it.

Similarly, ChainedBlock.FindFork, ChainedBlock.FindAncestorOrSelf, ChainBase.FindFork(ChainBase chain) suffer from the same problem. And all the code that is using some kind of this.Chain.Enumerate* should be reviewed as well. It would be probably better to complete remove all those Enumerate methods.

3) ChainedBlock.GetLocator

Similarly as in 2), this function is O(N), while in Bitcoin Core it is O(log N).

dangershony commented 6 years ago

Indeed I agree, my intention though is not to fix NBitcoin but to move over components form there to the fullnode and change them to better fit the fullnode requirements, trying to keep NBitcoin with little changes as possible to make it easy to merge from the upstream master Nicolas is maintaining.

@bokobza did implement a search method O(Log N) I believe in the wallet, in any we definitely can improve on that.

Aprogiena commented 6 years ago

Of course, that is also an option. The point was to get this fixed, no matter how :-)

As for the better implementation in the wallet, if it is the one that relies on GetLocator, then it is better than without it, but the initialization is still O(N).

dangershony commented 6 years ago

Just sharing here the place where bitcoin does O(log N) https://github.com/bitcoin/bitcoin/blob/master/src/chain.cpp#L83

Aprogiena commented 6 years ago

yes, that's the core function, which we have implemented naively, but note that then they use this function everywhere, but we don't, we sometimes do block = block.previous loop, so there are different methods that reimplement this in a naive way. so what we want is first fix that ancestor method and then use it everywhere

dangershony commented 6 years ago

So this issue will maybe change to become the task to move the ConcurrentChain from NBitcoin to the FN I would suggest to call it ChainIndex (so we dont have collisions with the other ConcurrentChain).

Aprogiena commented 6 years ago

Well the question is whether we shouldn't really implement that session based system. I would call it HeaderChain or BlockHeaderChain.

Then we could slowly replace use of ConcurrentChain with this new thing.

mikedennis commented 6 years ago

Implement skiplist for concurrent chain

mikedennis commented 6 years ago

To confirm the task here is to replace ChainedBlock.GetAncestor() with bitcoin core algorithm using skip list and then update places searching for ancestor to use it?

dangershony commented 6 years ago

Yes this is part of this task, can you explain here how core do it in more detail first?

mikedennis commented 6 years ago

Basically for each item in their linked list they also store a secondary pointer to the "skip item" in addition to the pointer to the previous item. The skip pointer is then used for searching the list faster by using the skip list algorithm. We should be able to do something similar but will involve modifying ChainedBlock to maintain this skip item in each node of the list.

mikedennis commented 6 years ago

@dangershony @Aprogiena Trying to figure out where is best location to put the call to compute the skip pointer for the block. In core they seem to do it explicitly for the block when calls to AddToBlockIndex() and LoadBlockIndexDB() in validation.cpp. I'm wondering whether we might be able to just do it in the ChainedBlock.Previous setter. This would be cleaner but do you guys see any flaws in this strategy?

Aprogiena commented 6 years ago

i'm not a big fan of custom setters, i haven't checked but is Previous actually set anywhere else than the constructor? if not, i'd go with the constructor

but even if you go with setter, that's fine, this is just personal preference, not objectively better suggestion

mikedennis commented 6 years ago

I've updated both ChainedBlock.GetAncestor and ChainedBlock.GetLocator to use a skip list. Performance improvement is significant.

In ChainedBlock.FindFork we should be able to replace:

while (highChain.Height != lowChain.Height)
{
    highChain = highChain.Previous;
}

With highChain.GetAncestor(lowChain.Height); which uses skipList

For FindAncestorOrSelf I don't think skipList will improve anything as it's searching the list for a hash unless somehow we can get the height of the block from its hash.

Aprogiena commented 6 years ago

yes we should find the height of the block using concurrent chain, is it somewhere impossible?

mikedennis commented 6 years ago

@Aprogiena ok sounds good, I haven't looked at concurrent chain at all yet. Will investigate it and see if there are some opportunities there as well.

Aprogiena commented 6 years ago

concurrent chain has an ability to give you chained block by its hash

mikedennis commented 6 years ago

I propose we get rid of ChainBase.Contains(ChainedBlock blockIndex) altogether. It's not referenced anywhere and if you really needed to do this its quite easy and more clear by just calling GetBlock(blockIndex.Height) != null

Aprogiena commented 6 years ago

ah ok, yes, that one is very dangerous actually, you should rarely want that i think the correct implementation of that is GetBlock(chainedBlock.hash) not its height so this is why i removed from code base all calls to this method

but tbh any call to either Contains is unsafe ... but that's again the discussion on sessions ... that' probably offtopic here

mikedennis commented 6 years ago

Should we be moving ConcurrentChain, ChainedBlock, and ChainBase to Stratis.Bitcoin? If so what should they be called and where should they be moved to.

A few discussion points:

  1. Seems like Stratis.Bitcoin.Base currently has some Chain related functionality. Good landing spot for this code? Or maybe Stratis.Bitcoin.Chain?
  2. We seem to agree ChainedBlock should be renamed to ChainedBlockHeader
  3. Should ChainBase be an interface (ie IConcurrentChain) instead of a base class and its functionality moved into ConcurrentChain? I see only one derived type - ConcurrentChain.
  4. ChainBase is consumed by NBitcoin.BlockValidator and NBitcoin.BlockStore - does that mean we will have to move them over as well? Any issues with that or do we already have our own copies of those?
  5. ConcurrentChain is also referenced by BlockValidator and BlockStore. What should we call this? ConcurrentChainedBlockHeaders?
  6. Tests in ChainTests should be split into ConcurrentChainTests (or whatever we call this) and ChainedBlockTests (or whatever we call it).

There is also the whole discussion around this whole architecture and thread safety. I believe Apro has some concerns with this. Is moving the existing classes worthwhile or should we be refactoring to a new architecture as we move them?

Aprogiena commented 6 years ago

i don't think such a move would be very easy, i was interested in implementing a parallel implementation that is better and actually secure to use, i still think that is the better way to do it

but if i discount that and actually answer your points, my view would be:

  1. .Chain sounds better to me
  2. yes
  3. yes, remove ChainBase, replace with interface
  4. no, we don't want those guys, that's one of the reasons i would not import it at all but create from scratch
  5. i would certainly remove the word Concurrent from the name as it means "Evil" ;)
  6. neutral

yes, i do have concerns :) i don't think it is worth to try to import it, there will be too many dependencies in NBitcoin imho

mikedennis commented 6 years ago

Ok, yeah I think those BlockValidator/BlockStore dependencies are a big problem with moving as is, so I agree with you. For now I will do a high level cleanup of seeing if there are any instances of the chain enumerations that can be replaced with GetAncestor() so they take advantage of the skip list and leave it at that for this issue.

dangershony commented 6 years ago

I am leaving this issue open and bumping in priority so we can try to get it done asap.

noescape00 commented 6 years ago

what is the status of this issue?

MithrilMan commented 6 years ago

Digging into the code i spotted some places that could be improved, like the methods of the ConcurrentChain class like the SetTipLocked method, but since that class will be abandoned would be a wasted effort.

Talking about the ChainedHeader class, I'd like to point out

https://github.com/stratisproject/StratisBitcoinFullNode/blob/4a12434982a6e64e5883fce366c39f5c918293da/src/NBitcoin/ChainedHeader.cs#L582

that could be rewritten as

int heightSkipPrev = walk.Previous.Skip.Height;

that could be both faster (?) and more readable (and maybe less error prone in case the GetSkipHeight change behaviour in the future, based on the context/content)

I think the FindFork method should be commented a bit to explain how it works and some more comment about custom (bitcoin) SkipList implementation would be nice

MithrilMan commented 6 years ago

https://github.com/stratisproject/StratisBitcoinFullNode/blob/4a12434982a6e64e5883fce366c39f5c918293da/src/NBitcoin/ChainedHeader.cs#L295-L307

should investigate if this method could be removed or refactored in order to use ChainedHeader as parameter, so could use GetAncestor internally (doesn't use Skip List feature)

actually it's only used here https://github.com/stratisproject/StratisBitcoinFullNode/blob/c6de1f529ca4461f76c39dcee11f602c2de0120f/src/Stratis.Bitcoin/Consensus/ConsensusManager.cs#L167-L173

Not sure if an easy solution could be using this.chain.GetBlock(consensusTipHash) and pass its result to FindAncestorOrSelf in order to use the other optimized overload and remove the other one

I need to navigate backward to understand if this.chain contains that hash

MithrilMan commented 6 years ago

Last thing to annote regarding ChainedHeader is about the method EnumerateToGenesis Actually it can be misused and it's currently implemented just in ConcurrentChain (so no problem), in ChainTest (could be removed) and in ChainBase public IEnumerable<ChainedHeader> ToEnumerable(bool fromTip)

that ToEnumerable method is used in test, so I'd suggest to mark it as private but expose it for tests (if needed) or remove it tout court and use something else in tests

noescape00 commented 6 years ago

Good idea about optimizing chained header, what you've wrote sounds correct. As about ConcurrentChain- also good ideas but we will remove this component so no need to invest time in it.

dangershony commented 6 years ago

is walk.Height - 1 is always the same as walk.Previous.Skip.Height?

Good idea to add comments on FindFork() @Aprogiena will be happier 😉

FindAncestorOrSelf currently called initialise so not no deal if not optimised but it will be called later when we need to find ancestor in a max reorg scenario (we may have the ChainedHeader then if yes we can write a new version of this method.

MithrilMan commented 6 years ago

is walk.Height - 1 is always the same as walk.Previous.Skip.Height?

yes because the Skip property is set by the constructor this.Skip = this.Previous.GetAncestor(this.GetSkipHeight(this.Height));

so the skip height this.GetSkipHeight(walk.Height - 1) is like calling this.GetSkipHeight(walk.Height) on the previous element height

MithrilMan commented 6 years ago

I'm working on a PR to improve some of the aspect i commented above. Googling i found the FindFork implementation of bitcoin

https://github.com/bitcoin/bitcoin/blob/cc7258bdfb44c5b5f3498296d8c9e6791655e89f/src/chain.cpp#L51-L60

and probably our can be improved following that implementation, I'll try that.

MithrilMan commented 6 years ago

Uhm... I'm a bit confused... I see that bitcoin doesn't have a FindFork into the CBlockIndex class (that reminds me our ChainedHeader class) but have it in a CChain class... so why did we have this method misplaced and what's our class that's behave like CChain?

I see original NBitcoin code has it too.

I think we should try to avoid this method and try using ChainBase for that But ChainBase actually has a FindFork that works on a subset of hash enumerable, so we'd need to create a new FindFork method that accept a ChainedBlock as parameter (so actually means moving the FindFork from ChainedHeader to ChainBase, so we can then use the internal list/dictionary like ConcurrentChain does

But this lead to the need of refactoring, moving the

private readonly Dictionary<uint256, ChainedHeader> blocksById = new Dictionary<uint256, ChainedHeader>();
private readonly Dictionary<int, ChainedHeader> blocksByHeight = new Dictionary<int, ChainedHeader>();

from the ConcurrentChain to the ChainBase changing private to protected or using accessors, this way we can optimize the ChainBase.FindFork to use Dictionary to jump to desidered height

at the same time this code in ConcurrentChain

#region IChain Members

public override ChainedHeader GetBlock(uint256 id)
{
    using (this.lockObject.LockRead())
    {
        ChainedHeader result;
        this.blocksById.TryGetValue(id, out result);
        return result;
    }
}

private ChainedHeader GetBlockLocked(int height)
{
    ChainedHeader result;
    this.blocksByHeight.TryGetValue(height, out result);
    return result;
}

public override ChainedHeader GetBlock(int height)
{
    using (this.lockObject.LockRead())
    {
        return GetBlockLocked(height);
    }
}

#endregion

back to ChainBase.

At this point I think all pieces can match and we can have some performance improvements and less different FindFork methods around

Well, I could even be wrong because of missing informations, so I'm waiting to hear from you.

Note: I know ConcurrentChain is going to be abandoned, but will ChainBase be abandoned too? If I've understood, ChainedHeaderTree will take the place of ConcurrentChain? I see it lack a dictionary of blocks by height, this way the FindFork can be implemented in that class

dangershony commented 6 years ago

Yes also ChainBase will go away, or perhaps we will use that component as a holder of blocks that are bellow consensus tip, but we have ChainHeaderTree for that and if we need we can always just add another dictionary to represent height.

dangershony commented 6 years ago

New issue created to track only ChainedHeader.