Closed Aprogiena closed 6 years ago
Indeed I agree, my intention though is not to fix NBitcoin but to move over components form there to the fullnode and change them to better fit the fullnode requirements, trying to keep NBitcoin with little changes as possible to make it easy to merge from the upstream master Nicolas is maintaining.
@bokobza did implement a search method O(Log N) I believe in the wallet, in any we definitely can improve on that.
Of course, that is also an option. The point was to get this fixed, no matter how :-)
As for the better implementation in the wallet, if it is the one that relies on GetLocator, then it is better than without it, but the initialization is still O(N).
Just sharing here the place where bitcoin does O(log N) https://github.com/bitcoin/bitcoin/blob/master/src/chain.cpp#L83
yes, that's the core function, which we have implemented naively, but note that then they use this function everywhere, but we don't, we sometimes do block = block.previous
loop, so there are different methods that reimplement this in a naive way. so what we want is first fix that ancestor method and then use it everywhere
So this issue will maybe change to become the task to move the ConcurrentChain from NBitcoin to the FN I would suggest to call it ChainIndex (so we dont have collisions with the other ConcurrentChain).
Well the question is whether we shouldn't really implement that session based system. I would call it HeaderChain or BlockHeaderChain.
Then we could slowly replace use of ConcurrentChain with this new thing.
To confirm the task here is to replace ChainedBlock.GetAncestor() with bitcoin core algorithm using skip list and then update places searching for ancestor to use it?
Yes this is part of this task, can you explain here how core do it in more detail first?
Basically for each item in their linked list they also store a secondary pointer to the "skip item" in addition to the pointer to the previous item. The skip pointer is then used for searching the list faster by using the skip list algorithm. We should be able to do something similar but will involve modifying ChainedBlock to maintain this skip item in each node of the list.
@dangershony @Aprogiena Trying to figure out where is best location to put the call to compute the skip pointer for the block. In core they seem to do it explicitly for the block when calls to AddToBlockIndex() and LoadBlockIndexDB() in validation.cpp. I'm wondering whether we might be able to just do it in the ChainedBlock.Previous setter. This would be cleaner but do you guys see any flaws in this strategy?
i'm not a big fan of custom setters, i haven't checked but is Previous actually set anywhere else than the constructor? if not, i'd go with the constructor
but even if you go with setter, that's fine, this is just personal preference, not objectively better suggestion
I've updated both ChainedBlock.GetAncestor and ChainedBlock.GetLocator to use a skip list. Performance improvement is significant.
In ChainedBlock.FindFork we should be able to replace:
while (highChain.Height != lowChain.Height)
{
highChain = highChain.Previous;
}
With highChain.GetAncestor(lowChain.Height);
which uses skipList
For FindAncestorOrSelf I don't think skipList will improve anything as it's searching the list for a hash unless somehow we can get the height of the block from its hash.
yes we should find the height of the block using concurrent chain, is it somewhere impossible?
@Aprogiena ok sounds good, I haven't looked at concurrent chain at all yet. Will investigate it and see if there are some opportunities there as well.
concurrent chain has an ability to give you chained block by its hash
I propose we get rid of ChainBase.Contains(ChainedBlock blockIndex) altogether. It's not referenced anywhere and if you really needed to do this its quite easy and more clear by just calling GetBlock(blockIndex.Height) != null
ah ok, yes, that one is very dangerous actually, you should rarely want that i think the correct implementation of that is GetBlock(chainedBlock.hash) not its height so this is why i removed from code base all calls to this method
but tbh any call to either Contains is unsafe ... but that's again the discussion on sessions ... that' probably offtopic here
Should we be moving ConcurrentChain, ChainedBlock, and ChainBase to Stratis.Bitcoin? If so what should they be called and where should they be moved to.
A few discussion points:
There is also the whole discussion around this whole architecture and thread safety. I believe Apro has some concerns with this. Is moving the existing classes worthwhile or should we be refactoring to a new architecture as we move them?
i don't think such a move would be very easy, i was interested in implementing a parallel implementation that is better and actually secure to use, i still think that is the better way to do it
but if i discount that and actually answer your points, my view would be:
.Chain
sounds better to meyes, i do have concerns :) i don't think it is worth to try to import it, there will be too many dependencies in NBitcoin imho
Ok, yeah I think those BlockValidator/BlockStore dependencies are a big problem with moving as is, so I agree with you. For now I will do a high level cleanup of seeing if there are any instances of the chain enumerations that can be replaced with GetAncestor() so they take advantage of the skip list and leave it at that for this issue.
I am leaving this issue open and bumping in priority so we can try to get it done asap.
what is the status of this issue?
Digging into the code i spotted some places that could be improved, like the methods of the ConcurrentChain
class like the SetTipLocked method, but since that class will be abandoned would be a wasted effort.
Talking about the ChainedHeader
class, I'd like to point out
that could be rewritten as
int heightSkipPrev = walk.Previous.Skip.Height;
that could be both faster (?) and more readable (and maybe less error prone in case the GetSkipHeight change behaviour in the future, based on the context/content)
I think the FindFork method should be commented a bit to explain how it works and some more comment about custom (bitcoin) SkipList implementation would be nice
should investigate if this method could be removed or refactored in order to use ChainedHeader as parameter, so could use GetAncestor internally (doesn't use Skip List feature)
actually it's only used here https://github.com/stratisproject/StratisBitcoinFullNode/blob/c6de1f529ca4461f76c39dcee11f602c2de0120f/src/Stratis.Bitcoin/Consensus/ConsensusManager.cs#L167-L173
Not sure if an easy solution could be using
this.chain.GetBlock(consensusTipHash)
and pass its result to FindAncestorOrSelf in order to use the other optimized overload and remove the other one
I need to navigate backward to understand if this.chain contains that hash
Last thing to annote regarding ChainedHeader is about the method EnumerateToGenesis
Actually it can be misused and it's currently implemented just in ConcurrentChain (so no problem), in ChainTest (could be removed) and in ChainBase public IEnumerable<ChainedHeader> ToEnumerable(bool fromTip)
that ToEnumerable method is used in test, so I'd suggest to mark it as private but expose it for tests (if needed) or remove it tout court and use something else in tests
Good idea about optimizing chained header, what you've wrote sounds correct. As about ConcurrentChain- also good ideas but we will remove this component so no need to invest time in it.
is walk.Height - 1
is always the same as walk.Previous.Skip.Height
?
Good idea to add comments on FindFork()
@Aprogiena will be happier 😉
FindAncestorOrSelf
currently called initialise so not no deal if not optimised but it will be called later when we need to find ancestor in a max reorg scenario (we may have the ChainedHeader then if yes we can write a new version of this method.
is
walk.Height - 1
is always the same aswalk.Previous.Skip.Height
?
yes because the Skip property is set by the constructor
this.Skip = this.Previous.GetAncestor(this.GetSkipHeight(this.Height));
so the skip height this.GetSkipHeight(walk.Height - 1)
is like calling this.GetSkipHeight(walk.Height)
on the previous element height
I'm working on a PR to improve some of the aspect i commented above. Googling i found the FindFork implementation of bitcoin
and probably our can be improved following that implementation, I'll try that.
Uhm... I'm a bit confused...
I see that bitcoin doesn't have a FindFork into the CBlockIndex
class (that reminds me our ChainedHeader class) but have it in a CChain class... so why did we have this method misplaced and what's our class that's behave like CChain
?
I see original NBitcoin code has it too.
I think we should try to avoid this method and try using ChainBase
for that
But ChainBase actually has a FindFork that works on a subset of hash enumerable, so we'd need to create a new FindFork method that accept a ChainedBlock as parameter (so actually means moving the FindFork from ChainedHeader to ChainBase, so we can then use the internal list/dictionary like ConcurrentChain
does
But this lead to the need of refactoring, moving the
private readonly Dictionary<uint256, ChainedHeader> blocksById = new Dictionary<uint256, ChainedHeader>();
private readonly Dictionary<int, ChainedHeader> blocksByHeight = new Dictionary<int, ChainedHeader>();
from the ConcurrentChain
to the ChainBase
changing private to protected or using accessors, this way we can optimize the ChainBase.FindFork to use Dictionary to jump to desidered height
at the same time this code in ConcurrentChain
#region IChain Members
public override ChainedHeader GetBlock(uint256 id)
{
using (this.lockObject.LockRead())
{
ChainedHeader result;
this.blocksById.TryGetValue(id, out result);
return result;
}
}
private ChainedHeader GetBlockLocked(int height)
{
ChainedHeader result;
this.blocksByHeight.TryGetValue(height, out result);
return result;
}
public override ChainedHeader GetBlock(int height)
{
using (this.lockObject.LockRead())
{
return GetBlockLocked(height);
}
}
#endregion
back to ChainBase
.
At this point I think all pieces can match and we can have some performance improvements and less different FindFork methods around
Well, I could even be wrong because of missing informations, so I'm waiting to hear from you.
Note: I know ConcurrentChain is going to be abandoned, but will ChainBase be abandoned too? If I've understood, ChainedHeaderTree will take the place of ConcurrentChain? I see it lack a dictionary of blocks by height, this way the FindFork can be implemented in that class
Yes also ChainBase will go away, or perhaps we will use that component as a holder of blocks that are bellow consensus tip, but we have ChainHeaderTree for that and if we need we can always just add another dictionary to represent height.
New issue created to track only ChainedHeader
.
There are couple of wrongly implemented functions that we should aim to fix in NBitcoin.
1) ChainBase.Contains(ChainedBlock blockIndex)
This method is implemented as follows:
it is thus terribly misleading to write something like
this.chain.Contains(myBlock)
because you would expect to get true if your block is on the best chain. Instead, you receive true if the best chain contains a block at the height of your block.2) ChainedBlock.GetAncestor This is implemented as
which is O(N) time complexity function. Bitcoin Core implements skip list to make it O(log N), which is the correct way to do it.
Similarly,
ChainedBlock.FindFork
,ChainedBlock.FindAncestorOrSelf
,ChainBase.FindFork(ChainBase chain)
suffer from the same problem. And all the code that is using some kind ofthis.Chain.Enumerate*
should be reviewed as well. It would be probably better to complete remove all those Enumerate methods.3) ChainedBlock.GetLocator
Similarly as in 2), this function is O(N), while in Bitcoin Core it is O(log N).