Closed noescape00 closed 6 years ago
Hi noescape00,
I have included Trace logging within Stratis.Bitcoin.Features.Consensus.ConsensusLoop. Against MainNet. On my VM FullNode test server.
I'll keep StratisD running and check the log on a daily basis.
If we are able to recreate the issue we can analyse the verbose log information. Which will put us in a better position to fix forward.
Cheers,
Ferdeen
I'll keep StratisD running and check the log on a daily basis.
No need to check the log, just take a look at the console output once a day. What you are looking for is: headers height is ahead of consensus by some blocks, consensus is ahead of block store by 1 block, consensus and block store don't advance.
I have included Trace logging within Stratis.Bitcoin.Features.Consensus.ConsensusLoop. Against MainNet. On my VM FullNode test server. If we are able to recreate the issue we can analyse the verbose log information. Which will put us in a better position to fix forward.
Perfect! gj
Ok cool. Thanks.
@noescape00 I think the fix we introduced CachedConsensusTip
is not such a good idea it makes Store dependent on blocks arriving from Consensus I am rethinking that approach.
But in any case store would trigger the reorg only once consensus.tip it goes beyond the store.tip so the fact store is not reorging is fine.
The question is why did consensus stop, this is an issue we need to address so promoting this to high priority.
As signaling is still blocking consensus I wonder if any of the other components (like wallet) might be having an issue and blocking consensus form continuing
@noescape00 I think the fix we introduced CachedConsensusTip is not such a good idea it makes Store dependent on blocks arriving from Consensus I am rethinking that approach.
Yeah, it's not. It was meant to be a temporary hack in order to solve a race condition problem without redesigning anything. Another solution was introduced in the other PR- I used an asyncqueue there to implement a trigger-based blocks handling instead of checking for new blocks in an endless loop. This solution is also covered in the block store refactoring issue but I need people to agree on it before I can proceed.
Also there is nothing wrong with waiting for a new block to arrive before executing logic in the block store loop.
However I don't think that this is causing a bug since block store should not affect consensus advancement.
The question is why did consensus stop, this is an issue we need to address so promoting this to high priority.
Agreed. I've setted up logging for the consensus loop this morning and left the node running on a server. @ferdeen also has the logging on so in case any of us catch this bug again we will get a good idea why it's happening.
Caught it. Looks like consensus loop logs are insufficient. Last lines from consensus:
[2018-03-11 01:20:40.0129 37] TRACE: Stratis.Bitcoin.Features.Consensus.ConsensusLoop+<PullerLoopAsync>d__38.MoveNext Asking block puller to deliver next block.
[2018-03-11 01:20:53.5236 37] TRACE: Stratis.Bitcoin.Features.Consensus.ConsensusLoop+<PullerLoopAsync>d__38.MoveNext No block received from puller due to reorganization.
[2018-03-11 01:20:53.5236 37] TRACE: Stratis.Bitcoin.Features.Consensus.ConsensusLoop+<PullerLoopAsync>d__38.MoveNext Rewinding.
[2018-03-11 01:20:53.5236 37] TRACE: Stratis.Bitcoin.Features.Consensus.ConsensusLoop.RewindCoinViewLockedAsync ()
[2018-03-11 01:20:53.8148 37] INFO: Stratis.Bitcoin.Features.Consensus.ConsensusLoop+<RewindCoinViewLockedAsync>d__39.MoveNext Reorg detected, rewinding from '329619-910b582498fa8729724d965d2c9805e08ccde9dfa1e49df68d96aee71c8dfafe' to '329618-9f8937cfc8f272cd6d741c407c5811b6331f17c75ed25cf97aab5dfb5a608241'.
[2018-03-11 01:20:53.8148 37] TRACE: Stratis.Bitcoin.Features.Consensus.ConsensusLoop+<RewindCoinViewLockedAsync>d__39.MoveNext (-)
[2018-03-11 01:20:53.8148 37] TRACE: Stratis.Bitcoin.Features.Consensus.ConsensusLoop+<PullerLoopAsync>d__38.MoveNext Asking block puller to deliver next block.
Full logs:
I'll add logging for the block pullers and see what they will tell. So far it looks like block puller never asked peers to deliver the block consensus is waiting for.
Cool. I'll check the logs on my VM. Also I'll add logging for block puller tommorow.
OK guys, so an update on this.
Last night my consensus tip stopped advancing again. Having looked at the code, I see that the only way the BlockStoreLoop
can advance is if the CachedConsensusTip
is higher than the StoreTip
.
The StoreTip
can only be set if we add a block to PendingStorage
and this happens via BlockStoreSignaled.OnNextCore()
which in my case happened up until Block 335619 and then stopped. So in other words the BlockStoreLoop cannot advance but the we keep on downloading block headers.
All of this occurred around 22:02 last night, then straight after at 22:03 ConsensLoop
inexplicable STOPPED:
[2018-03-15 22:03:09.1077 22] INFO: Stratis.Bitcoin.Utilities.AsyncLoop+<>c__DisplayClass16_0+<<StartAsync>b__0>d.MoveNext Consensus Loop stopping
[2018-03-15 22:03:09.7865 22] FATAL: Stratis.Bitcoin.Utilities.AsyncLoop+<>c__DisplayClass16_0+<<StartAsync>b__0>d.MoveNext Consensus Loop threw an unhandled exception
[2018-03-15 22:03:09.7865 22] ERROR: Stratis.Bitcoin.Utilities.AsyncLoop+<>c__DisplayClass16_0+<<StartAsync>b__0>d.MoveNext Consensus Loop threw an unhandled exception: System.Threading.LockRecursionException: Recursive read lock acquisitions not allowed in this mode.
at System.Threading.ReaderWriterLockSlim.TryEnterReadLockCore(TimeoutTracker timeout)
at System.Threading.ReaderWriterLockSlim.EnterReadLock()
at NBitcoin.ReaderWriterLock.<LockRead>b__1_0() in C:\Users\fullnodefr\Documents\GitHub\fassadlr\StratisBitcoinFullNode\src\NBitcoin\Utils\ReaderWriterLock.cs:line 23
at NBitcoin.ActionDisposable..ctor(Action onEnter, Action onLeave) in C:\Users\fullnodefr\Documents\GitHub\fassadlr\StratisBitcoinFullNode\src\NBitcoin\Utils\ActionDisposable.cs:line 12
at NBitcoin.ReaderWriterLock.LockRead() in C:\Users\fullnodefr\Documents\GitHub\fassadlr\StratisBitcoinFullNode\src\NBitcoin\Utils\ReaderWriterLock.cs:line 23
at Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView.<FetchCoinsAsync>d__30.MoveNext() in C:\Users\fullnodefr\Documents\GitHub\fassadlr\StratisBitcoinFullNode\src\Stratis.Bitcoin.Features.Consensus\CoinViews\CachedCoinView.cs:line 179
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
at Stratis.Bitcoin.Features.Consensus.ConsensusLoop.<ValidateAndExecuteBlockAsync>d__42.MoveNext() in C:\Users\fullnodefr\Documents\GitHub\fassadlr\StratisBitcoinFullNode\src\Stratis.Bitcoin.Features.Consensus\ConsensusLoop.cs:line 475
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Stratis.Bitcoin.Features.Consensus.ConsensusLoop.<AcceptBlockAsync>d__40.MoveNext() in C:\Users\fullnodefr\Documents\GitHub\fassadlr\StratisBitcoinFullNode\src\Stratis.Bitcoin.Features.Consensus\ConsensusLoop.cs:line 358
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Stratis.Bitcoin.Features.Consensus.ConsensusLoop.<PullerLoopAsync>d__38.MoveNext() in C:\Users\fullnodefr\Documents\GitHub\fassadlr\StratisBitcoinFullNode\src\Stratis.Bitcoin.Features.Consensus\ConsensusLoop.cs:line 304
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Stratis.Bitcoin.Features.Consensus.ConsensusLoop.<<StartAsync>b__36_0>d.MoveNext() in C:\Users\fullnodefr\Documents\GitHub\fassadlr\StratisBitcoinFullNode\src\Stratis.Bitcoin.Features.Consensus\ConsensusLoop.cs:line 238
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
Which is obviously what consensus isn't advancing.
At that time, it was fetching coins:
[2018-03-15 22:03:09.7865 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView.FetchCoinsAsync Cache missed for transaction ID '5c103d475578d2b8847201e505b6d232e61b7c1300a1ed4e56337618dab8df5b'.
[2018-03-15 22:03:09.7865 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView.FetchCoinsAsync Cache missed for transaction ID 'e7e2301a341d60a1a0979b01cf00b769f8f816a07fd52ba195a278eaa101395c'.
[2018-03-15 22:03:09.7865 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView.FetchCoinsAsync Cache missed for transaction ID '6e610e690a860a469e22d7ad8cca6fc229b4145476794d8f1ccb6e5dd49414e5'.
[2018-03-15 22:03:09.7865 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView.FetchCoinsAsync Cache missed for transaction ID '5fa5f8f6585a9b529e0b00cafb099c2ebccd938a628ee3f8b526ac8b567fb4ff'.
[2018-03-15 22:03:09.7865 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView.FetchCoinsAsync Cache missed for transaction ID '76543aee99f24d4b6dd085b78384d5141595a976c36b8d2173b5f33bb1db3dfb'.
[2018-03-15 22:03:09.7865 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView.FetchCoinsAsync Cache missed for transaction ID '34355e88d66a6b199ea68f0cb878a479792015eac92fb989fb2f3fab85f5a09c'.
[2018-03-15 22:03:09.7865 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView.FetchCoinsAsync Cache missed for transaction ID '8a802541519114eff4d25b948363196e57d872bd045506ad8ecaebd04527ebfc'.
[2018-03-15 22:03:09.7865 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView.FetchCoinsAsync Cache missed for transaction ID '1239821b6d7817268aa2fee48dff5bf2a5ba84401a755d94f22ce035a6fcf0ca'.
[2018-03-15 22:03:09.7865 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView.FetchCoinsAsync Cache missed for transaction ID 'b97209d60bd0e09d1679dfc5f399ca0ad3292a6d980fbe931c4cf7bb5e442843'.
[2018-03-15 22:03:09.7865 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView.FetchCoinsAsync Cache missed for transaction ID '8b99b818a77440d135cc59ed8eba1af2f0eaf040917408b34cd4799edbec3e1d'.
[2018-03-15 22:03:09.7865 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView.FetchCoinsAsync Cache missed for transaction ID '776684e968b6a781b5ced6a1a6a16ae201defc2b809175e1bc445085525974ce'.
[2018-03-15 22:03:09.7865 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView.FetchCoinsAsync Cache missed for transaction ID 'a2826d8330a958a12cdcd2df11b8f3cb4660a8641a2ca716d62a560c125acbad'.
[2018-03-15 22:03:09.7865 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView.FetchCoinsAsync Cache missed for transaction ID '809d4aa02ce43e0b75b7f5e234a4beec4dcfa0436dd1c83f569f1aad6cdf46cf'.
[2018-03-15 22:03:09.7865 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView.FetchCoinsAsync Cache missed for transaction ID 'c2b73b04738f81e416ae020d0db8d1da44bd491d13967124c51e72920443a66a'.
[2018-03-15 22:03:09.7865 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView.FetchCoinsAsync Cache missed for transaction ID '0cafd4f8f82de7b95cce0414e0927ec0e9d99b732e1b4f767fef16acf5d55ee2'.
[2018-03-15 22:03:09.7865 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView.FetchCoinsAsync Cache missed for transaction ID 'bb1118be20504a2da5bfe91dbebf9e5cb50718943c0e0ac9cd177c6d37820e53'.
[2018-03-15 22:03:09.7865 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView.FetchCoinsAsync Cache missed for transaction ID '8cb03e80a779006d83bc7314c51738485f83e812b78caae6ceeda410bee838bb'.
[2018-03-15 22:03:09.7865 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView.FetchCoinsAsync Cache missed for transaction ID 'c201e26eaf79b55d978eceb97929f937d6132884e6f619720c9b5c6075794560'.
[2018-03-15 22:03:09.7865 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView.FetchCoinsAsync Cache missed for transaction ID 'e140e130a0d3c9ff882df542c9cdcc974e4087266c3b993d276d55829a066583'.
[2018-03-15 22:03:09.7865 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView.FetchCoinsAsync Cache missed for transaction ID 'd0609c6558b64c556bcc35c8b937f5f8a530b158cbbf2a1ca372dbedf408f7e2'.
[2018-03-15 22:03:09.7865 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView.FetchCoinsAsync Cache missed for transaction ID '03946d6c2d217ce9efc1e3d77535562ee7509f63f910600d8abca9390273edfe'.
[2018-03-15 22:03:09.7865 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView.FetchCoinsAsync Cache missed for transaction ID '2b9450e1054e98356cd885ef8f09d32a0f3661dfea5576cfea2696850df2b9bb'.
[2018-03-15 22:03:09.7865 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView.FetchCoinsAsync Cache missed for transaction ID '729ac0b6348915fd9be1981929632bbdec8cd5caffa2756907cfb46a60bceffb'.
[2018-03-15 22:03:09.7865 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView.FetchCoinsAsync Cache missed for transaction ID 'c355683c8efdbd50f09a81c49b74d4c5b586f1c8da424caf50b84e96a28447f4'.
[2018-03-15 22:03:09.7865 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView.FetchCoinsAsync Cache missed for transaction ID '76ae9d3a2e1405ce796c6a8a7cab4edc34afee88b747229ddcb244c116de534b'.
[2018-03-15 22:03:09.7865 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView.FetchCoinsAsync Cache missed for transaction ID '160a54504d8e778260835eddb49d661f25f1ec2d5e2a9d5fc6550a5189e6712f'.
[2018-03-15 22:03:09.7865 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView.FetchCoinsAsync Cache missed for transaction ID 'c27f17bcad4cd64bf0b3f25f77c6e1d80a34cbac317d924be43cf1ba1cba9b0a'.
[2018-03-15 22:03:09.7865 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView.FetchCoinsAsync Cache missed for transaction ID '30ed711eaea8144f40b6f26ccfb96e0f984b0d903d77b7ce87e4cfe3c1948f00'.
[2018-03-15 22:03:09.7865 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView.FetchCoinsAsync Cache missed for transaction ID '64ad756f5569d5b85e7bdd3483d5cddebb7411b7822cfc6b1728e59f66172d91'.
[2018-03-15 22:03:09.7865 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView.FetchCoinsAsync Cache missed for transaction ID 'a2e37ed66705544c1cdd75ab0d436ab928f9bb24b815a71c25b5d12aa1d8bc5e'.
[2018-03-15 22:03:09.7865 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView.FetchCoinsAsync Cache missed for transaction ID 'eacbf28dc21b75becf7e392868570e4898f6d7f02d6677823ccaccd10c5e4e0f'.
[2018-03-15 22:03:09.7865 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView.FetchCoinsAsync Cache missed for transaction ID '2e656ef0a68cb3d40a24e2dd549407793e672926acc2f44b411737f2876bfee6'.
[2018-03-15 22:03:09.7865 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView.FetchCoinsAsync Cache missed for transaction ID '168a6dd45b8b32b6474c3eb4b53e3cfc5a28daf575ea1a60e4adefffbe29eae0'.
[2018-03-15 22:03:09.7865 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView.FetchCoinsAsync 33 cache missed transaction needs to be loaded from underlaying coinview.
[2018-03-15 22:03:09.7865 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.DBreezeCoinView+<>c__DisplayClass12_0.<FetchCoinsAsync>b__0 (txIds.Length:33)
[2018-03-15 22:03:09.8033 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.DBreezeCoinView+<>c__DisplayClass12_0.<FetchCoinsAsync>b__0 (-):*.BlockHash='dedd0ab07be41dbf17cbb60f337fb64f7d8236b9a59c183ee4d76086707ad768',*.UnspentOutputs.Length=33
[2018-03-15 22:03:09.8033 22] TRACE: Stratis.Bitcoin.Features.Consensus.CoinViews.CachedCoinView+<FetchCoinsAsync>d__30.MoveNext (-):*.BlockHash='dedd0ab07be41dbf17cbb60f337fb64f7d8236b9a59c183ee4d76086707ad768',*.UnspentOutputs.Length=33
So I suspect the issue is somewhere where its trying to read from the cache, throws and exception and it wrongly kills the asyncloop?
From the the MSDN article:
Remarks
LockRecursionException is thrown for several reasons, including the following:
If a thread tries to enter an instance of ReaderWriterLockSlim recursively, but the instance does not support recursion.
If a thread tries to enter a ReaderWriterLockSlim instance in write mode or upgradeable mode when the thread initially entered the lock in read mode. This represents a potential deadlock and therefore is not allowed.
If allowing one more level of recursion would exceed the maximum value for the internal storage counter used to track recursions. This limit is so large that applications should never encounter it.
We are getting a Recursive read lock acquisitions not allowed in this mode.
exception, so somewhere in the code we are trying to enter a LockRead twice.
This issue also appears to be affecting the AzureIndexer (see https://github.com/stratisproject/BlockExplorer/issues/51):
This happened to me 2 times. Last time when it happened I was using build created at 22.02.2018 from the latest master.
Here is the log:
So we can see here that a reorg happened that rewinded consensus by 1 block. After that headers tip get's updated.
Because block store didn't handle a reorg we can tell that
this.signals.SignalBlock(blockValidationContext.Block);
was never called (otherwise block store would reorg itself following the consensus). So something happened after the reorg so the new block never got through the consensus loop for some reason.After that bug consensus never recovers so manual node restart is required.
I suggest adding logging for the consensus loop and block store and running the node until the bug happens again. That way we will have an idea of why this is happening.