Closed OnedgeLee closed 11 months ago
Since IEnumerable.First
will cause ArgumentException
on null source, InvalidOperationException
on empty source, so seems okay.
Every block has nullable PreviousHash
, so seems okay for this, too.
The most probable case is that the element of blocks
is not a type of Tuple<Block, ..., ...>
.
Seems like this problem has been caused from generating empty Branch
after full iteration has been ended.
When I see the log, before above problem occurs, GetBlocksAsync()
tries to fetch from 0 hashes, and right before it, it fetches smaller chunk than chunk size we set.
[2023-06-29T04:59:20.5288099Z ] [Swarm] Trying to download 93 block(s) from "0xa6e2052C389dd76E3B72356a454A2c40E91A9BC9.Unspecified/nld-nc-1.ninodes.com:31234."...
[2023-06-29T04:59:21.9151275Z Debug] [Swarm] Received replies from "0xa6e2052C389dd76E3B72356a454A2c40E91A9BC9.Unspecified/nld-nc-1.ninodes.com:31234."
[2023-06-29T04:59:21.9151682Z ] [Swarm] Received 186 blocks from "0xa6e2052C389dd76E3B72356a454A2c40E91A9BC9.Unspecified/nld-nc-1.ninodes.com:31234."
[2023-06-29T04:59:22.3407983Z ] [Swarm] Downloaded 93 block(s) from "0xa6e2052C389dd76E3B72356a454A2c40E91A9BC9.Unspecified/nld-nc-1.ninodes.com:31234."
[2023-06-29T04:59:22.4370240Z ] [Swarm] Trying to download 0 block(s) from "0x117297F1CF9376Cd2472F685392220ED960D97Db.Unspecified/backup-nc-1.ninodes.com:31234."...
[2023-06-29T04:59:22.4372711Z Error] [Swarm] Unexpected exception occured during PullBlocksAsync()
System.NullReferenceException: Object reference not set to an instance of an object.
at Libplanet.Net.Swarm.PullBlocksAsync(List`1 peersWithBlockExcerpt, Int32 chunkSize, IProgress`1 progress, CancellationToken cancellationToken) in /app/Lib9c/.Libplanet/Libplanet.Net/Swarm.BlockSync.cs:line 210
But this won't cause above error, since this empty chunk won't be queued since iteration will be exited immediately.
Previous assumption was wrong.
Exception happens on GetBlockCommit()
. link
This happens because store have branchpoint block, but tip of blockchain is not the branchpoint.
Swarm.PullBlocksAsync()
get branchpoint by store, so it won't make any error. but when it tries to get BlockCommit
, it fetch block from blockchain, and fails, since chain does not have that block.
When I query with branchpoint that failed to fetch, failed node returns correct block. But when I query tip, it's previous block of branchpoint. The former get the block from store, and the latter get the block from blockchain. So, the problem is that the block is on the store, but not on the blockchain.
{
"data": {
"blockQuery": {
"block": {
"index": 7146015,
"previousBlock": {
"hash": "73662156867acc4d4c04c0cc095e8e59dc79e4aab6b70049eaf3a169ee7c4a53"
}
}
}
}
}
{
"data": {
"blockQuery": {
"blocks": [
{
"hash": "73662156867acc4d4c04c0cc095e8e59dc79e4aab6b70049eaf3a169ee7c4a53",
"index": 7146014
}
]
}
}
}
[04:09:44 DBG] [Swarm] 1101040927/562539105: Sending a GetBlockHashesMsg message with locator [73662156867acc4d4c04c0cc095e8e59dc79e4aab6b70049eaf3a169ee7c4a53, ...] (stop: null)...
[04:09:44 DBG] [Swarm] 1101040927/562539105: Received a BlockHashesMsg message with an offset index 7146014 (total 500 hashes)
[04:09:44 DBG] [Swarm] 486290445/1195604464: Sending a GetBlockHashesMsg message with locator [00be11ff9ca41cc3793953579c7cd5ad3ec7dd037a27803ff09fe70d1d69ff02, ...] (stop: null)...
[04:09:44 DBG] [Swarm] 486290445/1195604464: Received a BlockHashesMsg message with an offset index 7146513 (total 339 hashes)
[04:09:44 DBG] 8bcdf19947231be4337c211045e26be094364b4b3c11d515769084d27c78c650,4d4aa5112f606036c5cab0c14b5efad9a75ee5df02ce5505e6f8d92be5ad38ee,3380b0c20e294775aff44fc33f32ffd81ccb412a03ee7d805fc8c33f81290269
[04:09:44 DBG] Downloaded block #7146016 8bcdf19947231be4337c211045e26be094364b4b3c11d515769084d27c78c650 from 0x7655122899Ccd2cC87B38A35D7158c9516504119.Unspecified/9c-internal-validator-2.nine-chronicles.com:31234.
[04:09:44 DBG] [Swarm] Got #7146016 8bcdf19947231be4337c211045e26be094364b4b3c11d515769084d27c78c650 from 0x7655122899Ccd2cC87B38A35D7158c9516504119.Unspecified/9c-internal-validator-2.nine-chronicles.com:31234.
[04:09:44 DBG] Completed block #7146016 8bcdf19947231be4337c211045e26be094364b4b3c11d515769084d27c78c650 from {"block_hash":"8bcdf19947231be4337c211045e26be094364b4b3c11d515769084d27c78c650","height":7146016,"round":0,"votes":[{"validator_public_key":"021ebc027706c9b7fdb03bc212657241197c4f7d4f122cbb66f347bee3bfd39551","flag":2,"timestamp":"07/03/2023 01:56:30 \u002B00:00"},{"validator_public_key":"0260972c353ba9b1e630d7b488ef0e9250a86286fbc8541e1bcca82f1a50cf8012","flag":2,"timestamp":"07/03/2023 01:56:30 \u002B00:00"},{"validator_public_key":"029a46496641787a06db32d83dfea3b50ebb681fae9bbf4e60b41ee91d4024965b","flag":2,"timestamp":"07/03/2023 01:56:30 \u002B00:00"},{"validator_public_key":"03310066ad080de4bea6042163cade4ab777a1ccb45abfada0973352b34ca0b497","flag":0,"timestamp":"07/03/2023 01:56:37 \u002B00:00"}]}
We have to download from #7146015
, but for now, #7146016
#7146015
was not demanded by below link
if (blockCompletion.Demand(hash))
{
totalBlocksToDownload++;
}
Seems like this block hash has been omitted on below link
var demandBlockHashes = GetDemandBlockHashes(
BlockChain,
peersWithBlockExcerpt,
chunkSize,
progress,
cancellationToken
).WithCancellation(cancellationToken);
But according to its log, it's already downloaded. So I suspect it's downloaded, but discarded by some reason.
I've received a tip.
NineChronicles.Headless.Executable.Commands.ChainCommand.Truncate()
."7385495
.7385394
.7385495
.
[10:10:59 ERR] [Swarm] Unexpected exception occured during PullBlocksAsync()
System.Collections.Generic.KeyNotFoundException: The given hash[ead9480e2ac1ec194e3650c6333b94259f325d35df25ca76e0e55cb84ed3045a] was not found in this chain.
at Libplanet.Blockchain.BlockChain.get_Item(BlockHash& blockHash) in /Users/yang/projects/headless/Lib9c/.Libplanet/Libplanet/Blockchain/BlockChain.cs:line 313
at Libplanet.Blockchain.BlockChain.GetBlockCommit(BlockHash blockHash) in /Users/yang/projects/headless/Lib9c/.Libplanet/Libplanet/Blockchain/BlockChain.cs:line 953
at Libplanet.Net.Swarm.PullBlocksAsync(List`1 peersWithBlockExcerpt, Int32 chunkSize, IProgress`1 progress, CancellationToken cancellationToken) in /Users/yang/projects/headless/Lib9c/.Libplanet/Libplanet.Net/Swarm.BlockSync.cs:line 210
[10:10:59 INF] [BlockCandidateTable] There are 1 branches in BlockCandidateTable
[10:10:59 DBG] [Swarm] PullBlocksAsync() has finished successfully
[10:10:59 INF] [Swarm] Consuming branch with root #7385496 182e57e9e5bdffe40cfc3ea58ed2b6c701f200ecd6ba0833e14611284640abc3 and tip #7385643 fd79f29d9546bbd2fbde8c1982932bbd6ad3317e7da55a155dcfa3ca9deb9a68
[10:10:59 DBG] [Swarm] BlockCandidateProcess() starts to append; current tip is #7385394 1cfcddac89d76eb45bdf0c1f140f2a55be009740831c1a1e47b029306f2f3674
[10:10:59 ERR] [Swarm] BlockCandidateProcess() has failed to append blocks
System.Collections.Generic.KeyNotFoundException: The given hash[ead9480e2ac1ec194e3650c6333b94259f325d35df25ca76e0e55cb84ed3045a] was not found in this chain.
query { nodeStatus { tip { index } } }
indicates tip as 7385394
from here, tipHeader
is 7385394
and root
is 7385496
.
What is clear, it's not the problem with +-1 indexing, it's much likely to be related with gap between Store
and BlockChain
.
I also looked into this issue. As @OnedgeLee had researched, it is the gap between Store
and BlockChain
. In detail, I think it is the problem of using different block storage sources than the gap between Store
and BlockChain
.
Let's do a root cause analysis.
Its symptom is throwing KeyNotFoundException
:
[10:10:59 ERR] [Swarm] BlockCandidateProcess() has failed to append blocks
System.Collections.Generic.KeyNotFoundException: The given hash[ead9480e2ac1ec194e3650c6333b94259f325d35df25ca76e0e55cb84ed3045a] was not found in this chain.
The source thrown the exception is here:
It tries to get the block from BlockChain
when it doesn't exist in newBlocks
. Then what are newBlocks
? They are blocks fetched by BlockCompletion
, PullBlockAsync
across the network.
And I checked that Swarm
receives incorrect block hashes (i.e., not including the block hash) as GetBlockHashesMsg
's response. But it works well when debugging.
But it dropped when demanding the block hash. Because it calls BlockChain.Store.ContainsBlock
through Satifies()
, BlockCompletion._completionPredicate
.
So if the block already exists in your store, it will occur forever 🙀
Just delete the block from your store.
IStore store = storeType.CreateStore(storePath);
var bh = BlockHash.FromString(blockHash);
store.DeleteBlock(bh);
I think it has to get a block from the store because it checks whether the block exists in the store, not the blockchain. If it is not intended.
catch (ArgumentNullException)
{
newTip = BlockChain.Store.GetBlock(bPrev);
}
catch (InvalidOperationException)
{
newTip = BlockChain.Store.GetBlock(bPrev);
}
But it was the incorrect way... It should replace _completionPredicate
with BlockChain.ContainsBlock
During preloading,
BlockCandidateProcess()
fails to append blocks. Log is attached on the below :Error has been thrown on
FindBranchpoint()
, code linkParameter
newBlocks
have to contain blocks betweenoldTip
andnewTip
but it seems not. It means, thatBranch candidate
ofBlockCandidateProcess()
does not contain full blocks starts from current tip. This may caused by failure onPullBlocksAsync()
.On investigation, below line on
PullBlocksAsync()
seems to causeNullReferenceError
. code link