Fixes a bug when dealing with deferred txns which resulted in the txn mgr crashing out and dropping txns.
When txn mgr receives a rejection from a CG member for a submitted txn, it checks the height of the CG member against its own height. If the txn mgr is behind the CG member then the rejection is added to a pending list called deferred rejections.
Each add block event this list is processed and any pending rejection which matches the current height of the block being handled by txn mgr is then processed and applied. As we are processing this list each block it was assumed the txn mgr would never run ahead of the height of a rejection. This assumption was implemented here:
However the assumption was not valid. The log extract below shows the current height of txn mgr:
2022-08-10 18:51:58.036 [debug] <0.19661.1698>@blockchain_txn_mgr:handle_add_block_event:{512,13} received block height: 1472629, updated state block height: 1472629
There is then a crash at the above mentioned line 536 :
022-08-10 18:51:58.037 [debug] <0.19661.1698>@blockchain_txn_mgr:terminate:{464,5} terminating with reason {{badmatch,{[{<0.18651.1882>,<<"-576328476948906414">>......
the binary value <<"-576328476948906414">> is the key of the txn and other logs show this same txn in the deferred list with a height less than the current txn mgr height
blockchain.log:2022-08-10 18:51:58.037 [debug] <0.19661.1698>@blockchain_txn_mgr:terminate:{464,5} terminating with reason {{badmatch,{[{<0.18651.1882>,<<"-576328476948906414">>,{blockchain_txn_routing_v1_pb,1,<<0,241,20,68,146,24,117,226,239,116,53,81,58,29,31,27,15,164,158,50,66,149,106,36,56,57,18,236,93,79,25,64,119>>,{update_xor,{update_xor_pb,4,<<193,92,2,137,236,45,10,145,11,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,133,25,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0>>}},25000,23898,<<48,69,2,32,112,218,195,203,152,15,229,105,124,233,132,239,59,133,159,161,144,74,56,118,137,173,233,108,220,118,44,92,4,68,17,0,2,33,0,239,92,69,202,33,20,110,44,175,29,25,218,117,92,120,142,171,70,55,96,47,99,23,46,139,82,254,112,114,180,106,145>>,0},<<0,178,4,179,76,225,237,59,224,193,22,222,165,41,233,222,129,255,59,22,187,208,174,206,6,33,81,200,173,207,74,205,44>>,{blockchain_txn_info_v1_pb,<<"txn_rejected">>,<<"bad_nonce">>,<<131,100,0,9,117,110,100,101,102,105,110,101,100>>,1472628,0,0,3}},
Note the blockchain_txn_info_v1_pb record at the end and its height value of 1472628, which means the deferred txns were processed with a state height of 1472629 whilst there was a deferred txn with a reject height of 1472628. This should never happen.
However looking through the logs the root cause is clear:
2022-08-10 18:51:20.057 [debug] <0.19661.1698>@blockchain_txn_mgr:handle_add_block_event:{512,13} received block height: 1472627, updated state block height: 1472626
2022-08-10 18:51:52.803 [debug] <0.19661.1698>@blockchain_txn_mgr:handle_add_block_event:{512,13} received block height: 1472628, updated state block height: 1472626
2022-08-10 18:51:58.036 [debug] <0.19661.1698>@blockchain_txn_mgr:handle_add_block_event:{512,13} received block height: 1472629, updated state block height: 1472629
Here we see some add block events prior to the crash.
In the log extracts, the received block height is the height of the current block being processed ( it may or may not be a sync block )
The updated state block height: is that of the txn mgr state height. This state height is buggy and returning inconsistent values.
It is this state height which was being used by process_deferred_rejections and in the above events it was never called with height 1472627 and 1472628. This resulted in rejections with those heights being left in the deferred list which then broke the assumption on line 536 resulting in the crash.
The fix here is to always call process_deferred_rejections with the block height and not the State height and then to be overly cautious remove the assumption on line 536 and allow rejections with a height less than that of the txn mgr height. Any such rejections encountered will be acted up in the same way we act on the Current list
Fixes a bug when dealing with deferred txns which resulted in the txn mgr crashing out and dropping txns.
When txn mgr receives a rejection from a CG member for a submitted txn, it checks the height of the CG member against its own height. If the txn mgr is behind the CG member then the rejection is added to a pending list called deferred rejections.
Each add block event this list is processed and any pending rejection which matches the current height of the block being handled by txn mgr is then processed and applied. As we are processing this list each block it was assumed the txn mgr would never run ahead of the height of a rejection. This assumption was implemented here:
https://github.com/helium/blockchain-core/blob/46f514bc9abaad91507d502e1a34d8f66ada9588/src/transactions/blockchain_txn_mgr.erl#L536
However the assumption was not valid. The log extract below shows the current height of txn mgr:
2022-08-10 18:51:58.036 [debug] <0.19661.1698>@blockchain_txn_mgr:handle_add_block_event:{512,13} received block height: 1472629, updated state block height: 1472629
There is then a crash at the above mentioned line 536 :
022-08-10 18:51:58.037 [debug] <0.19661.1698>@blockchain_txn_mgr:terminate:{464,5} terminating with reason {{badmatch,{[{<0.18651.1882>,<<"-576328476948906414">>......
the binary value
<<"-576328476948906414">>
is the key of the txn and other logs show this same txn in the deferred list with a height less than the current txn mgr heightblockchain.log:2022-08-10 18:51:58.037 [debug] <0.19661.1698>@blockchain_txn_mgr:terminate:{464,5} terminating with reason {{badmatch,{[{<0.18651.1882>,<<"-576328476948906414">>,{blockchain_txn_routing_v1_pb,1,<<0,241,20,68,146,24,117,226,239,116,53,81,58,29,31,27,15,164,158,50,66,149,106,36,56,57,18,236,93,79,25,64,119>>,{update_xor,{update_xor_pb,4,<<193,92,2,137,236,45,10,145,11,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,133,25,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0>>}},25000,23898,<<48,69,2,32,112,218,195,203,152,15,229,105,124,233,132,239,59,133,159,161,144,74,56,118,137,173,233,108,220,118,44,92,4,68,17,0,2,33,0,239,92,69,202,33,20,110,44,175,29,25,218,117,92,120,142,171,70,55,96,47,99,23,46,139,82,254,112,114,180,106,145>>,0},<<0,178,4,179,76,225,237,59,224,193,22,222,165,41,233,222,129,255,59,22,187,208,174,206,6,33,81,200,173,207,74,205,44>>,{blockchain_txn_info_v1_pb,<<"txn_rejected">>,<<"bad_nonce">>,<<131,100,0,9,117,110,100,101,102,105,110,101,100>>,1472628,0,0,3}},
Note the blockchain_txn_info_v1_pb record at the end and its height value of
1472628
, which means the deferred txns were processed with a state height of1472629
whilst there was a deferred txn with a reject height of1472628
. This should never happen.However looking through the logs the root cause is clear:
2022-08-10 18:51:20.057 [debug] <0.19661.1698>@blockchain_txn_mgr:handle_add_block_event:{512,13} received block height: 1472627, updated state block height: 1472626 2022-08-10 18:51:52.803 [debug] <0.19661.1698>@blockchain_txn_mgr:handle_add_block_event:{512,13} received block height: 1472628, updated state block height: 1472626 2022-08-10 18:51:58.036 [debug] <0.19661.1698>@blockchain_txn_mgr:handle_add_block_event:{512,13} received block height: 1472629, updated state block height: 1472629
Here we see some add block events prior to the crash.
In the log extracts, the
received block height
is the height of the current block being processed ( it may or may not be a sync block )The
updated state block height:
is that of the txn mgr state height. This state height is buggy and returning inconsistent values.It is this state height which was being used by
process_deferred_rejections
and in the above events it was never called with height 1472627 and 1472628. This resulted in rejections with those heights being left in the deferred list which then broke the assumption on line 536 resulting in the crash.The fix here is to always call
process_deferred_rejections
with the block height and not the State height and then to be overly cautious remove the assumption on line 536 and allow rejections with a height less than that of the txn mgr height. Any such rejections encountered will be acted up in the same way we act on the Current list