oxen-io / oxen-core

Oxen core repository, containing oxend and oxen cli wallets
https://oxen.io

Checkpointing consensus failures #742

Open jagerman opened 5 years ago

jagerman commented 5 years ago

If a reorg arrives that is checkpointed while the current chain is relatively long and not checkpointed, some really strange behaviour happens, and nodes fail to switch to the checkpointed chain.

Last night the testnet was mining an alt chain that was separate from the chain that all the service nodes were on. (This was caused by the difficulty bug, which doesn't seem to be solved!) So the situation was:

The SN continually communicated this checkpoint to nodes, but they apparently ignored it because:

Quorum state for height: 86184 was not cached in daemon!

At this point I started mining on the SN chain and restarted a node that was on the non-SN chain. It immediately reorg'ed from the non-SN chain to the SN chain:

###### REORGANIZE on height: 86184 of 86351, checkpoint is found in alternative chain on height 86184

and, because this was a large chain, it had to do a full SN rescan:

Recalculating service nodes list, scanning blockchain from height 3

then the weird stuff happens:

2019-07-13 13:36:07.107 [P2P1]  INFO    global  src/cryptonote_core/blockchain.cpp:1953 ----- BLOCK ADDED AS ALTERNATIVE ON HEIGHT 86184
id: <d1ca09ae53d91ce38032febcaee00b88b121f4d10b8ed09a84239677901835dd>
PoW:    <563b2ac5fa2d2c3416a52aaa47e343456e8081a047c6e55ab6c2a6cddd4b0000>
difficulty: 91108
2019-07-13 13:36:07.118 [P2P1]  INFO    global  src/cryptonote_core/blockchain.cpp:1941     ###### REORGANIZE on height: 86184 of 86184 with cum_difficulty 8149790482
 alternative blockchain size: 2 with cum_difficulty 8149881320
2019-07-13 13:36:07.172 [P2P1]  ERROR   blockchain  src/cryptonote_core/blockchain.cpp:1898 insertion of new alternative block returned as it already exists
2019-07-13 13:36:07.172 [P2P1]  ERROR   blockchain  src/cryptonote_core/blockchain.cpp:1124 Failed to push ex-main chain blocks to alternative chain 

Those errors do not look innocuous. But then it continues to REORG itself back to the non-SN chain, for some reason doing a one-block reorg for each block:

2019-07-13 13:36:07.191 [P2P1]  INFO    global  src/cryptonote_core/blockchain.cpp:1144 REORGANIZE SUCCESS! on height: 86184, new blockchain size: 86186
2019-07-13 13:36:07.202 [P2P1]  INFO    global  src/cryptonote_core/blockchain.cpp:1941 ###### REORGANIZE on height: 86186 of 86185 with cum_difficulty 8149881320
 alternative blockchain size: 1 with cum_difficulty 8149969482
2019-07-13 13:36:07.234 [P2P1]  INFO    global  src/cryptonote_core/blockchain.cpp:1144 REORGANIZE SUCCESS! on height: 86186, new blockchain size: 86187
2019-07-13 13:36:07.246 [P2P1]  INFO    global  src/cryptonote_core/blockchain.cpp:1941 ###### REORGANIZE on height: 86187 of 86186 with cum_difficulty 8149969482
 alternative blockchain size: 1 with cum_difficulty 8150053687
2019-07-13 13:36:07.278 [P2P1]  INFO    global  src/cryptonote_core/blockchain.cpp:1144 REORGANIZE SUCCESS! on height: 86187, new blockchain size: 86188
2019-07-13 13:36:07.290 [P2P1]  INFO    global  src/cryptonote_core/blockchain.cpp:1941 ###### REORGANIZE on height: 86188 of 86187 with cum_difficulty 8150053687
 alternative blockchain size: 1 with cum_difficulty 8150137620
2019-07-13 13:36:07.321 [P2P1]  INFO    global  src/cryptonote_core/blockchain.cpp:1144 REORGANIZE SUCCESS! on height: 86188, new blockchain size: 86189
2019-07-13 13:36:07.333 [P2P1]  INFO    global  src/cryptonote_core/blockchain.cpp:1941 ###### REORGANIZE on height: 86189 of 86188 with cum_difficulty 8150137620
 alternative blockchain size: 1 with cum_difficulty 8150217391

which continues all the way up to:

2019-07-13 13:36:14.737 [P2P1]  INFO    global  src/cryptonote_core/blockchain.cpp:1144 REORGANIZE SUCCESS! on height: 86350, new blockchain size: 86351
2019-07-13 13:36:14.749 [P2P1]  INFO    global  src/cryptonote_core/blockchain.cpp:1941 ###### REORGANIZE on height: 86351 of 86350 with cum_difficulty 8157923766
 alternative blockchain size: 1 with cum_difficulty 8157944872
2019-07-13 13:36:14.783 [P2P1]  INFO    global  src/cryptonote_core/blockchain.cpp:1144 REORGANIZE SUCCESS! on height: 86351, new blockchain size: 86352
2019-07-13 13:36:14.803 [P2P1]  INFO    global  src/cryptonote_core/blockchain.cpp:1144 REORGANIZE SUCCESS! on height: 86184, new blockchain size: 86352
2019-07-13 13:36:14.803 [P2P1]  WARNING checkpoints src/checkpoints/checkpoints.cpp:58  CHECKPOINT FAILED FOR HEIGHT 86184. EXPECTED HASH <7792aafa97f765db3a4f5a275f9fe1f81afb5d04358444c03adcf9a88a4c3bd2>GIVEN HASH: <d1ca09ae53d91ce38032febcaee00b88b121f4d10b8ed09a84239677901835dd>
2019-07-13 13:36:14.803 [P2P1]  ERROR   blockchain  src/cryptonote_core/blockchain.cpp:4367 Local blockchain failed to pass a checkpoint in: update_checkpoint, rolling back!
2019-07-13 13:36:15.652 [P2P1]  INFO    global  src/cryptonote_core/service_node_list.cpp:1928  Service node data loaded successfully, height: 86352
2019-07-13 13:36:15.652 [P2P1]  INFO    global  src/cryptonote_core/service_node_list.cpp:1929  25 nodes and 32 rollback events loaded.
2019-07-13 13:36:15.652 [P2P1]  WARNING service_nodes   src/cryptonote_core/service_node_list.cpp:85    Recalculating service nodes list, scanning blockchain from height 3
...
2019-07-13 13:36:18.695 [P2P1]  WARNING service_nodes   src/cryptonote_core/service_node_list.cpp:118   Done recalculating service nodes list

Then after the recalculation we get some other WARNINGs (which I am pointing out here because they might be related to the difficulty issue but I have not investigated):

2019-07-13 13:36:18.781 [RPC1]  WARNING blockchain.db.lmdb  src/blockchain_db/lmdb/db_lmdb.cpp:80   Attempt to get cumulative difficulty from height 86351 failed -- difficulty not in db
2019-07-13 13:36:18.781 [RPC0]  WARNING blockchain.db.lmdb  src/blockchain_db/lmdb/db_lmdb.cpp:80   Attempt to get cumulative difficulty from height 86351 failed -- difficulty not in db

and then we sync back to the alt chain again:

2019-07-13 13:36:27.251 [P2P4]  INFO    global  src/cryptonote_protocol/cryptonote_protocol_handler.inl:1483    Synced 86282/86352 (99%, 70 left)
2019-07-13 13:36:29.504 [P2P8]  INFO    global  src/cryptonote_protocol/cryptonote_protocol_handler.inl:1483    Synced 86352/86352
2019-07-13 13:36:29.504 [P2P8]  INFO    global  src/cryptonote_protocol/cryptonote_protocol_handler.inl:2140    SYNCHRONIZED OK
2019-07-13 13:41:03.041 [P2P1]  INFO    global  src/cryptonote_core/blockchain.cpp:1933 ###### REORGANIZE on height: 86184 of 86352, checkpoint is found in alternative chain on height 86184
...
2019-07-13 13:41:03.858 [P2P1]  WARNING service_nodes   src/cryptonote_core/service_node_list.cpp:85    Recalculating service nodes list, scanning blockchain from height 3

This same process keeps repeating over and over.

The only thing that stopped this infinite loop was that the cumulative difficulty on the SN chain eventually overtook the cumulative difficulty on the non-SN chain, at which point everything reorged to the checkpointed SN chain.


This all suggests some problems in the checkpointing implementation that need to be addressed:

jagerman commented 5 years ago

One other thing I notice is that a checkpoint goes out for the current top block immediately. This seems too soon:

- BBCBB
   `BBBBCBB
       `BBBBCBB
           `BBBBCBB
               ` ... etc.

If checkpointing is made a consensus component, so that every node ranks (valid) chains lexicographically according to:

  1. number of checkpoints
  2. cumulative difficulty

then this attack would not work: nodes would always prefer a chain with one checkpoint to one with zero, even if the one with zero has more work.
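
To make the proposed ordering concrete, here is a minimal sketch of the two chain-selection rules side by side. It is not oxen-core code: `ChainSummary`, `prefer_by_difficulty` and `prefer_lexicographic` are hypothetical names, and the sketch assumes each candidate chain can be summarised by its checkpoint count and its cumulative difficulty.

```cpp
#include <cstdint>
#include <tuple>

// Hypothetical summary of a candidate chain. The struct and its field names
// are illustrative only and are not taken from oxen-core.
struct ChainSummary {
    std::uint64_t checkpoints;            // number of checkpoints on the chain
    std::uint64_t cumulative_difficulty;  // total proof-of-work on the chain
};

// Difficulty-only rule: switch only if the candidate has strictly more work.
inline bool prefer_by_difficulty(const ChainSummary& candidate,
                                 const ChainSummary& current) {
    return candidate.cumulative_difficulty > current.cumulative_difficulty;
}

// Lexicographic rule from the comment above: compare checkpoint counts first
// and fall back to cumulative difficulty only when the counts are equal.
inline bool prefer_lexicographic(const ChainSummary& candidate,
                                 const ChainSummary& current) {
    return std::tie(candidate.checkpoints, candidate.cumulative_difficulty)
         > std::tie(current.checkpoints, current.cumulative_difficulty);
}
```

With these definitions, a chain with one checkpoint and less work is preferred over a chain with no checkpoints and more work under `prefer_lexicographic`, while `prefer_by_difficulty` picks the opposite. Between two chains with the same number of checkpoints, both rules choose the one with more work.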