Testnet doesn't always replay properly

theoreticalbts commented 6 years ago

Create a new one-node private (empty) testnet
(a) steemd produces blocks 1-145
Ctrl+C
(b) steemd produces blocks 125-156 (expected, as first shutdown throws away non-irreversible blocks)
Ctrl+C
rm -f shared_memory.bin
(c) Produce blocks 1-41
In another terminal window, while the above production is running, use curl to check database_api.get_dynamic_global_properties, it returned {"head_block_number":37, "last_irreversible_block_num":16}
Ctrl+C
(d) steemd --replay-blockchain produces blocks 136-147

That this sequence of events can occur reveals several bugs:

The blockchain directory should have a special store for saving reversible blocks on shutdown and re-pushing them on startup. Because it does not, the starting block in run (b) is less than the starting block in run (a).
The fact that run (c) started at block 1 means steemd apparently does not check for the situation where block_log exists but shared_memory.bin does not. One reasonable response is to automatically replay in this situation. Another reasonable response is to exit(1) if the user did not specify --replay-blockchain or --resync-blockchain. A less reasonable, but still "correct", response is to rm block_log and resync. A totally buggy, incorrect response is the current behavior, which is apparently to start overwriting block_log without deleting/truncating it.
The fact that run (d) successfully replayed is, at first glance, not too surprising since presumably the new blocks are the same size. But consider the block times: Blocks 1-16 were irreversible in run (c), so they should have been written into the block log by run (c). But blocks after c's LIB (which cannot be greater than 41, since (c) only produced 41 blocks) can only come from run (b) or possibly run (a). Which means they would have earlier times. Which means run (d) should have failed due to non-increasing timestamps. Which it did not. Which means I can't explain why run (d) didn't fail during replay.

Without looking at the code, my best guess is that it's some race condition with the OS filesystem. So for reference, here is the environment (running on a manually provisioned AWS EC2 instance):

$ cat /etc/issue
Ubuntu 16.04.3 LTS \n \l

$ cat /proc/version
Linux version 4.4.0-1039-aws (buildd@lcy01-02) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) ) #48-Ubuntu SMP Wed Oct 11 15:15:01 UTC 2017

Here's the log:

programs/steemd/steemd --data-dir=$HOME/data/mytest --enable-stale-production --required-participation=0
...
714010ms witness_plugin.cpp:507        block_production_loo ] Generated block #140 with timestamp 2018-07-30T15:11:54 at time 2018-07-30T15:11:54
717010ms witness_plugin.cpp:507        block_production_loo ] Generated block #141 with timestamp 2018-07-30T15:11:57 at time 2018-07-30T15:11:57
720008ms witness_plugin.cpp:507        block_production_loo ] Generated block #142 with timestamp 2018-07-30T15:12:00 at time 2018-07-30T15:12:00
723007ms witness_plugin.cpp:507        block_production_loo ] Generated block #143 with timestamp 2018-07-30T15:12:03 at time 2018-07-30T15:12:03
726007ms witness_plugin.cpp:507        block_production_loo ] Generated block #144 with timestamp 2018-07-30T15:12:06 at time 2018-07-30T15:12:06
729006ms witness_plugin.cpp:507        block_production_loo ] Generated block #145 with timestamp 2018-07-30T15:12:09 at time 2018-07-30T15:12:09
730989ms webserver_plugin.cpp:171      operator()           ] http io service exit
730990ms p2p_plugin.cpp:693            plugin_shutdown      ] Shutting down P2P Plugin
730990ms p2p_plugin.cpp:696            plugin_shutdown      ] P2P Plugin: checking handle_block and handle_transaction activity
730990ms p2p_plugin.cpp:734            plugin_shutdown      ] P2P Plugin: checking handle_block and handle_transaction activity
730991ms p2p_plugin.cpp:738            plugin_shutdown      ] Waiting for p2p_thread quit
730991ms p2p_plugin.cpp:740            plugin_shutdown      ] p2p_thread quit done
730991ms chain_plugin.cpp:532          plugin_shutdown      ] closing chain database
731001ms chain_plugin.cpp:535          plugin_shutdown      ] database closed successfully
exited cleanly
$ programs/steemd/steemd --data-dir=$HOME/data/mytest --enable-stale-production --required-participation=0
...
736988ms witness_plugin.cpp:695        plugin_startup       ] Launching block production for 1 witnesses.
736988ms witness_plugin.cpp:706        plugin_startup       ] witness plugin:  plugin_startup() end
736988ms webserver_plugin.cpp:156      operator()           ] start processing http thread
736988ms webserver_plugin.cpp:166      operator()           ] start listening for http requests
738001ms database.cpp:3009             check_free_memory    ] Free memory is now 23G. Current block number: 125
738002ms witness_plugin.cpp:507        block_production_loo ] Generated block #125 with timestamp 2018-07-30T15:12:18 at time 2018-07-30T15:12:18
741011ms witness_plugin.cpp:507        block_production_loo ] Generated block #126 with timestamp 2018-07-30T15:12:21 at time 2018-07-30T15:12:21
744010ms witness_plugin.cpp:507        block_production_loo ] Generated block #127 with timestamp 2018-07-30T15:12:24 at time 2018-07-30T15:12:24
747009ms witness_plugin.cpp:507        block_production_loo ] Generated block #128 with timestamp 2018-07-30T15:12:27 at time 2018-07-30T15:12:27
750008ms witness_plugin.cpp:507        block_production_loo ] Generated block #129 with timestamp 2018-07-30T15:12:30 at time 2018-07-30T15:12:30
753006ms witness_plugin.cpp:507        block_production_loo ] Generated block #130 with timestamp 2018-07-30T15:12:33 at time 2018-07-30T15:12:33
756005ms witness_plugin.cpp:507        block_production_loo ] Generated block #131 with timestamp 2018-07-30T15:12:36 at time 2018-07-30T15:12:36
759003ms witness_plugin.cpp:507        block_production_loo ] Generated block #132 with timestamp 2018-07-30T15:12:39 at time 2018-07-30T15:12:39
762011ms witness_plugin.cpp:507        block_production_loo ] Generated block #133 with timestamp 2018-07-30T15:12:42 at time 2018-07-30T15:12:42
765010ms witness_plugin.cpp:507        block_production_loo ] Generated block #134 with timestamp 2018-07-30T15:12:45 at time 2018-07-30T15:12:45
768008ms witness_plugin.cpp:507        block_production_loo ] Generated block #135 with timestamp 2018-07-30T15:12:48 at time 2018-07-30T15:12:48
771007ms witness_plugin.cpp:507        block_production_loo ] Generated block #136 with timestamp 2018-07-30T15:12:51 at time 2018-07-30T15:12:51
774006ms witness_plugin.cpp:507        block_production_loo ] Generated block #137 with timestamp 2018-07-30T15:12:54 at time 2018-07-30T15:12:54
777004ms witness_plugin.cpp:507        block_production_loo ] Generated block #138 with timestamp 2018-07-30T15:12:57 at time 2018-07-30T15:12:57
780003ms witness_plugin.cpp:507        block_production_loo ] Generated block #139 with timestamp 2018-07-30T15:13:00 at time 2018-07-30T15:13:00
783011ms witness_plugin.cpp:507        block_production_loo ] Generated block #140 with timestamp 2018-07-30T15:13:03 at time 2018-07-30T15:13:03
786010ms witness_plugin.cpp:507        block_production_loo ] Generated block #141 with timestamp 2018-07-30T15:13:06 at time 2018-07-30T15:13:06
789008ms witness_plugin.cpp:507        block_production_loo ] Generated block #142 with timestamp 2018-07-30T15:13:09 at time 2018-07-30T15:13:09
792006ms witness_plugin.cpp:507        block_production_loo ] Generated block #143 with timestamp 2018-07-30T15:13:12 at time 2018-07-30T15:13:12
795005ms witness_plugin.cpp:507        block_production_loo ] Generated block #144 with timestamp 2018-07-30T15:13:15 at time 2018-07-30T15:13:15
798003ms witness_plugin.cpp:507        block_production_loo ] Generated block #145 with timestamp 2018-07-30T15:13:18 at time 2018-07-30T15:13:18
801002ms witness_plugin.cpp:507        block_production_loo ] Generated block #146 with timestamp 2018-07-30T15:13:21 at time 2018-07-30T15:13:21
804012ms witness_plugin.cpp:507        block_production_loo ] Generated block #147 with timestamp 2018-07-30T15:13:24 at time 2018-07-30T15:13:24
807010ms witness_plugin.cpp:507        block_production_loo ] Generated block #148 with timestamp 2018-07-30T15:13:27 at time 2018-07-30T15:13:27
810008ms witness_plugin.cpp:507        block_production_loo ] Generated block #149 with timestamp 2018-07-30T15:13:30 at time 2018-07-30T15:13:30
813008ms witness_plugin.cpp:507        block_production_loo ] Generated block #150 with timestamp 2018-07-30T15:13:33 at time 2018-07-30T15:13:33
816007ms witness_plugin.cpp:507        block_production_loo ] Generated block #151 with timestamp 2018-07-30T15:13:36 at time 2018-07-30T15:13:36
819006ms witness_plugin.cpp:507        block_production_loo ] Generated block #152 with timestamp 2018-07-30T15:13:39 at time 2018-07-30T15:13:39
822006ms witness_plugin.cpp:507        block_production_loo ] Generated block #153 with timestamp 2018-07-30T15:13:42 at time 2018-07-30T15:13:42
825004ms witness_plugin.cpp:507        block_production_loo ] Generated block #154 with timestamp 2018-07-30T15:13:45 at time 2018-07-30T15:13:45
828003ms witness_plugin.cpp:507        block_production_loo ] Generated block #155 with timestamp 2018-07-30T15:13:48 at time 2018-07-30T15:13:48
831012ms witness_plugin.cpp:507        block_production_loo ] Generated block #156 with timestamp 2018-07-30T15:13:51 at time 2018-07-30T15:13:51
833673ms webserver_plugin.cpp:171      operator()           ] http io service exit
833673ms p2p_plugin.cpp:693            plugin_shutdown      ] Shutting down P2P Plugin
833673ms p2p_plugin.cpp:696            plugin_shutdown      ] P2P Plugin: checking handle_block and handle_transaction activity
833673ms p2p_plugin.cpp:734            plugin_shutdown      ] P2P Plugin: checking handle_block and handle_transaction activity
833674ms p2p_plugin.cpp:738            plugin_shutdown      ] Waiting for p2p_thread quit
833674ms p2p_plugin.cpp:740            plugin_shutdown      ] p2p_thread quit done
833675ms chain_plugin.cpp:532          plugin_shutdown      ] closing chain database
833685ms chain_plugin.cpp:535          plugin_shutdown      ] database closed successfully
exited cleanly
$ rm -f ~/data/mytest/blockchain/shared_memory.bin
$ programs/steemd/steemd --data-dir=$HOME/data/mytest --enable-stale-production --required-participation=0
902486ms smt_test_plugin.cpp:250       smt_test_plugin      ] Caught exception:
{"code":10,"name":"assert_exception","message":"Assert Exception","stack":[{"context":{"level":"error","file":"asset.cpp","line":309,"method":"validate","hostname":"","timestamp":"2018-07-30T15:15:02"},"format":"(nai_data_digits >= SMT_MIN_NAI) & (nai_data_digits <= SMT_MAX_NAI) & (nai_1bit == 0x10) & (nai_decimal_places <= STEEM_ASSET_MAX_DECIMALS): Cannot determine space for asset ${n}","data":{"n":0}}]}
902502ms witness_plugin.cpp:639        plugin_initialize    ] Initializing witness plugin
902502ms account_by_key_plugin.cpp:249 plugin_initialize    ] Initializing account_by_key plugin
902504ms webserver_plugin.cpp:310      plugin_initialize    ] configured with 32 thread pool size
902506ms webserver_plugin.cpp:319      plugin_initialize    ] configured http to listen on 127.0.0.1:19080
------------------------------------------------------

            STARTING TEST NETWORK

------------------------------------------------------
initminer public key: TST6LLegbAgLAy28EHrffBVuANFWcFgmqRMW13wBmTExqFE9SCkg4
initminer private key: 5JNHfZYKGaomSFvd4NUdQ9qMcEAC43kujbfjueTHpVapX1Kzq2n
blockchain version: 0.21.0
------------------------------------------------------
902507ms main.cpp:121                  main                 ] Backtrace on segfault is enabled.
902507ms chain_plugin.cpp:394          plugin_startup       ] Starting chain with shared_file_size: 25769803776 bytes
902507ms chain_plugin.cpp:503          plugin_startup       ] Opening shared memory from /home/ubuntu/data/mytest/blockchain
903032ms block_log.cpp:142             open                 ] Log is nonempty
903032ms block_log.cpp:151             open                 ] Index is nonempty
903032ms chain_plugin.cpp:526          plugin_startup       ] Started on blockchain with 0 blocks
903035ms p2p_plugin.cpp:672            operator()           ] P2P node listening at 0.0.0.0:1776
903035ms p2p_plugin.cpp:674            plugin_startup       ] P2P Plugin started
903035ms witness_plugin.cpp:690        plugin_startup       ] witness plugin:  plugin_startup() begin
903035ms witness_plugin.cpp:695        plugin_startup       ] Launching block production for 1 witnesses.

********************************
*                              *
*   ------- NEW CHAIN ------   *
*   -   Welcome to Steem!  -   *
*   ------------------------   *
*                              *
********************************

903035ms witness_plugin.cpp:706        plugin_startup       ] witness plugin:  plugin_startup() end
903036ms webserver_plugin.cpp:156      operator()           ] start processing http thread
903036ms webserver_plugin.cpp:166      operator()           ] start listening for http requests
904000ms witness_plugin.cpp:525        block_production_loo ] Not producing block because node didn't wake up within 500ms of the slot time.
905000ms witness_plugin.cpp:525        block_production_loo ] Not producing block because node didn't wake up within 500ms of the slot time.
906002ms database.cpp:3060             _apply_block         ] Processing 19 genesis hardforks
906002ms database.cpp:4602             apply_hardfork       ] HARDFORK 1 at block 0
906002ms database.cpp:4602             apply_hardfork       ] HARDFORK 2 at block 0
906002ms database.cpp:4602             apply_hardfork       ] HARDFORK 3 at block 0
906002ms database.cpp:4602             apply_hardfork       ] HARDFORK 4 at block 0
906002ms database.cpp:4602             apply_hardfork       ] HARDFORK 5 at block 0
906002ms database.cpp:4602             apply_hardfork       ] HARDFORK 6 at block 0
906003ms database.cpp:4602             apply_hardfork       ] HARDFORK 7 at block 0
906003ms database.cpp:4602             apply_hardfork       ] HARDFORK 8 at block 0
906003ms database.cpp:4602             apply_hardfork       ] HARDFORK 9 at block 0
906003ms database.cpp:4602             apply_hardfork       ] HARDFORK 10 at block 0
906003ms database.cpp:4602             apply_hardfork       ] HARDFORK 11 at block 0
906003ms database.cpp:4602             apply_hardfork       ] HARDFORK 12 at block 0
906004ms database.cpp:4602             apply_hardfork       ] HARDFORK 13 at block 0
906004ms database.cpp:4602             apply_hardfork       ] HARDFORK 14 at block 0
906004ms database.cpp:4602             apply_hardfork       ] HARDFORK 15 at block 0
906004ms database.cpp:4602             apply_hardfork       ] HARDFORK 16 at block 0
906004ms database.cpp:4602             apply_hardfork       ] HARDFORK 17 at block 0
906004ms database.cpp:4602             apply_hardfork       ] HARDFORK 18 at block 0
906004ms database.cpp:4602             apply_hardfork       ] HARDFORK 19 at block 0
944041ms database.cpp:3009             check_free_memory    ] Free memory is now 23G. Current block number: 1
944042ms witness_plugin.cpp:507        block_production_loo ] Generated block #1 with timestamp 2018-07-30T15:15:06 at time 2018-07-30T15:15:06
945005ms witness_plugin.cpp:507        block_production_loo ] Generated block #2 with timestamp 2018-07-30T15:15:45 at time 2018-07-30T15:15:45
948005ms witness_plugin.cpp:507        block_production_loo ] Generated block #3 with timestamp 2018-07-30T15:15:48 at time 2018-07-30T15:15:48
951004ms witness_plugin.cpp:507        block_production_loo ] Generated block #4 with timestamp 2018-07-30T15:15:51 at time 2018-07-30T15:15:51
954003ms witness_plugin.cpp:507        block_production_loo ] Generated block #5 with timestamp 2018-07-30T15:15:54 at time 2018-07-30T15:15:54
957012ms witness_plugin.cpp:507        block_production_loo ] Generated block #6 with timestamp 2018-07-30T15:15:57 at time 2018-07-30T15:15:57
960010ms witness_plugin.cpp:507        block_production_loo ] Generated block #7 with timestamp 2018-07-30T15:16:00 at time 2018-07-30T15:16:00
963009ms witness_plugin.cpp:507        block_production_loo ] Generated block #8 with timestamp 2018-07-30T15:16:03 at time 2018-07-30T15:16:03
966007ms witness_plugin.cpp:507        block_production_loo ] Generated block #9 with timestamp 2018-07-30T15:16:06 at time 2018-07-30T15:16:06
969006ms witness_plugin.cpp:507        block_production_loo ] Generated block #10 with timestamp 2018-07-30T15:16:09 at time 2018-07-30T15:16:09
972004ms witness_plugin.cpp:507        block_production_loo ] Generated block #11 with timestamp 2018-07-30T15:16:12 at time 2018-07-30T15:16:12
975003ms witness_plugin.cpp:507        block_production_loo ] Generated block #12 with timestamp 2018-07-30T15:16:15 at time 2018-07-30T15:16:15
978002ms witness_plugin.cpp:507        block_production_loo ] Generated block #13 with timestamp 2018-07-30T15:16:18 at time 2018-07-30T15:16:18
981011ms witness_plugin.cpp:507        block_production_loo ] Generated block #14 with timestamp 2018-07-30T15:16:21 at time 2018-07-30T15:16:21
984011ms witness_plugin.cpp:507        block_production_loo ] Generated block #15 with timestamp 2018-07-30T15:16:24 at time 2018-07-30T15:16:24
987010ms witness_plugin.cpp:507        block_production_loo ] Generated block #16 with timestamp 2018-07-30T15:16:27 at time 2018-07-30T15:16:27
990008ms witness_plugin.cpp:507        block_production_loo ] Generated block #17 with timestamp 2018-07-30T15:16:30 at time 2018-07-30T15:16:30
993008ms witness_plugin.cpp:507        block_production_loo ] Generated block #18 with timestamp 2018-07-30T15:16:33 at time 2018-07-30T15:16:33
996007ms witness_plugin.cpp:507        block_production_loo ] Generated block #19 with timestamp 2018-07-30T15:16:36 at time 2018-07-30T15:16:36
999006ms witness_plugin.cpp:507        block_production_loo ] Generated block #20 with timestamp 2018-07-30T15:16:39 at time 2018-07-30T15:16:39
1002006ms witness_plugin.cpp:507        block_production_loo ] Generated block #21 with timestamp 2018-07-30T15:16:42 at time 2018-07-30T15:16:42
1005004ms witness_plugin.cpp:507        block_production_loo ] Generated block #22 with timestamp 2018-07-30T15:16:45 at time 2018-07-30T15:16:45
1008003ms witness_plugin.cpp:507        block_production_loo ] Generated block #23 with timestamp 2018-07-30T15:16:48 at time 2018-07-30T15:16:48
1011003ms witness_plugin.cpp:507        block_production_loo ] Generated block #24 with timestamp 2018-07-30T15:16:51 at time 2018-07-30T15:16:51
1014002ms witness_plugin.cpp:507        block_production_loo ] Generated block #25 with timestamp 2018-07-30T15:16:54 at time 2018-07-30T15:16:54
1017011ms witness_plugin.cpp:507        block_production_loo ] Generated block #26 with timestamp 2018-07-30T15:16:57 at time 2018-07-30T15:16:57
1020010ms witness_plugin.cpp:507        block_production_loo ] Generated block #27 with timestamp 2018-07-30T15:17:00 at time 2018-07-30T15:17:00
1023010ms witness_plugin.cpp:507        block_production_loo ] Generated block #28 with timestamp 2018-07-30T15:17:03 at time 2018-07-30T15:17:03
1026008ms witness_plugin.cpp:507        block_production_loo ] Generated block #29 with timestamp 2018-07-30T15:17:06 at time 2018-07-30T15:17:06
1029008ms witness_plugin.cpp:507        block_production_loo ] Generated block #30 with timestamp 2018-07-30T15:17:09 at time 2018-07-30T15:17:09
1032007ms witness_plugin.cpp:507        block_production_loo ] Generated block #31 with timestamp 2018-07-30T15:17:12 at time 2018-07-30T15:17:12
1035005ms witness_plugin.cpp:507        block_production_loo ] Generated block #32 with timestamp 2018-07-30T15:17:15 at time 2018-07-30T15:17:15
1038005ms witness_plugin.cpp:507        block_production_loo ] Generated block #33 with timestamp 2018-07-30T15:17:18 at time 2018-07-30T15:17:18
1041003ms witness_plugin.cpp:507        block_production_loo ] Generated block #34 with timestamp 2018-07-30T15:17:21 at time 2018-07-30T15:17:21
1044012ms witness_plugin.cpp:507        block_production_loo ] Generated block #35 with timestamp 2018-07-30T15:17:24 at time 2018-07-30T15:17:24
1047012ms witness_plugin.cpp:507        block_production_loo ] Generated block #36 with timestamp 2018-07-30T15:17:27 at time 2018-07-30T15:17:27
1050011ms witness_plugin.cpp:507        block_production_loo ] Generated block #37 with timestamp 2018-07-30T15:17:30 at time 2018-07-30T15:17:30
1053011ms witness_plugin.cpp:507        block_production_loo ] Generated block #38 with timestamp 2018-07-30T15:17:33 at time 2018-07-30T15:17:33
1056011ms witness_plugin.cpp:507        block_production_loo ] Generated block #39 with timestamp 2018-07-30T15:17:36 at time 2018-07-30T15:17:36
1059010ms witness_plugin.cpp:507        block_production_loo ] Generated block #40 with timestamp 2018-07-30T15:17:39 at time 2018-07-30T15:17:39
1062010ms witness_plugin.cpp:507        block_production_loo ] Generated block #41 with timestamp 2018-07-30T15:17:42 at time 2018-07-30T15:17:42
1063567ms webserver_plugin.cpp:171      operator()           ] http io service exit
1063568ms p2p_plugin.cpp:693            plugin_shutdown      ] Shutting down P2P Plugin
1063568ms p2p_plugin.cpp:696            plugin_shutdown      ] P2P Plugin: checking handle_block and handle_transaction activity
1063568ms p2p_plugin.cpp:734            plugin_shutdown      ] P2P Plugin: checking handle_block and handle_transaction activity
1063569ms p2p_plugin.cpp:738            plugin_shutdown      ] Waiting for p2p_thread quit
1063569ms p2p_plugin.cpp:740            plugin_shutdown      ] p2p_thread quit done
1063569ms chain_plugin.cpp:532          plugin_shutdown      ] closing chain database
1063579ms chain_plugin.cpp:535          plugin_shutdown      ] database closed successfully
exited cleanly
$ programs/steemd/steemd --data-dir=$HOME/data/mytest --enable-stale-production --required-participation=0 --replay-blockchain
1066523ms smt_test_plugin.cpp:250       smt_test_plugin      ] Caught exception:
{"code":10,"name":"assert_exception","message":"Assert Exception","stack":[{"context":{"level":"error","file":"asset.cpp","line":309,"method":"validate","hostname":"","timestamp":"2018-07-30T15:17:46"},"format":"
(nai_data_digits >= SMT_MIN_NAI) & (nai_data_digits <= SMT_MAX_NAI) & (nai_1bit == 0x10) & (nai_decimal_places <= STEEM_ASSET_MAX_DECIMALS): Cannot determine space for asset ${n}","data":{"n":0}}]}
1066539ms witness_plugin.cpp:639        plugin_initialize    ] Initializing witness plugin
1066539ms account_by_key_plugin.cpp:249 plugin_initialize    ] Initializing account_by_key plugin
1066541ms webserver_plugin.cpp:310      plugin_initialize    ] configured with 32 thread pool size
1066543ms webserver_plugin.cpp:319      plugin_initialize    ] configured http to listen on 127.0.0.1:19080
------------------------------------------------------

            STARTING TEST NETWORK

------------------------------------------------------
initminer public key: TST6LLegbAgLAy28EHrffBVuANFWcFgmqRMW13wBmTExqFE9SCkg4
initminer private key: 5JNHfZYKGaomSFvd4NUdQ9qMcEAC43kujbfjueTHpVapX1Kzq2n
blockchain version: 0.21.0
------------------------------------------------------
1066544ms main.cpp:121                  main                 ] Backtrace on segfault is enabled.
1066544ms chain_plugin.cpp:394          plugin_startup       ] Starting chain with shared_file_size: 25769803776 bytes
1066544ms chain_plugin.cpp:474          plugin_startup       ] Replaying blockchain on user request.
1066544ms database.cpp:186              reindex              ] Reindexing Blockchain
1067065ms block_log.cpp:142             open                 ] Log is nonempty
1067065ms block_log.cpp:151             open                 ] Index is nonempty
1067065ms database.cpp:194              reindex              ] Replaying blocks...
1067065ms database.cpp:3060             _apply_block         ] Processing 19 genesis hardforks
1067066ms database.cpp:4602             apply_hardfork       ] HARDFORK 1 at block 0
1067066ms database.cpp:4602             apply_hardfork       ] HARDFORK 2 at block 0
1067066ms database.cpp:4602             apply_hardfork       ] HARDFORK 3 at block 0
1067066ms database.cpp:4602             apply_hardfork       ] HARDFORK 4 at block 0
1067066ms database.cpp:4602             apply_hardfork       ] HARDFORK 5 at block 0
1067066ms database.cpp:4602             apply_hardfork       ] HARDFORK 6 at block 0
1067066ms database.cpp:4602             apply_hardfork       ] HARDFORK 7 at block 0
1067066ms database.cpp:4602             apply_hardfork       ] HARDFORK 8 at block 0
1067066ms database.cpp:4602             apply_hardfork       ] HARDFORK 9 at block 0
1067067ms database.cpp:4602             apply_hardfork       ] HARDFORK 10 at block 0
1067067ms database.cpp:4602             apply_hardfork       ] HARDFORK 11 at block 0
1067067ms database.cpp:4602             apply_hardfork       ] HARDFORK 12 at block 0
1067067ms database.cpp:4602             apply_hardfork       ] HARDFORK 13 at block 0
1067067ms database.cpp:4602             apply_hardfork       ] HARDFORK 14 at block 0
1067067ms database.cpp:4602             apply_hardfork       ] HARDFORK 15 at block 0
1067067ms database.cpp:4602             apply_hardfork       ] HARDFORK 16 at block 0
1067067ms database.cpp:4602             apply_hardfork       ] HARDFORK 17 at block 0
1067067ms database.cpp:4602             apply_hardfork       ] HARDFORK 18 at block 0
1067067ms database.cpp:4602             apply_hardfork       ] HARDFORK 19 at block 0
1106043ms database.cpp:246              reindex              ] Done reindexing, elapsed time: 38.97802599999999984 sec
1106044ms chain_plugin.cpp:526          plugin_startup       ] Started on blockchain with 135 blocks
1106046ms p2p_plugin.cpp:672            operator()           ] P2P node listening at 0.0.0.0:1776
1106046ms p2p_plugin.cpp:674            plugin_startup       ] P2P Plugin started
1106046ms witness_plugin.cpp:690        plugin_startup       ] witness plugin:  plugin_startup() begin
1106046ms witness_plugin.cpp:695        plugin_startup       ] Launching block production for 1 witnesses.
1106046ms witness_plugin.cpp:706        plugin_startup       ] witness plugin:  plugin_startup() end
1106046ms webserver_plugin.cpp:156      operator()           ] start processing http thread
1106046ms webserver_plugin.cpp:166      operator()           ] start listening for http requests
1107002ms database.cpp:3009             check_free_memory    ] Free memory is now 23G. Current block number: 136
1107002ms witness_plugin.cpp:507        block_production_loo ] Generated block #136 with timestamp 2018-07-30T15:18:27 at time 2018-07-30T15:18:27
1110010ms witness_plugin.cpp:507        block_production_loo ] Generated block #137 with timestamp 2018-07-30T15:18:30 at time 2018-07-30T15:18:30
1113009ms witness_plugin.cpp:507        block_production_loo ] Generated block #138 with timestamp 2018-07-30T15:18:33 at time 2018-07-30T15:18:33
1116008ms witness_plugin.cpp:507        block_production_loo ] Generated block #139 with timestamp 2018-07-30T15:18:36 at time 2018-07-30T15:18:36
1119006ms witness_plugin.cpp:507        block_production_loo ] Generated block #140 with timestamp 2018-07-30T15:18:39 at time 2018-07-30T15:18:39
1122005ms witness_plugin.cpp:507        block_production_loo ] Generated block #141 with timestamp 2018-07-30T15:18:42 at time 2018-07-30T15:18:42
1125004ms witness_plugin.cpp:507        block_production_loo ] Generated block #142 with timestamp 2018-07-30T15:18:45 at time 2018-07-30T15:18:45
1128002ms witness_plugin.cpp:507        block_production_loo ] Generated block #143 with timestamp 2018-07-30T15:18:48 at time 2018-07-30T15:18:48
1131011ms witness_plugin.cpp:507        block_production_loo ] Generated block #144 with timestamp 2018-07-30T15:18:51 at time 2018-07-30T15:18:51
1134009ms witness_plugin.cpp:507        block_production_loo ] Generated block #145 with timestamp 2018-07-30T15:18:54 at time 2018-07-30T15:18:54
1137007ms witness_plugin.cpp:507        block_production_loo ] Generated block #146 with timestamp 2018-07-30T15:18:57 at time 2018-07-30T15:18:57
1140006ms witness_plugin.cpp:507        block_production_loo ] Generated block #147 with timestamp 2018-07-30T15:19:00 at time 2018-07-30T15:19:00
1141858ms webserver_plugin.cpp:171      operator()           ] http io service exit
1141859ms p2p_plugin.cpp:693            plugin_shutdown      ] Shutting down P2P Plugin
1141859ms p2p_plugin.cpp:696            plugin_shutdown      ] P2P Plugin: checking handle_block and handle_transaction activity
1141859ms p2p_plugin.cpp:734            plugin_shutdown      ] P2P Plugin: checking handle_block and handle_transaction activity
1141860ms p2p_plugin.cpp:738            plugin_shutdown      ] Waiting for p2p_thread quit
1141860ms p2p_plugin.cpp:740            plugin_shutdown      ] p2p_thread quit done
1141860ms chain_plugin.cpp:532          plugin_shutdown      ] closing chain database
1141869ms chain_plugin.cpp:535          plugin_shutdown      ] database closed successfully
exited cleanly

In another terminal window, result of get_dynamic_global_properties during the second execution which produced from block 1:

{"jsonrpc":"2.0","result":{"id":0,"head_block_number":37,"head_block_id":"0000002552cd3d6864e5528cbb9ca50480a2fc69","time":"2018-07-30T15:17:30","current_witness":"initminer","total_pow":"18446744073709551615","num_pow_witnesses":0,"virtual_supply":{"amount":"250000164825","precision":3,"nai":"@@000000021"},"current_supply":{"amount":"250000164825","precision":3,"nai":"@@000000021"},"confidential_supply":{"amount":"0","precision":3,"nai":"@@000000021"},"current_sbd_supply":{"amount":"0","precision":3,"nai":"@@000000013"},"confidential_sbd_supply":{"amount":"0","precision":3,"nai":"@@000000013"},"total_vesting_fund_steem":{"amount":"100334","precision":3,"nai":"@@000000021"},"total_vesting_shares":{"amount":"4091341","precision":6,"nai":"@@000000037"},"total_reward_fund_steem":{"amount":"0","precision":3,"nai":"@@000000021"},"total_reward_shares2":"0","pending_rewarded_vesting_shares":{"amount":"0","precision":6,"nai":"@@000000037"},"pending_rewarded_vesting_steem":{"amount":"0","precision":3,"nai":"@@000000021"},"sbd_interest_rate":1000,"sbd_print_rate":10000,"maximum_block_size":131072,"current_aslot":27119150,"recent_slots_filled":"140771848093695","participation_count":36,"last_irreversible_block_num":16,"vote_power_reserve_rate":10,"delegation_return_period":3600,"reverse_auction_seconds":1800,"available_account_subsidies":0,"sbd_stop_percent":500,"sbd_start_percent":200,"smt_creation_fee":{"amount":"1000000","precision":3,"nai":"@@000000013"}},"id":1}

mvandeberg commented 6 years ago

The blockchain directory should have a special store for saving reversible blocks on shutdown and re-pushing them on startup. Because it does not, the starting block in run (b) is less than the starting block in run (a)

I do not view this as a bug. This is 100% intentional. The block_db implementation of BitShares 2.0 and early versions of Steemd included reversible blocks. It was considerably more buggy, especially during unclean shutdown. The block log is not perfect, but it is considerably more reliable than previous implementations. I recommend this behavior simply be a quirk of running a test net node and no additional development be done. It is obviously not an issue on main net. The only time in which it could be a problem is on a chain halt. The blockchain makes no guarantees about the inclusion of a particular block or transaction. Losing the head block due to reversibility on a halt is no worse than losing pending transactions.

The fact that run (c) started at block 1 means steemd apparently does not check for the situation where block_log exists but shared_memory.bin does not. One reasonable response is to automatically replay in this situation. Another reasonable response is to exit(1) if the user did not specify --replay-blockchain or --resync-blockchain. A less reasonable, but still "correct", response is to rm block_log and resync. A totally buggy, incorrect response is the current behavior, which is apparently to start overwriting block_log without deleting/truncating it.

I agree this should be changed. Instead of automatically replaying, let's print a message prompting for a replay and exit.

The fact that run (d) successfully replayed is, at first glance, not too surprising since presumably the new blocks are the same size. But consider the block times: Blocks 1-16 were irreversible in run (c), so they should have been written into the block log by run (c). But blocks after c's LIB (which cannot be greater than 41, since (c) only produced 41 blocks) can only come from run (b) or possibly run (a). Which means they would have earlier times. Which means run (d) should have failed due to non-increasing timestamps. Which it did not. Which means I can't explain why run (d) didn't fail during replay.

https://github.com/steemit/steem/blob/85808eff9a9d0c40040c1bda0410a090f254fdd1/libraries/chain/database.cpp#L3555-L3576

if( log_head_num < dpo.last_irreversible_block_num )

The new chain would have appended new blocks past 135 on the new chain. This is problematic because the block log would contain blocks from two different blockchains. However, the replay check for a non-existent shared_memory.bin will prevent this from occurring.

theoreticalbts commented 6 years ago

I do not view this as a bug. This is 100% intentional. The block_db implementation of BitShares 2.0 and early versions of Steemd included reversible blocks. It was considerably more buggy, especially during unclean shutdown. The block log is not perfect, but it is considerably more reliable than previous implementations.

I'm aware of that. It sounds like you're thinking of an implementation like the following:

On exit, no blocks are popped.
Therefore, the undo data structures must be properly persisted to disk.

I agree such an implementation would need a lot of work to get the second step right, as there's currently no code which does that. And I agree that previous implementations are probably pretty buggy if they just write blocks and hope for the best, without bothering to worry about the tedious business of persisting the undo data structures.

Instead, I'm thinking more along the lines of:

On exit, blocks are popped to LIB (as we currently do).
Instead of throwing away the popped blocks (as we currently do), write them out into a disk file (the "secondary block store").
On startup, check if the secondary block store exists.
If it does, push blocks from the secondary block store.
If the secondary block store's blocks aren't strict continuation of the head block, treat it as a normal forking situation.

This is much less error-prone. We're just setting up another source of blocks which follow the normal path. It has the effect of rebuilding the undo structure in memory without actually having to write the code to persist it to disk.

unclean shutdown

The current code doesn't work, at all, with unclean shutdown (i.e. power cut or SIGKILL). In my experience, unclean shutdown always results in corruption which requires truncating the block log to a known-good prefix and replaying.

mvandeberg commented 6 years ago

On exit, blocks are popped to LIB (as we currently do).

This is not what currently happens. We pop back to LIB on startup because requiring a pop to LIB on shutdown is going to be violated on an unclean shutdown. Because undo is write ahead, we are less likely to run into a situation where we cannot recover state based even on an incomplete undo state.

The current code doesn't work, at all, with unclean shutdown (i.e. power cut or SIGKILL). In my experience, unclean shutdown always results in corruption which requires truncating the block log to a known-good prefix and replaying.

This may be your experience, but it is not representative of mine or many of the node operators. To the best of my memory, the only scenario when we are not recoverable is when shared memory runs out of space. I've done many SIGKILLs and have successfully recovered. Perhaps OS X is better at writing files and persisting mmap than Linux?

steemit / steem

Testnet doesn't always replay properly #2676