omgnetwork / elixir-omg

OMG-Network repository of Watcher and Watcher Info
https://omg.network
Apache License 2.0
211 stars 59 forks source link

Child Chain contract ahead of db occurance in development #996

Closed InoMurko closed 5 years ago

InoMurko commented 5 years ago

Development CH CH has been crashing

[error] module=OMG.ChildChain.BlockQueue.Server function=log_eth_node_error/0 ⋅Ethereum operation failed, additional diagnostics: %{"admin_nodeInfo" => {:error, %{"code" => -32601, "message" => "Method not found"}}, "parity_enode" => {:ok, "enode://f76efefa833c56a34f22c8b202467c68067bf5e5f57bf2393c4692166e70e5f313c8fd3e4975d39a1383c26f50df2377e9ac3c544c388c720a2519e7bdd4016a@10.20.3.64:30303"}, "personal_listWallets" => {:error, %{"code" => -32601, "message" => "Method not found"}}}⋅
2019-09-23 08:59:05.651 [error] module=gen_server function=error_info/7 ⋅GenServer OMG.ChildChain.BlockQueue.Server terminating
** (MatchError) no match of right hand side value: {:error, :contract_ahead_of_db}
    (omg_child_chain) lib/omg_child_chain/block_queue.ex:80: OMG.ChildChain.BlockQueue.Server.handle_continue/2
    (stdlib) gen_server.erl:637: :gen_server.try_dispatch/4
    (stdlib) gen_server.erl:388: :gen_server.loop/7
    (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
Last message: {:continue, :setup}⋅
2019-09-23 08:59:05.651 [info] module=OMG.ChildChain.BlockQueue.Server function=handle_continue/2 ⋅Starting Elixir.OMG.ChildChain.BlockQueue.Server service.⋅
2019-09-23 08:59:05.655 [info] module=OMG.ChildChain.BlockQueue.Server function=handle_continue/2 ⋅Starting BlockQueue at parent_height: 5139351, parent_start: 5109103, mined_child_block: 11000, stored_child_top_block: 0⋅
2019-09-23 08:59:05.656 [info] module=OMG.ChildChain.BlockQueue.Server function=handle_continue/2 ⋅Starting BlockQueue, top_mined_hash: "0x525c309712d906779f0ca26eb75bb95b544cfb548b233e7ae1e431428ace9872"⋅
2019-09-23 08:59:05.656 [error] module=OMG.ChildChain.BlockQueue.Server function=log_init_error/1 ⋅The child chain might have not been wiped clean when starting a child chain from scratch: %{config: [contract_addr: "0xd638eadf352fc942e2e259bf89e96b4fde56db60", authority_addr: "0x3e2005d1e1dce59f8c3822b792cf5af475e4150c", txhash_contract: "0x802c0d7d8c4ae142f15a6a62dfcf6004fbba677ae3c17eb0fea2a8a50cae37a3"], known_hashes: [], mined_num: 11000, parent_height: 5139351, stored_child_top_num: 0}. Check README.MD and follow the setting up child chain.⋅

https://app.datadoghq.com/logs?cols=core_host%2Ccore_service&event&from_ts=1569022996730&index=main&live=false&messageDisplay=inline&query=pod_name%3Achildchain-0+service%3Achild_chain+host%3A%2A%2Aci%2A%2A&stream_sort=desc&to_ts=1569022996740

Last deploy was https://app.datadoghq.com/event/event?id=5113673714418771116

InoMurko commented 5 years ago

At the moment we restarted development and we're observing possible occurrence.

pdobacz commented 5 years ago

https://app.datadoghq.com/logs?cols=&event&from_ts=1568930400000&index=main&live=false&messageDisplay=inline&query=pod_name%3Achildchain-0+service%3Achild_chain+host%3A%2A%2Aci%2A%2A&stream_sort=desc&to_ts=1569022996824

This log link covers the time when it broke. On the 20th it submitted it's last block 11000. On 21st it got a SIGTERM (redeploy) and the data seems to be gone:


Sep 21 01:43:16.741 | 2019-09-20 23:43:16.741 [error] module=gen_server function=error_info/7 ⋅GenServer OMG.ChildChain.BlockQueue.Server terminating
-- | --
  | Sep 21 01:43:16.741 | 2019-09-20 23:43:16.740 [error] module=OMG.ChildChain.BlockQueue.Server function=log_eth_node_error/0 ⋅Ethereum operation failed, additional diagnostics: %{"admin_nodeInfo" => {:error, %{"code" => -32601, "message" => "Method not found"}}, "parity_enode" => {:ok, "enode://f76efefa833c56a34f22c8b202467c68067bf5e5f57bf2393c4692166e70e5f313c8fd3e4975d39a1383c26f50df2377e9ac3c544c388c720a2519e7bdd4016a@10.20.3.64:30303"}, "personal_listWallets" => {:error, %{"code" => -32601, "message" => "Method not found"}}}⋅
  | Sep 21 01:43:16.738 | 2019-09-20 23:43:16.738 [error] module=OMG.ChildChain.BlockQueue.Server function=log_init_error/1 ⋅The child chain might have not been wiped clean when starting a child chain from scratch: %{config: [authority_addr: "0x3e2005d1e1dce59f8c3822b792cf5af475e4150c", contract_addr: "0xd638eadf352fc942e2e259bf89e96b4fde56db60", txhash_contract: "0x802c0d7d8c4ae142f15a6a62dfcf6004fbba677ae3c17eb0fea2a8a50cae37a3"], known_hashes: [], mined_num: 11000, parent_height: 5125608, stored_child_top_num: 0}. Check README.MD and follow the setting up child chain.⋅
  | Sep 21 01:43:16.738 | 2019-09-20 23:43:16.738 [info] module=OMG.ChildChain.BlockQueue.Server function=handle_continue/2 ⋅Starting BlockQueue, top_mined_hash: "0x525c309712d906779f0ca26eb75bb95b544cfb548b233e7ae1e431428ace9872"⋅
  | Sep 21 01:43:16.737 | 2019-09-20 23:43:16.737 [info] module=OMG.ChildChain.BlockQueue.Server function=handle_continue/2 ⋅Starting BlockQueue at parent_height: 5125608, parent_start: 5109103, mined_child_block: 11000, stored_child_top_block: 0⋅
  | Sep 21 01:42:10.488 | [os_mon] cpu supervisor port (cpu_sup): Erlang has closed
  | Sep 21 01:42:10.487 | 2019-09-20 23:42:10.487 [info] module=OMG.Status.Metric.StatsdMonitor function=handle_event/2 ⋅inspect(Elixir.OMG.Status.Metric.StatsdMonitor) got event: {:clear_alarm, :system_memory_high_watermark}. Ignoring.⋅
  | Sep 21 01:42:10.487 | 2019-09-20 23:42:10.487 [info] module=OMG.Eth.EthereumClientMonitor function=handle_event/2 ⋅Eth client monitor got event: {:clear_alarm, :system_memory_high_watermark}. Ignoring.⋅
  | Sep 21 01:42:10.487 | 2019-09-20 23:42:10.487 [info] module=OMG.ChildChain.Monitor function=handle_event/2 ⋅Monitor got event: {:clear_alarm, :system_memory_high_watermark}. Ignoring.⋅
  | Sep 21 01:42:10.487 | 2019-09-20 23:42:10.487 [info] module=OMG.ChildChainRPC.Plugs.Health function=handle_event/2 ⋅Health RPC plug got event: {:clear_alarm, :system_memory_high_watermark}. Ignoring.⋅
  | Sep 21 01:42:10.486 | [os_mon] memory supervisor port (memsup): Erlang has closed
  | Sep 21 01:42:10.428 | ⋅
  | Sep 21 01:42:10.428 | 2019-09-20 23:42:10.424 [info] ⋅SIGTERM received - shutting down
  | Sep 20 03:51:35.455 | 2019-09-20 01:51:35.454 [info] module=OMG.State.Core function=deposit/2 ⋅Recognized deposits [%{amount: 12000000000000000000, blknum: 11001, currency: <<80, 179, 144, 167, 201, 53, 104, 10, 37, 41, 5, 45, 30, 82, 104, 134, 179, 227, 219, 34>>, eth_height: 5120353, log_index: 3, owner: <<69, 34, 251, 68, 194, 171, 53, 158, 118, 236, 199, 92, 34, 201, 64, 150, 144, 241, 34, 65>>, root_chain_txhash: <<212, 14, 139, 242, 167, 46, 75, 151, 175, 43, 108, 125, 178, 3, 160, 119, 129, 126, 44, 13, 230, 154, 145, 192, 183, 36, 248, 10, 16, 124, 160, 118>>}]⋅
  | Sep 20 03:45:14.321 | 2019-09-20 01:45:14.320 [info] module=OMG.ChildChain.BlockQueue.Core function=process_submit_result/3 ⋅Submitted %OMG.ChildChain.BlockQueue.Core.BlockSubmission{gas_price: 8609344200, hash: <<82, 92, 48, 151, 18, 217, 6, 119, 159, 12, 162, 110, 183, 91, 185, 91, 84, 76, 251, 84, 139, 35, 62, 122, 225, 228, 49, 66, 138, 206, 152, 114>>, nonce: 11, num: 11000} at: <<248, 43, 174, 81, 230, 162, 48, 214, 50, 25, 130, 115, 224, 250, 200, 188, 19, 119, 89, 110, 190, 38, 228, 145, 122, 62, 62, 143, 164, 244, 46, 144>>⋅
  | Sep 20 03:45:11.447 | 2019-09-20 01:45:11.444 [info] module=OMG.ChildChain.BlockQueue.Server function=handle_info/2 ⋅Enqueuing block num '11000', hash '"0x525c309712d906779f0ca26eb75bb95b544cfb548b233e7ae1e431428ace9872"'⋅
  | Sep 20 03:23:47.861 | 2019-09-20 01:23:47.858 [info] module=OMG.State.Core function=deposit/2 ⋅Recognized deposits [%{amount: 200000000000000000, blknum: 10004, currency: <<0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>>, eth_height: 5120242, log_index: 1, owner: <<69, 34, 251, 68, 194, 171, 53, 158, 118, 236, 199, 92, 34, 201, 64, 150, 144, 241, 34, 65>>, root_chain_txhash: <<189, 167, 50, 101, 2, 192, 232, 209, 198, 215, 16, 129, 36, 203, 135, 113, 24, 122, 243, 211, 105, 111, 90, 121, 219, 196, 5, 227, 111, 70, 234, 230>>}]⋅
  | Sep 20 03:22:17.834 | 2019-09-20 01:22:17.832 [info] module=OMG.State.Core function=deposit/2 ⋅Recognized deposits [%{amount: 55000000000000000000, blknum: 10003, currency: <<161, 201, 208, 198, 237, 98, 127, 178, 25, 127, 209, 108, 211, 175, 222, 55, 204, 94, 141, 165>>, eth_height: 5120235, log_index: 8, owner: <<69, 34, 251, 68, 194, 171, 53, 158, 118, 236, 199, 92, 34, 201, 64, 150, 144, 241, 34, 65>>, root_chain_txhash: <<178, 165, 154, 11, 136, 129, 44, 45, 182, 141, 135, 140, 215, 92, 124, 56, 107, 122, 16, 104, 96, 214, 227, 211, 42, 201, 143, 5, 17, 37, 60, 162>>}]⋅
  | Sep 20 01:57:29.762 | 2019-09-19 23:57:29.761 [info] module=OMG.State.Core function=deposit/2 ⋅Recognized deposits [%{amount: 500000000000000000, blknum: 10002, currency: <<0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>>, eth_height: 5119897, log_index: 0, owner: <<69, 34, 251, 68, 194, 171, 53, 158, 118, 236, 199, 92, 34, 201, 64, 150, 144, 241, 34, 65>>, root_chain_txhash: <<21, 66, 143, 31, 167, 118, 197, 193, 243, 111, 87, 123, 101, 141, 42, 3, 196, 155, 55, 59, 52, 160, 11, 200, 148, 137, 121, 190, 146, 185, 169, 254>>}]⋅
  | Sep 20 00:56:29.844 | 2019-09-19 22:56:29.843 [info] module=OMG.State.Core function=deposit/2 ⋅Recognized deposits [%{amount: 20000000000000000000, blknum: 10001, currency: <<127, 21, 17, 112, 142, 81, 163, 8, 142, 78, 133, 5, 241, 101, 35, 48, 6, 104, 71, 110>>, eth_height: 5119653, log_index: 4, owner: <<69, 34, 251, 68, 194, 171, 53, 158, 118, 236, 199, 92, 34, 201, 64, 150, 144, 241, 34, 65>>, root_chain_txhash: <<192, 207, 61, 51, 255, 184, 132, 30, 99, 81, 159, 184, 160, 198, 133, 180, 135, 214, 34, 238, 136, 112, 50, 31, 213, 251, 213, 89, 145, 73, 141, 137>>}]⋅
  | Sep 20 00:53:25.828 | 2019-09-19 22:53:25.804 [info] module=OMG.ChildChain.BlockQueue.Core function=process_submit_result/3 ⋅Submitted %OMG.ChildChain.BlockQueue.Core.BlockSubmission{gas_price: 9565938000, hash: <<87, 198, 129, 42, 199, 131, 204, 249, 203, 231, 38, 81, 187, 49, 228, 155, 33, 132, 158, 19, 159, 221, 80, 250, 240, 101, 240, 212, 80, 2, 38, 117>>, nonce: 10, num: 10000} at: <<229, 2, 124, 201, 218, 62, 238, 252, 205, 11, 135, 109, 2, 254, 93, 134, 119, 122, 192, 107, 82, 170, 172, 219, 234, 226, 67, 206, 142, 48, 38, 45>>⋅
  | Sep 20 00:53:23.445 | 2019-09-19 22:53:23.444 [info] module=OMG.ChildChain.BlockQueue.Server function=handle_info/2 ⋅Enqueuing block num '10000', hash '"0x57c6812ac783ccf9cbe72651bb31e49b21849e139fdd50faf065f0d450022675"'⋅
  | Sep 20 00:38:41.668 | 2019-09-19 22:38:41.666 [info] module=OMG.State.Core function=deposit/2 ⋅Recognized deposits [%{amount: 2

mind the stored_child_top_num: 0 part and that the top_mined_hash is matching the last block enqueued.

Looks like the child chain data went missing after the SIGTERM

kasima commented 5 years ago

Context from Slack: https://omisego.slack.com/archives/C9KNA675J/p1569335714273700

InoMurko commented 5 years ago

I believe we can update this thread now with the root cause: DB_PATH was not mounted correctly on the persistent-volume-control. And after we redeployed Child Chain the database was lost.

@kendricktan pls close when you mitigate the issue! Ktnx