skalenetwork / skaled

Running more than 20 production blockchains, SKALED is an Ethereum-compatible, high-performance C++ Proof-of-Stake client with tools and libraries. Uses SKALE consensus as a blockchain consensus core. Includes dynamic Oracle. Implements file storage and retrieval as an EVM extension.
https://skale.network
GNU General Public License v3.0

Skaled exited with 0 Exit Code following forceful instance shutdown #1827

Open oleksandrSydorenkoJ opened 8 months ago

oleksandrSydorenkoJ commented 8 months ago

Describe the bug

A common issue on OVH instances, rare on other cloud providers. Skaled has several exit codes that let it communicate with the admin through the skale<>admin interface. Exit code 0 is reserved for a graceful container stop. If skaled crashes because of an external failure, the exit code should be different: 255.

Version

Skaled 3.17.1

Preconditions

  • 16 active nodes
  • At least 1 schain of Medium type
  • Access to 1 instance on OVH

To Reproduce

  1. Force shut down the OVH instance
  2. Wait for 2 hours
  3. Restart the instance

Expected behavior

The skaled container should exit with exit code 255 or similar following a forceful instance shutdown.

Actual state

Skaled sometimes exited with exit code 0.

oleksandrSydorenkoJ commented 8 months ago

Still reproduces on a Vultr instance:

  • skaled 3.18.0
  • skalenetwork/admin:2.6.0
  • node-cli: 2.3.1

Steps to reproduce

  1. Reboot the node and wait 5-10 minutes: all skaled containers have exited with exit code 0

    • Docker inspect: "StartedAt": "2024-02-26T17:56:58.56449108Z", "FinishedAt": "2024-02-28T12:31:18.24490516Z"
    • Last logs in skaled (no SigTerm or other signals in skaled logs)
      [2024-02-28 12:23:11.900] [26:main] [info] 1585028:RETURNED_CATCHUP_BLOCKS:1:CRT:0
    • Skaled status
      
      -rw-r--r--  1 root root    358 Feb 28 12:23 skaled.status

    $ cat .skale/node_data/schains/breakable-anguished-tania-borealis/skaled.status
    {
      "subsystemRunning": {
        "SnapshotDownloader": false,
        "WaitingForTimestamp": false,
        "Blockchain": false,
        "Rpc": false
      },
      "exitState": {
        "ClearDataDir": false,
        "StartAgain": false,
        "StartFromSnapshot": false,
        "ExitTimeReached": false
      }
    }
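For triage, the status file above can be summarized programmatically. The following is a minimal Python sketch that assumes only the two top-level keys visible in the output (`subsystemRunning` and `exitState`); the function name `summarize_status` is hypothetical and is not part of skaled or skale-admin.

```python
import json


def summarize_status(text):
    """Return (running_subsystems, raised_exit_flags) from skaled.status text.

    Only the two top-level keys shown in the issue are read; any other
    keys in the file are ignored.
    """
    status = json.loads(text)
    running = [name for name, on in status.get("subsystemRunning", {}).items() if on]
    exit_flags = [name for name, on in status.get("exitState", {}).items() if on]
    return running, exit_flags


# Content copied from the skaled.status file quoted in this comment.
SAMPLE = """
{ "subsystemRunning":{ "SnapshotDownloader": false, "WaitingForTimestamp": false,
  "Blockchain": false, "Rpc": false },
  "exitState":{ "ClearDataDir": false, "StartAgain": false,
  "StartFromSnapshot": false, "ExitTimeReached": false } }
"""

running, exit_flags = summarize_status(SAMPLE)
print("running subsystems:", running)
print("raised exit flags:", exit_flags)
```

For the file shown here both lists come back empty, which matches the observation that skaled stopped without recording any exit reason.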

docker ps -a

da6c39bc8287 skalenetwork/ima:2.1.0 "bash /ima/runner/ru…" 42 hours ago Up 2 minutes skale_ima_worrisome-fortunate-talitha
3e510d8c1447 skalenetwork/schain:3.18.0 "/skaled/skaled --co…" 42 hours ago Exited (0) 2 minutes ago skale_schain_worrisome-fortunate-talitha
49e027ddeecd skalenetwork/ima:2.1.0 "bash /ima/runner/ru…" 42 hours ago Up 2 minutes skale_ima_big-majestic-oval-SKALE
adc549f7dcfd skalenetwork/schain:3.18.0 "/skaled/skaled --co…" 42 hours ago Exited (0) 2 minutes ago skale_schain_big-majestic-oval-SKALE
a512e2c7a5f1 skalenetwork/ima:2.1.0 "bash /ima/runner/ru…" 42 hours ago Up 2 minutes skale_ima_international-villainous-zaurak
a7553e1cfec3 skalenetwork/schain:3.18.0 "/skaled/skaled --co…" 42 hours ago Exited (0) 2 minutes ago skale_schain_international-villainous-zaurak
e9e0c3b8a653 skalenetwork/ima:2.1.0 "bash /ima/runner/ru…" 42 hours ago Up 2 minutes skale_ima_ill-informed-friendly-haedi
327a98ddd9c5 skalenetwork/schain:3.18.0 "/skaled/skaled --co…" 42 hours ago Exited (0) 2 minutes ago skale_schain_ill-informed-friendly-haedi
3b69ca8a4db4 skalenetwork/ima:2.1.0 "bash /ima/runner/ru…" 42 hours ago Up 2 minutes skale_ima_rural-colossal-cebalrai
2dc92e177643 skalenetwork/schain:3.18.0 "/skaled/skaled --co…" 42 hours ago Exited (0) 2 minutes ago skale_schain_rural-colossal-cebalrai
7f085451c487 skalenetwork/ima:2.1.0 "bash /ima/runner/ru…" 42 hours ago Up 2 minutes skale_ima_hungry-formal-ascella
52e7a6138422 skalenetwork/schain:3.18.0 "/skaled/skaled --co…" 42 hours ago Exited (0) 2 minutes ago skale_schain_hungry-formal-ascella
597ed6efafe5 skalenetwork/ima:2.1.0 "bash /ima/runner/ru…" 43 hours ago Up 2 minutes skale_ima_breakable-anguished-tania-borealis
ba5da36c43f9 skalenetwork/schain:3.18.0 "/skaled/skaled --co…" 43 hours ago Exited (0) 2 minutes ago skale_schain_breakable-anguished-tania-borealis
0ff037dd3821 skalenetwork/ima:2.1.0 "bash /ima/runner/ru…" 43 hours ago Up 2 minutes skale_ima_skale-innocent-nasty
4ae28dadcad7 skalenetwork/schain:3.18.0 "/skaled/skaled --co…" 43 hours ago Exited (0) 2 minutes ago skale_schain_skale-innocent-nasty
a166e420a6a6 skalenetwork/admin:2.6.0 "celery -A tools.not…" 8 days ago Up 2 minutes celery
5cc0479b6503 google/cadvisor:latest "/usr/bin/cadvisor -…" 8 days ago Up 2 minutes monitor_cadvisor
ba8a163b797c quay.io/prometheus/node-exporter "/bin/node_exporter …" 8 days ago Up 2 minutes monitor_node_exporter
be304f3dba5b nginx:1.20.2 "/docker-entrypoint.…" 8 days ago Up 2 minutes skale_nginx
6e5ec1a6f21f skalenetwork/watchdog:2.2.0-stable.0 "uwsgi --ini uwsgi.i…" 8 days ago Up 2 minutes skale_watchdog
6b51f2f89b05 docker.elastic.co/beats/filebeat:7.3.1 "/usr/local/bin/dock…" 8 days ago Up 2 minutes skale_filebeat
9159e3e1f115 skalenetwork/admin:2.6.0 "python3 admin.py" 8 days ago Up Less than a second (health: starting) skale_admin
5587ccf10a73 skalenetwork/transaction-manager:2.2.0 "python3 -m transact…" 8 days ago Up 2 minutes skale_transaction-manager
471f90453a5a skalenetwork/admin:2.6.0 "gunicorn app:app -c…" 8 days ago Up 2 minutes skale_api
7d58650c5fac redis:6.0.10-alpine "docker-entrypoint.s…" 8 days ago Up 2 minutes skale_redis
850dc68311a6 skalenetwork/bounty-agent:2.2.0-stable.0 "python3 bounty_agen…" 8 days ago Up 2 minutes skale_bounty
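To pick the exited containers out of a listing like the one above, a small Python sketch such as the following could be used. It assumes one container per line, with the status containing `Exited (<code>)` and the container name as the last whitespace-separated field, as in the `docker ps -a` output here; the helper name `exited_containers` is hypothetical.

```python
import re

# Matches the "Exited (N)" fragment that docker ps prints in the status column.
EXITED = re.compile(r"Exited \((\d+)\)")


def exited_containers(ps_output):
    """Map container name -> exit code for every exited container in the text."""
    result = {}
    for line in ps_output.splitlines():
        match = EXITED.search(line)
        if match:
            name = line.split()[-1]  # the name is the last column
            result[name] = int(match.group(1))
    return result


# Two rows copied from the listing in this comment.
SAMPLE = (
    'da6c39bc8287 skalenetwork/ima:2.1.0 "bash /ima/runner/ru…" 42 hours ago '
    "Up 2 minutes skale_ima_worrisome-fortunate-talitha\n"
    '3e510d8c1447 skalenetwork/schain:3.18.0 "/skaled/skaled --co…" 42 hours ago '
    "Exited (0) 2 minutes ago skale_schain_worrisome-fortunate-talitha\n"
)

for name, code in exited_containers(SAMPLE).items():
    print(name, code)
```

Applied to the full listing, this would report each `skale_schain_*` container with exit code 0, while the running `skale_ima_*` and service containers are skipped.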

badrogger commented 4 months ago

An exit with code zero should be treated as one that is OK to restart, because a container going down with zero is a common and expected case. We also need a way to stop skaled that skale-admin will ignore, for testing purposes. Among other options, there are the following:

  1. Make skaled handle a special signal that shuts it down with a specific exit code that admin will ignore. This would allow stopping the container manually by sending that signal.
  2. Enable a special skale-admin mode for node-cli that ignores all exit codes.
  3. Introduce a special node-cli command to restart the container that acts like Docker's unless-stopped mode.
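Option 1 could look roughly like the sketch below. It is written in Python for illustration only (skaled itself is C++), and every concrete value is an assumption: the choice of `SIGUSR1`, the exit code 200, and the names `make_manual_stop_handler` and `should_restart` are hypothetical and do not come from skaled or skale-admin.

```python
import signal
import sys

# Hypothetical choices for illustration; not taken from skaled or skale-admin.
MANUAL_STOP_SIGNAL = signal.SIGUSR1
MANUAL_STOP_EXIT_CODE = 200


def make_manual_stop_handler(exit_fn=sys.exit):
    """Build a signal handler that exits with the reserved 'manual stop' code."""
    def handler(signum, frame):
        exit_fn(MANUAL_STOP_EXIT_CODE)
    return handler


def should_restart(exit_code, ignored_codes=frozenset({MANUAL_STOP_EXIT_CODE})):
    """Admin-side policy: restart on every exit code except the ignored ones.

    Exit code 0 is deliberately restartable, as proposed in this comment.
    """
    return exit_code not in ignored_codes


if __name__ == "__main__":
    # Process side: register the handler for the reserved signal.
    signal.signal(MANUAL_STOP_SIGNAL, make_manual_stop_handler())
    # Admin side: how the policy would classify a few exit codes.
    for code in (0, 255, MANUAL_STOP_EXIT_CODE):
        print(code, "restart" if should_restart(code) else "ignore")
```

With this split, a forced shutdown that surfaces as exit code 0 still leads to a restart, and only a stop requested through the reserved signal is left alone.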