skalenetwork / skale-admin

SKALE admin docker container orchestrates all other SKALE Docker containers
https://skale.network
GNU Affero General Public License v3.0

Skale sync admin couldn't re-create skaled on indexer node #1129

Closed oleksandrSydorenkoJ closed 1 day ago

oleksandrSydorenkoJ commented 6 days ago

Describe the bug

Sync admin should recreate the skaled container if it does not exist (for example, after it has been removed), but currently it does not.

Preconditions

  - Active schain, medium type
  - Connected indexer node (100 GB)

Versions:

To Reproduce

  1. Restart Sync admin on indexer node
  2. Stop and remove the skaled container on the indexer node
  3. Check logs in sync-admin0

Expected behavior

Sync admin should recreate the removed skaled container
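
The expected behavior can be sketched as a small decision step: check whether the container exists before touching RPC, and recreate it if it is missing. This is only an illustration under stated assumptions, not the actual skale-admin monitor code; `Action`, `next_action`, `run_monitor_step` and the three callables are hypothetical names introduced for this example.

```python
from enum import Enum
from typing import Callable


class Action(Enum):
    """Hypothetical outcomes of one monitor iteration."""
    CREATE_CONTAINER = "create_container"
    WAIT_FOR_RPC = "wait_for_rpc"
    NONE = "none"


def next_action(container_exists: bool, rpc_responds: bool) -> Action:
    """Decide what to do from the observed state (pure function)."""
    if not container_exists:
        # A removed container can never answer RPC, so recreate it first.
        return Action.CREATE_CONTAINER
    if not rpc_responds:
        # Container is there but skaled is not answering yet.
        return Action.WAIT_FOR_RPC
    return Action.NONE


def run_monitor_step(
    container_exists: Callable[[], bool],
    rpc_responds: Callable[[], bool],
    create_container: Callable[[], None],
) -> Action:
    """Run one iteration; the callables are assumptions standing in for
    whatever Docker and RPC checks the real monitor performs."""
    exists = container_exists()
    # Only query RPC when a container exists, to avoid the
    # "Connection refused" retries shown in the logs of this issue.
    alive = rpc_responds() if exists else False
    action = next_action(exists, alive)
    if action is Action.CREATE_CONTAINER:
        create_container()
    return action
```

Keeping the decision in a pure function makes the missing-container case easy to cover with a unit test that needs neither Docker nor a running skaled.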

Actual state: Sync admin tries to connect to the removed skaled through RPC and fails to finish the monitor

[2024-11-05 19:17:53,895 INFO][7][massive-long-sirius][T_1] - core.schains.firewall.utils:42 - Creating rule controller for massive-long-sirius
[2024-11-05 19:17:53,897 ERROR][7][massive-long-sirius][T_1] - tools.helper:55 - Post request failed with: HTTPConnectionPool(host='127.0.0.1', port=10195): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f7c79adf2d0>: Failed to establish a new connection: [Errno 111] Connection refused'))
[2024-11-05 19:17:53,898 WARNING][7][massive-long-sirius][T_1] - core.schains.rpc:66 - Empty response from skaled
[2024-11-05 19:17:53,899 ERROR][7][massive-long-sirius][T_1] - tools.helper:55 - Post request failed with: HTTPConnectionPool(host='127.0.0.1', port=10195): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f7c79a060d0>: Failed to establish a new connection: [Errno 111] Connection refused'))
[2024-11-05 19:17:53,899 WARNING][7][massive-long-sirius][T_1] - core.schains.rpc:66 - Empty response from skaled
[2024-11-05 19:17:54,050 INFO][7][massive-long-sirius][T_1] - core.schains.firewall.rule_controller:192 - Rules status: missing rules 0, redundant rules: 0
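
To spot this symptom when checking the logs in step 3, a short script can count refused RPC connections per schain. This is a minimal sketch that assumes the log line layout is exactly the one pasted above (`[timestamp LEVEL][pid][schain][thread] - source - message`); `LOG_LINE` and `count_refused_connections` are hypothetical names, not part of skale-admin.

```python
import re
from collections import Counter
from typing import Iterable

# Layout assumed from the log lines quoted in this issue.
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+?) (?P<level>[A-Z]+)\]"
    r"\[(?P<pid>\d+)\]\[(?P<schain>[^\]]+)\]\[(?P<thread>[^\]]+)\]"
    r" - (?P<source>\S+) - (?P<message>.*)$"
)


def count_refused_connections(lines: Iterable[str]) -> Counter:
    """Count ERROR lines mentioning 'Connection refused', per schain."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # Skip lines that do not follow the assumed layout.
        if match["level"] == "ERROR" and "Connection refused" in match["message"]:
            counts[match["schain"]] += 1
    return counts
```

A non-zero count for an schain whose container is absent from `docker ps -a` matches the situation described in this report.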

~/.skale/node_data/schains/massive-long-sirius# docker ps -a
CONTAINER ID   IMAGE                             COMMAND                  CREATED       STATUS                                 PORTS     NAMES
9ca275abfdc4   nginx:1.20.2                      "/docker-entrypoint.…"   2 hours ago   Up 2 hours                                       skale_nginx
0622cb6b0505   rancher/pause:3.6                 "/pause"                 2 hours ago   Up 2 hours                                       config_base_1
8f3f4cb71630   skalenetwork/admin:2.8.0-beta.2   "python3 sync_node.py"   2 hours ago   Up About a minute (health: starting)             skale_sync_admin

badrogger commented 4 days ago

It looks like the main problem here is that error logs are not displayed in some cases.

oleksandrSydorenkoJ commented 13 hours ago

Verified on Legacy network skalenetwork/admin:2.8.0

skaled_restarts_3.20.0+admin_2.8.0.txt