Improve snapshot sending/receiving procedure

skalenetwork / skale-admin

SKALE admin docker container orchestrates all other SKALE Docker containers

https://skale.network

GNU Affero General Public License v3.0


Improve snapshot sending/receiving procedure #467

Closed sync-by-unito[bot] closed 2 years ago

sync-by-unito[bot] commented 3 years ago

The following approach is suggested:

1. For now, leave the Large schain type completely out of scope, because other issues currently prevent us from creating it.
2. Release the first mainnet version without the related feature.
3. For the second mainnet update, implement one of the following solutions (to be decided later):

a. Save data to temporary space (reserved space inside the attached storage) without limiting the number of schains that are downloading snapshots at the same time (requires modifications in both the SKALE node components and skaled).

b. Send and receive snapshots using streams, without saving snapshots to any non-btrfs file or directory (requires only skaled changes).
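To illustrate the idea behind option (b), here is a minimal Python sketch of forwarding a snapshot in chunks from a source stream to a sink stream, so that no intermediate file is written. It is not skaled code: the function name `stream_snapshot`, the chunk size, and the use of a SHA-256 digest for integrity checking are all assumptions made for illustration. In a real setup the sink might be the standard input of a `btrfs receive` process, but that wiring is not shown here.

```python
import hashlib
import io
from typing import BinaryIO, Tuple


def stream_snapshot(source: BinaryIO, sink: BinaryIO,
                    chunk_size: int = 1 << 20) -> Tuple[int, str]:
    """Copy `source` to `sink` chunk by chunk (hypothetical helper).

    Only one chunk is held in memory at a time and nothing is written
    to a temporary file. Returns the number of bytes copied and the
    SHA-256 hex digest of the data, which the receiver could compare
    against a digest published by the sender.
    """
    digest = hashlib.sha256()
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        sink.write(chunk)
        digest.update(chunk)
        total += len(chunk)
    return total, digest.hexdigest()


if __name__ == "__main__":
    # In-memory streams stand in for a network socket and a receiver pipe.
    payload = b"snapshot-bytes" * 1000
    received = io.BytesIO()
    size, checksum = stream_snapshot(io.BytesIO(payload), received, chunk_size=4096)
    print(size, checksum == hashlib.sha256(payload).hexdigest())
```

The trade-off compared with option (a) is that no scratch space has to be reserved, but a failed transfer cannot be resumed from a partially saved file.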

sync-by-unito[bot] commented 3 years ago

➤ Ganna Kulikova commented:

To test out current hardware requirements

sync-by-unito[bot] commented 3 years ago

➤ Ganna Kulikova commented:

Ivan Popovych, please document the options and proposals on Confluence

sync-by-unito[bot] commented 3 years ago

➤ Ganna Kulikova commented:

Go forward with the scratch space solution and allocate disk storage to it

sync-by-unito[bot] commented 3 years ago

➤ Automation for Jira commented:

Corresponding Pull Request https://github.com/skalenetwork/skaled/pull/644

sync-by-unito[bot] commented 3 years ago

➤ Alex Danko commented:

Waiting for a skale_admin build with shared space in order to test this feature. Dmitry Tkachuk, please ping when it is deployed.

sync-by-unito[bot] commented 3 years ago

➤ Ganna Kulikova commented:

Unblocking

sync-by-unito[bot] commented 3 years ago

➤ Automation for Jira commented:

Corresponding Pull Request https://github.com/skalenetwork/skaled/pull/672

sync-by-unito[bot] commented 3 years ago

➤ Dima Litvinov commented:

Added 4 logs after implementation: 1 node tries 4 times to download from the other 3 nodes (seems to work correctly)

sync-by-unito[bot] commented 3 years ago

➤ Oleksandr Sydorenko commented:

Still relevant for schain:3.7.4-develop.0

Steps to reproduce:

create 4 schains on 4 nodes
turn off node A
run filestorage-tests for 3 snapshot creations
turn on node A
3 of the 4 skaled instances should start downloading snapshots; the last skaled should wait until some node is able to upload the snapshot for the 4th schain

Actual result: “CRITICAL FATAL: tried to download snapshot from everywhere!” appears when a node tries to download a snapshot

[^snapshot_occeupied_melodic-yildun.log] [^snapshot_occupied_tinkling-zibal.log]
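The expected behaviour in the steps above (wait until some node is able to upload, instead of failing once every peer has been tried) can be sketched in Python as follows. This is not how skaled is implemented: `PeerBusy`, `download_with_retry`, the number of rounds, and the delay are hypothetical names and values chosen only to show the difference between "all peers are busy right now" and a fatal error.

```python
import time
from typing import Callable, Iterable, Optional


class PeerBusy(Exception):
    """Hypothetical signal: the peer cannot serve a snapshot right now."""


def download_with_retry(peers: Iterable[str],
                        fetch: Callable[[str], bytes],
                        rounds: int = 3,
                        delay_s: float = 1.0,
                        sleep: Callable[[float], None] = time.sleep
                        ) -> Optional[bytes]:
    """Try every peer in turn; if all are busy, wait and start over.

    Returns the snapshot bytes from the first peer that succeeds, or
    None if every peer was busy in each of the `rounds` passes.
    Exceptions other than PeerBusy are not caught here.
    """
    peer_list = list(peers)
    for attempt in range(rounds):
        for peer in peer_list:
            try:
                return fetch(peer)
            except PeerBusy:
                continue  # this peer is occupied, try the next one
        if attempt < rounds - 1:
            sleep(delay_s)  # every peer was busy, wait before the next pass
    return None


if __name__ == "__main__":
    calls = []

    def fake_fetch(peer: str) -> bytes:
        # Simulated peers: busy for the first three requests, then free.
        calls.append(peer)
        if len(calls) <= 3:
            raise PeerBusy(peer)
        return b"snapshot"

    result = download_with_retry(["B", "C", "D"], fake_fetch, delay_s=0.0)
    print(result, calls)
```

Only after the final pass returns `None` would the caller treat the situation as fatal.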

sync-by-unito[bot] commented 3 years ago

➤ Ganna Kulikova commented:

Closing per discussion with Stan Kladko and Dima Litvinov

Issues are covered by other Jira tickets (see related)