
0-stor_v2

zstor is an object encoding storage system. It can be run in a daemon-client setup, or it can perform single actions without an associated daemon, which is mainly useful for uploading or retrieving single items. The daemon is part of the same binary and also runs other useful features, such as a repair queue which periodically verifies the integrity of objects.

Storage and data integrity

Zstor uses 0-dbs to store the data. It does so by splitting the data into chunks and distributing those over N 0-dbs.

C4Component
title Zstor setup

Component(zstor, "Zstor instance")

Deployment_Node(zerodbgroup0,"0-db group", ""){
    System(zerodb1,"0-db 1")
    System(zerodb2,"0-db 2")
}
Deployment_Node(zerodbgroup1,"0-db group", ""){
    System(zerodbx,"0-db ...") 
    System(zerodbn,"0-db N") 
}

Rel(zstor, zerodb1, "")
Rel(zstor, zerodb2, "")
Rel(zstor, zerodbx, "")
Rel(zstor, zerodbn, "")

Zstor uses forward-looking error-correcting codes (FLECC) to ensure data consistency and to protect against data loss.

In practice, this means zstor spreads the data over N 0-dbs, where N is the configured expected_shards.

As long as at least M chunks of data remain intact, M being the configured minimal_shards (and of course smaller than N), zstor can recover the data.
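
A rough illustration of this M-out-of-N scheme, sketched with the reed-solomon-erasure crate. The crate choice and the small parameters (N = 5 expected shards, M = 3 minimal shards) are assumptions for illustration, not necessarily what zstor uses internally:

use reed_solomon_erasure::galois_8::ReedSolomon;

fn main() {
    // M = 3 data shards, N - M = 2 parity shards, N = 5 shards in total.
    let r = ReedSolomon::new(3, 2).unwrap();

    // All shards must have the same length; parity shards start zeroed
    // and are filled in by encode().
    let mut shards: Vec<Vec<u8>> = vec![
        vec![0, 1, 2, 3],
        vec![4, 5, 6, 7],
        vec![8, 9, 10, 11],
        vec![0; 4],
        vec![0; 4],
    ];
    r.encode(&mut shards).unwrap();

    // Simulate losing N - M = 2 of the 5 shards.
    let mut maybe_shards: Vec<Option<Vec<u8>>> =
        shards.into_iter().map(Some).collect();
    maybe_shards[0] = None;
    maybe_shards[4] = None;

    // Any M intact shards are enough to rebuild the missing ones.
    r.reconstruct(&mut maybe_shards).unwrap();
    assert_eq!(maybe_shards[0].as_ref().unwrap(), &vec![0, 1, 2, 3]);
}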

Expected setup

Currently, zstor expects a stable, user-provided system to start from.

Daemon-client usage vs standalone usage

The daemon, or monitor, can be started by invoking zstor with the monitor subcommand. This starts a long-running process and opens a unix socket on the path specified in the config. Regular command invocations of zstor (for example store) will then read the path of the unix socket from the config, connect to it, send the command, and wait for the monitor daemon to return a response after executing the command. This is the recommended setup.

If the socket path is not specified, zstor falls back to its single-command flow, where it executes the command in-process and then exits. Invoking zstor multiple times in quick succession in this mode might cause multiple uploads to be performed at the same time, causing multiple CPU cores to be used for encryption/compression.
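
A sketch of both modes, assuming a config file at zstor_config.toml. The flag names below are assumptions for illustration; check zstor --help for the exact interface:

# Daemon-client mode: start the monitor, which opens the unix socket
# configured in the config file, then let regular invocations talk to it.
zstor --config zstor_config.toml monitor &
zstor --config zstor_config.toml store --file /some/file

# Standalone mode: if the config contains no socket path, the same store
# invocation runs in-process instead and exits when done.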

Current features

Supported commands

Other features

Building

Make sure you have the latest Rust stable installed. Clone the repository:

git clone https://github.com/threefoldtech/0-stor_v2
cd 0-stor_v2

Then build with the standard toolchain through cargo:

cargo build

This will produce the executable in ./target/debug/zstor_v2.

Static binary

On Linux, a fully static binary can be compiled using the x86_64-unknown-linux-musl target, as follows:

cargo build --target x86_64-unknown-linux-musl --release
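
This places the binary in ./target/x86_64-unknown-linux-musl/release/zstor_v2. If the musl target is not installed yet, it can be added through rustup:

rustup target add x86_64-unknown-linux-musl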

Config file

Running zstor requires a config file. An example config and an explanation of the parameters can be found below.

Example config file

minimal_shards = 10
expected_shards = 15
redundant_groups = 1
redundant_nodes = 1
root = "/virtualroot"
socket = "/tmp/zstor.sock"
prometheus_port = 9100
zdb_data_dir_path = "/tmp/0-db/data"
max_zdb_data_dir_size = 25600

[encryption]
algorithm = "AES"
key = "0000000000000000000000000000000000000000000000000000000000000000"

[compression]
algorithm = "snappy"

[meta]
type = "zdb"

[meta.config]
prefix = "someprefix"

[meta.config.encryption]
algorithm = "AES"
key = "0101010101010101010101010101010101010101010101010101010101010101"

[[meta.config.backends]]
address = "[2a02:1802:5e::dead:beef]:9900"
namespace = "test2"
password = "supersecretpass"

[[meta.config.backends]]
address = "[2a02:1802:5e::dead:beef]:9901"
namespace = "test2"
password = "supersecretpass"

[[meta.config.backends]]
address = "[2a02:1802:5e::dead:beef]:9902"
namespace = "test2"
password = "supersecretpass"

[[meta.config.backends]]
address = "[2a02:1802:5e::dead:beef]:9903"
namespace = "test2"
password = "supersecretpass"

[[groups]]
[[groups.backends]]
address = "[fe80::1]:9900"

[[groups.backends]]
address = "[fe80::1]:9900"
namespace = "test"

[[groups]]
[[groups.backends]]
address = "[2a02:1802:5e::dead:babe]:9900"

[[groups.backends]]
address = "[2a02:1802:5e::dead:beef]:9900"
namespace = "test2"
password = "supersecretpass"

Config file explanation

The minimal_shards and expected_shards parameters correspond to M and N as described above: with the example values, data is split into 15 shards, any 10 of which suffice to recover it, so up to 5 shards can be lost. socket is the path of the unix socket the monitor listens on, and prometheus_port is the port on which metrics are exposed. The [[groups]] sections define the 0-db backends used for data storage, while the [meta] section configures the metadata cluster described below.

Metadata

When data is encoded, metadata is generated to later retrieve this data. The metadata is stored in 4 0-dbs, with a given prefix.

For every file, we take the full path of the file on the system, generate a 16-byte blake2b hash of it, and hex encode the hash bytes. This is then appended to the prefix to generate the final key.

The key structure is: /{prefix}/meta/{hashed_path_hex}
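
A minimal sketch of this key derivation in Rust, using the blake2 crate; the crate choice is an illustration, not necessarily what zstor uses internally:

use blake2::digest::consts::U16;
use blake2::{Blake2b, Digest};

// 16-byte (128-bit) blake2b output, matching the hash size described above.
type Blake2b128 = Blake2b<U16>;

fn meta_key(prefix: &str, file_path: &str) -> String {
    let mut hasher = Blake2b128::new();
    hasher.update(file_path.as_bytes());
    let hash = hasher.finalize();
    // Hex encode the 16 hash bytes and append them to the prefix.
    let hex: String = hash.iter().map(|b| format!("{:02x}", b)).collect();
    format!("/{}/meta/{}", prefix, hex)
}

fn main() {
    // Hypothetical prefix and path, mirroring the example config above.
    println!("{}", meta_key("someprefix", "/virtualroot/some/file.txt"));
}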

The metadata itself is encrypted, binary encoded, and then dispersed in the metadata 0-dbs.

Metadata cluster requirements

Since the metadata is itself encoded before being stored, the encoding used must be known in order to decode it again, and as we can't store metadata about the metadata itself, this is a static setup. As said, at present there are 4 metadata storage 0-dbs defined. Because the keys are defined by the system, these 0-dbs must run in user mode. At the moment it is not possible to define more metadata stores, as can be done with regular data stores.

The actual metadata is encoded in a 2:2 setup, that is, 2 data shards and 2 parity shards. This allows up to 2 (i.e. half) of the metadata stores to be lost while still retaining access to the data. Any 2 stores can be used to recover the data; there is no specific difference between them.

Because the system is designed to prioritize recoverability over availability, writes will be rejected if the metadata storage is in a degraded state, that is, when not all 4 stores are available and writable.

A metadata store can be replaced by removing the old one from the config and inserting the new one. The repair subsystem will then take care of rebuilding the data, regenerating the shards, and storing the new shards on the new metadata store.
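
For example, replacing the fourth metadata store from the example config above amounts to swapping its entry for one pointing at the new store (the new address below is hypothetical):

[[meta.config.backends]]
address = "[2a02:1802:5e::dead:beef]:9904"
namespace = "test2"
password = "supersecretpass"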