[seastar-sharding] Shard data among cores

ndragazis / tinykv

Simple key value store based on Seastar

Apache License 2.0

0 stars 0 forks source link

[seastar-sharding] Shard data among cores #6

Open ndragazis opened 2 days ago

ndragazis commented 2 days ago

I have completed the transition from synchronous to asynchronous code with Seastar (see #5). Opening this issue to explore how we can transition to a sharded architecture, whereby each core will be exclusively managing a key range.

Sharded HTTP server

The Seastar framework seemlessly supports running a sharded HTTP server via the http_server_control class. This creates a distributed http_server object underneath.

ndragazis commented 2 days ago

Sharded KVStore

Making the KVStore service sharded presents two challenges:

The constructor and destructor are not called from inside a seastar::thread, so future::get() does not block. This means that KVStore::{start, stop} can no longer be called from the ctor/dtor and they must be called separately. Essentially, I have to revert https://github.com/ndragazis/tinykv/commit/8e4bb566a6ce7a5e646637186a059883e3844b61.
The seastar::sharded class that we use to make a service sharded, automatically detects and calls the service's stop() function when we call seastar::sharded::stop(), if it exists. So, we either have to make KVStore::stop() idempotent (should be a no-op when called more than once), or do not call it from the application code explicitly.

ndragazis commented 18 hours ago

Data Sharding

There are two common approaches to partition the data:

Partition by key range
Partition by hash range

Partition by key range is preferred when one wants to support range queries, but it's hard to partition the data uniformly. Partition by hash range provides better data distribution.

I will go with hash range partitioning for the following reasons:

It's simpler.
Supporting range queries is not in my plans.
Querying the sorted list of all keys, which will become part of the API (see #2) can be implemented with both key range and hash range partitioning, though the latter requires more work.

There is an example with hash range partitioning in Seastar's codebase in memcached.cc.

ndragazis commented 16 hours ago

File Naming Strategy

In the current single-sharded design, the file structure is the following:

.tinykv/
├── sstable_1
├── sstable_2
├── wal
└── wal_1

To make the kv store multi-sharded, each shard should have its own WALs and SSTables. The new file structure will look like this:

.tinykv
├── shard_0
│   ├── sstable_1
│   ├── sstable_2
│   ├── sstable_3
│   └── wal
├── shard_1
│   └── wal
├── shard_2
│   ├── sstable_1
│   └── wal
├── shard_3
│   └── wal
├── shard_4
│   └── wal
├── shard_5
│   ├── sstable_1
│   └── wal
├── shard_6
│   └── wal
└── shard_7
    └── wal