ndragazis / tinykv

Simple key value store based on Seastar
Apache License 2.0

[seastar] Use the Seastar framework for async IO #5

Closed ndragazis closed 3 days ago

ndragazis commented 1 week ago

So far I have a very simple first iteration that consists of memtables, WALs and SSTables. All code is synchronous.

Starting this issue to explore the Seastar API and switch to an asynchronous and sharded architecture.

Related links:
https://github.com/scylladb/seastar/blob/master/doc/tutorial.md
http://docs.seastar.io/master/index.html
https://github.com/scylladb/seastar/tree/master/apps

ndragazis commented 1 week ago

It is generally recommended to use coroutines instead of futures/promises/continuations. They provide a synchronous-looking programming model which is easier to grasp. They are also a better choice than Seastar threads, which do not scale due to their memory requirements (see also https://github.com/ndragazis/tinykv/issues/3#issuecomment-2179302761). Another great property is that local variables live in the coroutine frame across co_await, so most capture-by-reference problems go away (see https://github.com/ndragazis/tinykv/issues/5#issuecomment-2186889676). Reference parameters still have to outlive the coroutine, though.

In this project, we need to implement an HTTP server. The httpd app in the Seastar repo starts the app from a seastar::thread. However, it is possible to use futures or coroutines as well (see tls_echo_server_demo.cc and coroutines_demo.cc).

I will try to use coroutines everywhere and see what happens.

ndragazis commented 1 week ago

If I pass a coroutine to app.run(), it seems that I cannot use seastar::defer() inside the coroutine the way the httpd app does. The program terminates with the following error message:

app: /opt/seastar/src/core/future.cc:271: void seastar::internal::future_base::do_wait(): Assertion `thread' failed.
Aborting on shard 0, in scheduling group main.

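It looks like seastar::defer(), as used in httpd, only works inside a seastar::thread. The assertion comes from future_base::do_wait(), i.e. from a blocking future::get() call, so the culprit is presumably the .get() inside the deferred cleanup rather than defer() itself.

An alternative is to use a try..catch block, as instructed in: https://docs.seastar.io/master/tutorial.html#exceptions-in-coroutines

An easy way to test this is to set the listen address to one that is not assigned to the host. The error that we get is the following: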

ERROR 2024-06-23 15:32:48,085 [shard 0:main] seastar - Exiting on unhandled exception: std::system_error (error system:99, posix_listen failed for address 1.2.3.4:9999: Cannot assign requested address)

Note that seastar::future has other interfaces for handling exceptions, see: https://docs.seastar.io/master/tutorial.html#exceptions-vs.-exceptional-futures
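
A minimal sketch of the try..catch approach in a coroutine passed to app.run(). The `serve()` function is a hypothetical stand-in for the real server startup (it just fails after a short sleep), and the logger name "app" is an assumption:

```cpp
#include <seastar/core/app-template.hh>
#include <seastar/core/coroutine.hh>
#include <seastar/core/sleep.hh>
#include <seastar/util/log.hh>
#include <chrono>
#include <exception>
#include <stdexcept>

static seastar::logger lg("app");

// Hypothetical stand-in for starting the HTTP server: it suspends once
// and then fails, so the returned future resolves with an exception.
seastar::future<> serve() {
    co_await seastar::sleep(std::chrono::milliseconds(10));
    throw std::runtime_error("posix_listen failed (simulated)");
}

seastar::future<int> run() {
    int exit_code = 0;
    try {
        // An exceptional future is rethrown at the co_await point.
        co_await serve();
    } catch (const std::exception& e) {
        // co_await is not allowed inside a handler, so only record the error here.
        lg.error("Startup failed: {}", e.what());
        exit_code = 1;
    }
    co_return exit_code;
}

int main(int argc, char** argv) {
    seastar::app_template app;
    return app.run(argc, argv, [] { return run(); });
}
```

With this shape the process exits with a non-zero code instead of the "Exiting on unhandled exception" message.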

ndragazis commented 1 week ago

Logger

Seastar provides its own logging API. In the ScyllaDB codebase, I see that they create a separate logger per source file.

An example of how to change the log level of some logger at startup:

./app --logger-log-level "sstable.cc=debug"

ndragazis commented 1 week ago

Asynchronous Constructor/Destructor

I have a class called KVStore that represents the storage engine. The constructor restores memtables from WALs and discovers all SSTables. Those are IO operations, so they need to run asynchronously. But a constructor cannot return a future or be a coroutine. How do we make this Seastar-friendly?

It looks like RAII does not carry over to asynchronous setup and teardown in Seastar. Constructors and destructors have to be synchronous. For example, the password_authenticator class in Scylla has start() and stop() methods, invoked separately from the ctors/dtors.

Note though that I see some weird seastar::engine().at_exit() calls in the ScyllaDB codebase to register cleanup functions for objects. I am not sure if this is a good practice.


UPDATE: It is actually possible to invoke async code from constructors and destructors. We just have to invoke them from inside a seastar::thread and use .get() to block on the async code (see [1] [2]).

In practice it is not so easy though. Through experimentation, I discovered that we cannot call future::get() from inside continuations or coroutines, even when they are themselves invoked from a thread (again with future::get()). The continuation or coroutine body does not run in the thread's context, so a blocking .get() has no thread to suspend. Some examples that break:

#include <chrono>
#include <memory>
#include <string>
#include <seastar/core/app-template.hh>
#include <seastar/util/log.hh>
#include <seastar/core/thread.hh>
#include <seastar/core/sleep.hh>
#include <seastar/core/coroutine.hh>

static seastar::logger lg(__FILE__);

class Entity {
public:
    std::string name;
    Entity(const std::string& name)
        : name(name) {
        start().get();
    }
    ~Entity() {
        stop().get();
    }
    seastar::future<> start() {
        lg.info("Starting Entity: {}", name);
        return seastar::sleep(std::chrono::seconds(1));
    }
    seastar::future<> stop() {
        lg.info("Stopping Entity: {}", name);
        return seastar::sleep(std::chrono::seconds(1));
    }
};
seastar::future<> coro() {
    lg.info("Running coroutine `coro'...");
    return seastar::sleep(std::chrono::seconds(1));
}
seastar::future<> coro2() {
    lg.info("Running coroutine `coro2'...");
    Entity entity("Nikos");
    co_await seastar::sleep(std::chrono::seconds(1));
}
seastar::future<> coro3(std::unique_ptr<Entity> entity) {
    lg.info("Running coroutine `coro3'...");
    lg.info("Entity name: {}", entity->name);
    co_await seastar::sleep(std::chrono::seconds(1));
    lg.info("Exiting coroutine `coro3'.");
}

int main(int argc, char** argv) {
    seastar::app_template app;
    return app.run(argc, argv, [] {
        return seastar::async([] {
            /*
             * Example 1 (fail)
             * Ctors/Dtors cannot call .get() if they are called from inside a continuation.
             */
            //coro().then([] {
            //    Entity entity("Nikos");
            //}).get();
            /*
             * Example 2 (fail)
             * Ctors/Dtors cannot be called from inside a coroutine.
             */
            //coro2().get();
            /*
             * Example 3 (fail)
             * Release a unique pointer from inside a coroutine.
             * Destructor cannot be called from a coroutine.
             */
            //std::unique_ptr<Entity> entity = std::make_unique<Entity>("Nikos");
            //coro3(std::move(entity)).get();
        });
    });
}

The error message in all cases is the following:

app: /opt/seastar/src/core/future.cc:271: void seastar::internal::future_base::do_wait(): Assertion `thread' failed.

We could make ctors work even in these cases by nesting the allocations inside seastar threads. But we cannot do that for dtors, since we do not control where they are called.

So, I think the best solution is to have distinct async ctors/dtors (e.g., start/stop) that we call explicitly.
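
A minimal sketch of what explicit start/stop could look like when driven from a coroutine. It reuses the `Entity` idea from the examples above; the exception handling around `stop()` is my own assumption about how teardown should behave, not something taken from Scylla:

```cpp
#include <seastar/core/app-template.hh>
#include <seastar/core/coroutine.hh>
#include <seastar/core/sleep.hh>
#include <seastar/util/log.hh>
#include <chrono>
#include <exception>
#include <string>
#include <utility>

static seastar::logger lg("entity");

class Entity {
    std::string _name;
public:
    // The constructor and destructor do no I/O, so they never need .get().
    explicit Entity(std::string name) : _name(std::move(name)) {}

    // Asynchronous "constructor": the sleep stands in for real I/O.
    seastar::future<> start() {
        lg.info("Starting Entity: {}", _name);
        return seastar::sleep(std::chrono::seconds(1));
    }
    // Asynchronous "destructor": must be awaited before the object dies.
    seastar::future<> stop() {
        lg.info("Stopping Entity: {}", _name);
        return seastar::sleep(std::chrono::seconds(1));
    }
};

seastar::future<> run() {
    Entity entity("Nikos");
    co_await entity.start();
    std::exception_ptr ex;
    try {
        // Placeholder for the code that actually uses the entity.
        co_await seastar::sleep(std::chrono::seconds(1));
    } catch (...) {
        ex = std::current_exception();
    }
    // stop() is awaited outside the handler, because co_await is not
    // allowed inside a catch block.
    co_await entity.stop();
    if (ex) {
        std::rethrow_exception(ex);
    }
}

int main(int argc, char** argv) {
    seastar::app_template app;
    return app.run(argc, argv, [] { return run(); });
}
```

The cost of this pattern is that nothing forces the caller to invoke stop(); that discipline replaces what the destructor would normally guarantee.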

[1] https://stackoverflow.com/questions/72135948/is-seastarthread-a-stackful-coroutine
[2] https://stackoverflow.com/questions/72370073/how-is-seastarthread-better-than-a-stackless-coroutine

ndragazis commented 1 week ago

Asynchronous HTTP request handlers

As explained in #2, we need to implement some REST APIs for the key-value store. Currently, I have implemented the APIs with the synchronous part of the Seastar HTTP API: httpd::function_handler, constructed from either an httpd::request_function or an httpd::handle_function. Copying from the header file:

/**
 * A request function is a lambda expression that gets only the request
 * as its parameter
 */
typedef std::function<sstring(const_req req)> request_function;

/**
 * A handle function is a lambda expression that gets request and reply
 */
typedef std::function<sstring(const_req req, http::reply&)> handle_function;

However, the httpd::function_handler supports asynchronous handlers as well:

/**
 * A future_json_function is a function that returns a future json reponse.
 * Similiar to the json_request_function, using the json reponse is done
 * implicitly.
 */
typedef std::function<
        future<json::json_return_type>(std::unique_ptr<http::request> req)> future_json_function;

typedef std::function<
        future<std::unique_ptr<http::reply>>(std::unique_ptr<http::request> req,
                std::unique_ptr<http::reply> rep)> future_handler_function;
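
A rough sketch of an asynchronous handler built on future_handler_function, written as a coroutine. `lookup()` is a hypothetical stand-in for an async KVStore::get(), the "key" query parameter and the "txt" reply type are assumptions, and the exact function_handler constructor may differ between Seastar versions:

```cpp
#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>
#include <seastar/core/sleep.hh>
#include <seastar/core/sstring.hh>
#include <seastar/http/function_handlers.hh>
#include <seastar/http/reply.hh>
#include <seastar/http/request.hh>
#include <chrono>
#include <memory>
#include <utility>

namespace http = seastar::http;
namespace httpd = seastar::httpd;

// Hypothetical async lookup: suspends once, then echoes the key back.
seastar::future<seastar::sstring> lookup(seastar::sstring key) {
    co_await seastar::sleep(std::chrono::milliseconds(1));
    co_return key;
}

// Matches the future_handler_function signature. Both unique_ptrs are taken
// by value, so they live in the coroutine frame across the co_await.
seastar::future<std::unique_ptr<http::reply>> handle_get(
        std::unique_ptr<http::request> req, std::unique_ptr<http::reply> rep) {
    seastar::sstring key = req->get_query_param("key");
    rep->_content = co_await lookup(std::move(key));
    co_return std::move(rep);
}

std::unique_ptr<httpd::function_handler> make_get_handler() {
    // Going through the typedef keeps the constructor overload unambiguous.
    httpd::future_handler_function fn = handle_get;
    return std::make_unique<httpd::function_handler>(fn, "txt");
}
```

The returned handler would be registered on the routes the same way as the synchronous ones.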

ndragazis commented 6 days ago

File IO

Seastar has its own File IO API: https://docs.seastar.io/master/group__fileio-module.html

There is a low-level API in <seastar/core/file.hh>, which exposes details such as alignment requirements. There is another API with stream semantics in <seastar/core/{iostream,fstream}.hh>. Note that streams are buffered. On top of these, there are some utilities in <seastar/util/{file,short_streams,read_first_line}.hh>. Some examples can be found in file_demo.cc, file_io_test.cc, fstream_test.cc.

Some of these APIs:

Unlike the C++ standard library, Seastar does not support opening a file in append mode. Append mode would be useful for the write-ahead log.

ndragazis commented 6 days ago

Capturing state / Passing ownership in Continuations

https://docs.seastar.io/master/tutorial.html#capturing-state-in-continuations
https://docs.seastar.io/master/tutorial.html#passing-ownership-to-continuation

Copying from the docs: "Whenever an asynchronous function takes a parameter by reference, the caller must ensure that the referred object lives until the future returned by the function is resolved."

I think I bumped into this while making SSTable::get() asynchronous. In the function handler for GET requests, I was obtaining the key from the URL, storing it in a local variable, and passing it down to KVStore::get() and SSTable::get() by reference. After adding some log messages in SSTable::get(), I observed that the key had a value while searching the first SSTable, but was empty while searching the rest of the SSTables. The local variable had presumably been destroyed once the handler returned its future. The problem disappeared when I wrapped the KVStore::get() invocation with seastar::do_with().
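
A small sketch of the do_with() fix described above. `lookup()` is a hypothetical stand-in for KVStore::get() that takes the key by reference and only reads it after a suspension point:

```cpp
#include <seastar/core/do_with.hh>
#include <seastar/core/future.hh>
#include <seastar/core/sleep.hh>
#include <seastar/core/sstring.hh>
#include <chrono>
#include <utility>

// Hypothetical stand-in for KVStore::get(): the key is read in a
// continuation, i.e. after the caller's stack frame may be gone.
seastar::future<seastar::sstring> lookup(const seastar::sstring& key) {
    return seastar::sleep(std::chrono::milliseconds(1)).then([&key] {
        return seastar::sstring(key);  // key must still be alive here
    });
}

// Broken variant (for comparison): `key` is a local that dies as soon as
// the function returns the unresolved future.
//
//   seastar::future<seastar::sstring> handle_get(seastar::sstring url_key) {
//       seastar::sstring key = std::move(url_key);
//       return lookup(key);
//   }

// Fixed variant: do_with() moves the key to the heap and keeps it alive
// until the future returned by the lambda resolves.
seastar::future<seastar::sstring> handle_get(seastar::sstring url_key) {
    return seastar::do_with(std::move(url_key), [] (seastar::sstring& key) {
        return lookup(key);
    });
}
```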

The docs provide two solutions to this problem:

ndragazis commented 3 days ago

Write-Ahead Log implementation with Seastar

Resuming from https://github.com/ndragazis/tinykv/issues/5#issuecomment-2186864598, we need to implement a write-ahead log for memtables.

Requirements:

So, we need to use the low-level Seastar File IO API. This API uses O_DIRECT underneath, which imposes limitations on buffer size, buffer alignment, and file offset alignment. Given that we need to append records of arbitrary sizes (depending on the key-values entered by the user), we need to implement some buffering logic inside the app, so that we always write properly sized and aligned data chunks to disk.
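
A rough sketch of the size and offset bookkeeping such a buffer needs, in plain C++ with no Seastar calls. `WalBuffer`, `Chunk`, and the 4096-byte block size used below are all hypothetical. Memory alignment of the buffer itself and record framing are deliberately left out:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Rounds n up to the next multiple of align (align must be a power of two).
constexpr std::size_t align_up(std::size_t n, std::size_t align) {
    return (n + align - 1) & ~(align - 1);
}

// A block-aligned piece of data together with its block-aligned file offset.
struct Chunk {
    std::uint64_t offset;
    std::string data;
};

class WalBuffer {
    std::size_t _block;              // assumed disk block size
    std::uint64_t _file_offset = 0;  // offset of the first buffered byte, always block-aligned
    std::string _pending;            // bytes not yet handed out as full blocks
public:
    explicit WalBuffer(std::size_t block_size) : _block(block_size) {
        assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
    }

    // Buffers a record and returns the full blocks that are ready to be
    // written (possibly none). The partial tail stays buffered.
    Chunk append(std::string_view record) {
        _pending.append(record);
        std::size_t full = _pending.size() & ~(_block - 1);  // round down
        Chunk out{_file_offset, _pending.substr(0, full)};
        _pending.erase(0, full);
        _file_offset += full;
        return out;
    }

    // Returns the partial tail padded with zeros to a whole block. The tail
    // stays buffered, so the same block is rewritten once more data arrives.
    Chunk sync() const {
        Chunk out{_file_offset, _pending};
        out.data.resize(align_up(_pending.size(), _block), '\0');
        return out;
    }
};
```

Every chunk returned by append() or sync() has a size and offset that are multiples of the block size, which is what a direct-IO write needs. Before the actual write, the data would still have to be copied into a properly aligned memory buffer.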