scylladb / seastar

High performance server-side application framework
http://seastar.io
Apache License 2.0
8.26k stars 1.54k forks source link

coroutine generator use-after-free #1677

Open xemul opened 1 year ago

xemul commented 1 year ago

Consider a code (can be added as a test case to coroutines_test.cc):

struct wrapped_string {
    sstring s;
};

coroutine::experimental::generator<wrapped_string, buffered_container> generate_wrapped_strings(coroutine::experimental::buffer_size_t size, int nr) {
    for (int i = 0; i < nr; i++) {
        sstring x = format("abcdefghijklmnopqrstuvwxyz-{}", i);
        co_yield wrapped_string(std::move(x));
    }
}

SEASTAR_TEST_CASE(test_generator_lref) {
    auto sgen = generate_wrapped_strings(coroutine::experimental::buffer_size_t{8}, 42);
    while (auto ws = co_await sgen()) {
        fmt::print("{}\n", ws->s);
    }
}

First, it's important that sstring x is that long -- it must be external.

The bug -- if run in debug mode it crashes with heap use-after-free message:

It seems that ws->s should be eventually copy-constructed or move-constructed from x, but in fact ws and x share the pointer to the external buffer for some reason.

There are two options how to work around that bug.

  1. Declare constructor for wrapped_string
struct wrapped_string {
    sstring s;
    wrapped_string(sstring s_) : s(std::move(s_)) {}
};
  1. Make wrapped_string local variable in generator
    for (int i = 0; i < nr; i++) {
        sstring x = format("abcdefghijklmnopqrstuvwxyz-{}", i);
        auto ws = wrapped_string(std::move(x));
        co_yield ws;
    }

cc @avikivity @tchaikov

avikivity commented 1 year ago

Looks like a miscompile, no? with what clang do you see it?

xemul commented 1 year ago

Looks like a miscompile, no?

Yes. ws and x have different this values, but strings inside point to the same malloc-ed buffer. It all can be explained by "compiler memcpy-ed x into ws instead of copy/move constructing it"

with what clang do you see it?

$ clang++ --version
clang version 15.0.7 (Fedora 15.0.7-2.fc37)
avikivity commented 1 year ago

Well I see a similar bug, but with clang 16 (and 15 is immune for my test case).

Try 16, maybe the fix that broke my test case fixes yours.

tchaikov commented 1 year ago

yeah, i am using clang-17, this test survived with the following steps:

$ cmake -DCMAKE_CXX_COMPILER=clang++ -GNinja -Bbuild/debug -DCMAKE_BUILD_TYPE=Debug
$ ninja -C build/debug test_unit_coroutines
$ build/debug/tests/unit/coroutines_test
...
random-seed=2940931593                
==1796162==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
abcdefghijklmnopqrstuvwxyz-0           
abcdefghijklmnopqrstuvwxyz-1              
abcdefghijklmnopqrstuvwxyz-2   
abcdefghijklmnopqrstuvwxyz-3                              
abcdefghijklmnopqrstuvwxyz-4                
abcdefghijklmnopqrstuvwxyz-5                            
abcdefghijklmnopqrstuvwxyz-6              
...
abcdefghijklmnopqrstuvwxyz-40
abcdefghijklmnopqrstuvwxyz-41
Reactor stalled for 34 ms on shard 0. Backtrace: 0x50edc8 0xdd4d0d 0xdd41d5 0xc091b0 0xc03bfd 0xc0367a 0xc03f77 0xc0a653 0x3db6f 0x15ba7f 0x4cd737 0x4cb1d7 0x565812 0x7aca5b 0x97ec6e 0x97e791 0xc2b0f9 0xc3340e 0xc373ff 0xc3507c 0xa08d78 0
xa06246 0x99ed61 0x99e663 0x99e553 0x99df92 0xa67989 0xb8fcbb 0x5619ee 0x8c906 0x11286f
Reactor stalled for 93 ms on shard 0. Backtrace: 0x50edc8 0xdd4d0d 0xdd41d5 0xc091b0 0xc03bfd 0xc0367a 0xc03f77 0xc0a653 0x3db6f 0x57f1a4 0x580fd6 0x4d1834 0x4d16bb 0x4d12e8 0x4d0f84 0x4cd2f7 0x4cb1d7 0x565812 0x7aca5b 0x97ec6e 0x97e791 0
xc2b0f9 0xc3340e 0xc373ff 0xc3507c 0xa08d78 0xa06246 0x99ed61 0x99e663 0x99e553 0x99df92 0xa67989 0xb8fcbb 0x5619ee 0x8c906 0x11286f

*** No errors detected