scylladb / seastar

High performance server-side application framework
http://seastar.io
Apache License 2.0

the "Generator fallback test" in directory_test is flaky due to `!_wait_for_free_space` assertion failure #1913

Open tchaikov opened 11 months ago

tchaikov commented 11 months ago

it fails due to one of the following failures:

co_yield gets called again even if the previous returned yield_awaiter is not ready, and that not-ready awaiter has not been resumed:

directory_test: /home/kefu/dev/seastar/include/seastar/coroutine/generator.hh:186: yield_awaiter<T, Container> seastar::coroutine::experimental::internal::generator_buffered_promise<seastar::directory_entry, seastar::dir_entry_buffer>::yield_value(U &&) [T = seastar::directory_entry, Container = seastar::dir_entry_buffer, U = seastar::directory_entry &]: Assertion `!_wait_for_free_space' failed.

a task / promise is run by the reactor even after it has been deleted (it is deleted when the coroutine it represents returns). this can be reproduced with -c 1:

==2432222==ERROR: AddressSanitizer: heap-use-after-free on address 0x617000000e90 at pc 0x5584c4fb79b5 bp 0x7ffc9418e070 sp 0x7ffc9418e068                                                                                                    
READ of size 8 at 0x617000000e90 thread T0                                                                                                                                                                                                    
    #0 0x5584c4fb79b4 in seastar::reactor::run_tasks(seastar::reactor::task_queue&) /home/kefu/dev/seastar/src/core/reactor.cc:2661:14                                                                                                        
    #1 0x5584c4fbfb3e in seastar::reactor::run_some_tasks() /home/kefu/dev/seastar/src/core/reactor.cc:3124:9
    #2 0x5584c4fc3a1d in seastar::reactor::do_run() /home/kefu/dev/seastar/src/core/reactor.cc:3293:9
    #3 0x5584c4fc16f3 in seastar::reactor::run() /home/kefu/dev/seastar/src/core/reactor.cc:3176:16
    #4 0x5584c4d6fb2a in seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) /home/kefu/dev/seastar/src/core/app-template.cc:276:31
    #5 0x5584c4d6d0ca in seastar::app_template::run(int, char**, std::function<seastar::future<int> ()>&&) /home/kefu/dev/seastar/src/core/app-template.cc:167:12
    #6 0x5584c4d701ba in seastar::app_template::run(int, char**, std::function<seastar::future<void> ()>&&) /home/kefu/dev/seastar/src/core/app-template.cc:182:12
    #7 0x5584c4b496d7 in main /home/kefu/dev/seastar/tests/unit/directory_test.cc:145:27
    #8 0x7f6192a49b89 in __libc_start_call_main (/lib64/libc.so.6+0x27b89) (BuildId: 7026fe8c129a523e07856d7c96306663ceab6e24)
    #9 0x7f6192a49c4a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x27c4a) (BuildId: 7026fe8c129a523e07856d7c96306663ceab6e24)
    #10 0x5584c4a6d984 in _start (/home/kefu/dev/seastar/build/debug/tests/unit/directory_test+0x1156984)

0x617000000e90 is located 16 bytes inside of 672-byte region [0x617000000e80,0x617000001120)
freed by thread T0 here:                                                                                               
    #0 0x5584c4b429cd in operator delete(void*) /home/kefu/dev/llvm-project/compiler-rt/lib/asan/asan_new_delete.cpp:152:3
    #1 0x5584c4c13767 in seastar::make_list_directory_fallback_generator(seastar::coroutine::experimental::buffer_size_t, seastar::file_impl&) (.resume) /home/kefu/dev/seastar/src/core/file.cc:1374:78
    #2 0x5584c4c39e1b in std::__n4861::coroutine_handle<seastar::coroutine::experimental::internal::generator_buffered_promise<seastar::directory_entry, seastar::dir_entry_buffer>>::resume() const /usr/lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/coroutine:240:29
    #3 0x5584c4c39925 in seastar::coroutine::experimental::internal::generator_buffered_promise<seastar::directory_entry, seastar::dir_entry_buffer>::run_and_dispose() /home/kefu/dev/seastar/include/seastar/coroutine/generator.hh:226:42
    #4 0x5584c4fb79e8 in seastar::reactor::run_tasks(seastar::reactor::task_queue&) /home/kefu/dev/seastar/src/core/reactor.cc:2661:14
    #5 0x5584c4fbfb3e in seastar::reactor::run_some_tasks() /home/kefu/dev/seastar/src/core/reactor.cc:3124:9
    #6 0x5584c4fc3a1d in seastar::reactor::do_run() /home/kefu/dev/seastar/src/core/reactor.cc:3293:9
    #7 0x5584c4fc16f3 in seastar::reactor::run() /home/kefu/dev/seastar/src/core/reactor.cc:3176:16
    #8 0x5584c4d6fb2a in seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) /home/kefu/dev/seastar/src/core/app-template.cc:276:31
    #9 0x5584c4d6d0ca in seastar::app_template::run(int, char**, std::function<seastar::future<int> ()>&&) /home/kefu/dev/seastar/src/core/app-template.cc:167:12
    #10 0x5584c4d701ba in seastar::app_template::run(int, char**, std::function<seastar::future<void> ()>&&) /home/kefu/dev/seastar/src/core/app-template.cc:182:12
    #11 0x5584c4b496d7 in main /home/kefu/dev/seastar/tests/unit/directory_test.cc:145:27
    #12 0x7f6192a49b89 in __libc_start_call_main (/lib64/libc.so.6+0x27b89) (BuildId: 7026fe8c129a523e07856d7c96306663ceab6e24)
    #13 0x7f6192a49c4a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x27c4a) (BuildId: 7026fe8c129a523e07856d7c96306663ceab6e24)
    #14 0x5584c4a6d984 in _start (/home/kefu/dev/seastar/build/debug/tests/unit/directory_test+0x1156984)

previously allocated by thread T0 here:
    #0 0x5584c4b4216d in operator new(unsigned long) /home/kefu/dev/llvm-project/compiler-rt/lib/asan/asan_new_delete.cpp:95:3
    #1 0x5584c4bcfb9c in seastar::make_list_directory_fallback_generator(seastar::coroutine::experimental::buffer_size_t, seastar::file_impl&) /home/kefu/dev/seastar/src/core/file.cc:1374:78
    #2 0x5584c4bcf8d0 in seastar::file_impl::experimental_list_directory() /home/kefu/dev/seastar/src/core/file.cc:1397:12
    #3 0x5584c4bcc591 in seastar::file::experimental_list_directory() /home/kefu/dev/seastar/src/core/file.cc:1166:24
    #4 0x5584c4b45a7d in lister_generator_test(seastar::file) /home/kefu/dev/seastar/tests/unit/directory_test.cc:91:21 
    #5 0x5584c4b5a21e in lister_generator_test() (.resume) /home/kefu/dev/seastar/tests/unit/directory_test.cc:135:14
    #6 0x5584c4b6645b in std::__n4861::coroutine_handle<seastar::internal::coroutine_traits_base<void>::promise_type>::resume() const /usr/lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/coroutine:240:29
    #7 0x5584c4b65fe5 in seastar::internal::coroutine_traits_base<void>::promise_type::run_and_dispose() /home/kefu/dev/seastar/include/seastar/core/coroutine.hh:125:20
    #8 0x5584c4fb79e8 in seastar::reactor::run_tasks(seastar::reactor::task_queue&) /home/kefu/dev/seastar/src/core/reactor.cc:2661:14
    #9 0x5584c4fbfb3e in seastar::reactor::run_some_tasks() /home/kefu/dev/seastar/src/core/reactor.cc:3124:9
    #10 0x5584c4fc3a1d in seastar::reactor::do_run() /home/kefu/dev/seastar/src/core/reactor.cc:3293:9
    #11 0x5584c4fc16f3 in seastar::reactor::run() /home/kefu/dev/seastar/src/core/reactor.cc:3176:16
    #12 0x5584c4d6fb2a in seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) /home/kefu/dev/seastar/src/core/app-template.cc:276:31
    #13 0x5584c4d6d0ca in seastar::app_template::run(int, char**, std::function<seastar::future<int> ()>&&) /home/kefu/dev/seastar/src/core/app-template.cc:167:12
    #14 0x5584c4d701ba in seastar::app_template::run(int, char**, std::function<seastar::future<void> ()>&&) /home/kefu/dev/seastar/src/core/app-template.cc:182:12
    #15 0x5584c4b496d7 in main /home/kefu/dev/seastar/tests/unit/directory_test.cc:145:27
    #16 0x7f6192a49b89 in __libc_start_call_main (/lib64/libc.so.6+0x27b89) (BuildId: 7026fe8c129a523e07856d7c96306663ceab6e24)
    #17 0x7f6192a49c4a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x27c4a) (BuildId: 7026fe8c129a523e07856d7c96306663ceab6e24)
    #18 0x5584c4a6d984 in _start (/home/kefu/dev/seastar/build/debug/tests/unit/directory_test+0x1156984)
tchaikov commented 11 months ago

these issues are not likely to be related to #1677.

xemul commented 11 months ago

Is it compiler-dependent? You ran it over Debian's clang-16, while on Ubuntu-16 and Fedora-16 it doesn't happen

xemul commented 11 months ago

Also note the make_list_directory_fallback_generator thing -- it means it didn't ran over true generator-based lister

xemul commented 11 months ago

Also note the make_list_directory_fallback_generator thing -- it means it didn't ran over true generator-based lister

Of course:

    #5 0x5584c4b5a21e in lister_generator_test() (.resume) /home/kefu/dev/seastar/tests/unit/directory_test.cc:135:14

That's specially crafted fallback test

tchaikov commented 11 months ago

Is it compiler-dependent? You ran it over Debian's clang-16, while on Ubuntu-16 and Fedora-16 it doesn't happen

how many times did you run? i managed to reproduce it with the following script:

export ASAN_OPTIONS=disable_coredump=0:abort_on_error=1:detect_stack_use_after_return=1:verify_asan_link_order=0
i=0
while build/gcc/tests/unit/directory_test ; do
  echo =====================$((i++))==================
done

it stops after 20 runs at most. it takes 5 runs usually.

i tested with GCC-13 and Clang-17 on Debian. i also tested with GCC-13 and Clang-16 on f38.

| distro | compiler | build mode | failing test | smp [^1] |
|---|---|---|---|---|
| debian sid | clang 17 [^2] | debug | fallback gen | 1 and 16 |
| debian sid | gcc 13 | debug | fallback gen | 1 and 16 |
| fedora 38 | clang 16.0.6 | debug | fallback gen | 1 and 16 |
| fedora 38 | gcc 13 | debug | fallback gen | 1 and 16 |

[^1]: my laptop's nproc output is 16.
[^2]: built from git repo, and packaged by debian maintainer(s).

tchaikov commented 11 months ago

Also note the make_list_directory_fallback_generator thing -- it means it didn't ran over true generator-based lister

yes. the failing test is using seastar::queue. not sure what you mean by "ran over". but yeah, i haven't found that the non-fallback one was failing. probably i should update the test description to explain the issue better.

xemul commented 11 months ago

how many times did you run?

In a loop for ~minute. Didn't count how many invocations it was though

xemul commented 11 months ago

Now (I believe I ran yum update this morning) it just doesn't even start:

Program received signal SIGSEGV, Segmentation fault.
0x00007fffee55689d in operator() (__closure=0x7fffffffd86f) at /home/xemul/src/seastar/src/core/exception_hacks.cc:78
78      static dl_iterate_fn org = [] {
(gdb) bt
#0  0x00007fffee55689d in operator() (__closure=0x7fffffffd86f) at /home/xemul/src/seastar/src/core/exception_hacks.cc:78
#1  0x00007fffee5569d1 in seastar::dl_iterate_phdr_org () at /home/xemul/src/seastar/src/core/exception_hacks.cc:82
#2  0x00007fffee5570bf in dl_iterate_phdr (callback=0x7ffff78d7a00 <__asan::FindFirstDSOCallback(dl_phdr_info*, size_t, void*)>, data=0x7fffffffd918) at /home/xemul/src/seastar/src/core/exception_hacks.cc:130
#3  0x00007ffff78d7e17 in __asan::AsanCheckDynamicRTPrereqs () at ../../../../libsanitizer/asan/asan_linux.cpp:171
#4  0x00007ffff78e4f39 in __asan::AsanInitInternal () at ../../../../libsanitizer/asan/asan_rtl.cpp:407
#5  0x00007ffff7fce35b in _dl_init (main_map=0x7ffff7ffe2c0, argc=1, argv=0x7fffffffd9c8, env=0x7fffffffd9d8) at dl-init.c:106
#6  0x00007ffff7fe47b0 in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#7  0x0000000000000001 in ?? ()
#8  0x00007fffffffdd86 in ?? ()
#9  0x0000000000000000 in ?? ()

upd: Ah, it now does need the ASAN_OPTIONS env

xemul commented 11 months ago

how many times did you run?

In a loop for ~minute. Didn't count how many invocations it was though

$ echo $ASAN_OPTIONS
disable_coredump=0:abort_on_error=1:detect_stack_use_after_return=1:verify_asan_link_order=0
$ i=0; while /bin/true; do echo $i; i=$((i+1)); ./build/debug/tests/unit/directory_test -c1 >/dev/null 2>&1; done
...
177
178
179
^C
$ i=0; while /bin/true; do echo $i; i=$((i+1)); ./build/debug/tests/unit/directory_test -c4 >/dev/null 2>&1; done
...
101
102
103
^C

(tests pass all the time)

tchaikov commented 11 months ago

how many times did you run?

In a loop for ~minute. Didn't count how many invocations it was though

$ echo $ASAN_OPTIONS
disable_coredump=0:abort_on_error=1:detect_stack_use_after_return=1:verify_asan_link_order=0
$ i=0; while /bin/true; do echo $i; i=$((i+1)); ./build/debug/tests/unit/directory_test -c1 >/dev/null 2>&1; done
...
177
178
179
^C
$ i=0; while /bin/true; do echo $i; i=$((i+1)); ./build/debug/tests/unit/directory_test -c4 >/dev/null 2>&1; done
...
101
102
103
^C

(tests pass all the time)

i am afraid that the script is not able to detect a test failure. let me replace the executable under test with false to explain it better. after the replacement, the script would look like this:

i=0
while true; do
  echo $i
  i=$((i+1))
  false # i am a test which fails all the time!
done

and i have

$ i=0; while /bin/true; do echo $i; i=$((i+1)); false; done
...
1744
1745
1746
1747
^C
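
For comparison, here is a minimal sketch of a loop that does stop at the first failing run, because it checks the exit status of each invocation. The helper name run_until_failure and the iteration cap are assumptions made for illustration; nothing like it exists in the Seastar tree.

```shell
# Hypothetical helper: run a command up to MAX times and stop at the
# first run that exits with a non-zero status.
# usage: run_until_failure MAX command [args...]
run_until_failure() {
    max=$1
    shift
    i=0
    while [ "$i" -lt "$max" ]; do
        if "$@"; then
            # The run passed, move on to the next iteration.
            i=$((i + 1))
        else
            # $? still holds the exit status of the failed command here.
            rc=$?
            echo "run $i failed with status $rc"
            return "$rc"
        fi
    done
    echo "all $max runs passed"
}
```

It could be invoked as `run_until_failure 1000 ./build/debug/tests/unit/directory_test -c1`, with ASAN_OPTIONS exported beforehand as in the earlier script.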
xemul commented 11 months ago

i am afraid that the script is not able to detect test failure,

You're correct, but we're hunting segmentation fault, not test failure, aren't we?

xemul commented 11 months ago

Anyway, my point was that on my node+compiler it doesn't reproduce for hundreds of iterations :(

tchaikov commented 11 months ago

let me put it another way: if an executable segfaults, its exit status would be 139 (128 + SIGSEGV). so your script cannot stop if any of the runs segfaults.

xemul commented 11 months ago

let me put it another way: if an executable segfaults, its exit status would be 139 (128 + SIGSEGV). so your script cannot stop if any of the runs segfaults.

I ran it inside a set -e script. The raw output was just to demonstrate that it passes all the time.
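
As a side note on the exit statuses involved: bash and dash report a child killed by signal N as status 128 + N. On Linux a SIGSEGV (11) therefore shows up as 139, and an abort() (SIGABRT, 6), which is how ASan terminates the process when abort_on_error=1 is set, shows up as 134. A small sketch that decodes a status, using the hypothetical helper name describe_status:

```shell
# Hypothetical helper: describe an exit status as reported by bash or dash.
# usage: describe_status STATUS
describe_status() {
    rc=$1
    if [ "$rc" -eq 0 ]; then
        echo "passed"
    elif [ "$rc" -gt 128 ]; then
        # Statuses above 128 normally mean "killed by signal (rc - 128)".
        # `kill -l N` prints the name of signal N without the SIG prefix.
        if sig=$(kill -l "$((rc - 128))" 2>/dev/null); then
            echo "killed by SIG$sig"
        else
            echo "exited with status $rc"
        fi
    else
        echo "failed with status $rc"
    fi
}
```

A possible use is `./build/debug/tests/unit/directory_test -c1; describe_status $?`, which makes it easier to tell a plain test failure from a crash.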

Meanwhile, I got a very fast reproducer, finally.

  1. #if 0-out the legacy lister test as well as the non-fallback generator one. Leave only the fallback generator
  2. Run the fallback generator test in a loop
  3. Drop the directory entries' prints

Like this:

--- a/tests/unit/directory_test.cc
+++ b/tests/unit/directory_test.cc
@@ -56,6 +56,7 @@ const char* de_type_desc(directory_entry_type t)
 }

 future<> lister_test() {
+#if 0
     class lister {
         file _f;
         subscription<directory_entry> _listing;
@@ -84,6 +85,8 @@ future<> lister_test() {
           return l.done();
        });
     });
+#endif
+    return make_ready_future<>();
 }

 #ifdef SEASTAR_COROUTINES_ENABLED
@@ -96,7 +99,6 @@ future<> lister_generator_test(file f) {
         } else {
             assert(sd.type == directory_entry_type::unknown);
         }
-        fmt::print("{} (type={})\n", de->name, de_type_desc(sd.type));
     }
     co_await f.close();
 }
@@ -124,15 +126,19 @@ class test_file_impl : public file_impl {
 };

 future<> lister_generator_test() {
+#if 0
     fmt::print("--- Generator lister test ---\n");
     auto f = co_await engine().open_directory(".");
     co_await lister_generator_test(std::move(f));
+#endif

-    fmt::print("--- Generator fallback test ---\n");
-    auto lf = co_await engine().open_directory(".");
-    auto tf = ::seastar::make_shared<test_file_impl>(std::move(lf));
-    auto f2 = file(std::move(tf));
-    co_await lister_generator_test(std::move(f2));
+    for (int i = 0; i < 1024; i++) {
+        fmt::print("--- Generator fallback test {} ---\n", i);
+        auto lf = co_await engine().open_directory(".");
+        auto tf = ::seastar::make_shared<test_file_impl>(std::move(lf));
+        auto f2 = file(std::move(tf));
+        co_await lister_generator_test(std::move(f2));
+    }
 }
 #else
 future<> lister_generator_test() {

It crashes like this on the 2nd loop iteration:

--- Generator fallback test 0 ---
--- Generator fallback test 1 ---
directory_test: /home/xemul/src/seastar/include/seastar/coroutine/generator.hh:186: seastar::coroutine::experimental::internal::yield_awaiter<T, Container> seastar::coroutine::experimental::internal::generator_buffered_promise<T, Container>::yield_value(U&&) [with U = seastar::directory_entry&; T = seastar::directory_entry; Container = seastar::dir_entry_buffer]: Assertion `!_wait_for_free_space' failed.
Aborting on shard 0.
xemul commented 11 months ago

Got another assertion in generator code from another test that looks related.

  1. Applied the patch
diff --git a/include/seastar/core/coroutine.hh b/include/seastar/core/coroutine.hh
index 2a011c16d8..87f244b547 100644
--- a/include/seastar/core/coroutine.hh
+++ b/include/seastar/core/coroutine.hh
@@ -180,7 +180,10 @@ struct awaiter<CheckPreempt, T> {
         }
     }

-    T await_resume() { return _future.get0(); }
+    T await_resume() {
+        assert(_future.available());
+        return _future.get0();
+    }
 };

 template<bool CheckPreempt>
diff --git a/tests/unit/coroutines_test.cc b/tests/unit/coroutines_test.cc
index b4ea0ed257..fbe822afac 100644
--- a/tests/unit/coroutines_test.cc
+++ b/tests/unit/coroutines_test.cc
@@ -755,6 +755,7 @@ coroutine::experimental::generator<int, Container>
 fibonacci_sequence(coroutine::experimental::buffer_size_t size, unsigned count) {
     auto a = 0, b = 1;
     for (unsigned i = 0; i < count; ++i) {
+        co_await seastar::yield();
         if (std::numeric_limits<decltype(a)>::max() - a < b) {
             throw std::out_of_range(
                 fmt::format("fibonacci[{}] is greater than the largest value of int", i));
  2. Ran ./build/debug/tests/unit/coroutines_test --run_test=test_async_generator_drained_buffered -- -c1 and got
coroutines_test: /home/xemul/src/seastar/include/seastar/core/coroutine.hh:184: T seastar::internal::awaiter<CheckPreempt, T>::await_resume() [with bool CheckPreempt = true; T = void]: Assertion `_future.available()' failed.
Aborting on shard 0.
Backtrace:
  /lib64/libasan.so.8+0x69bb0
  /home/xemul/src/seastar/build/debug/libseastar.so+0x7cfb082
  /home/xemul/src/seastar/build/debug/libseastar.so+0x7cc9ae9
  /home/xemul/src/seastar/build/debug/libseastar.so+0x7b1810b
  /home/xemul/src/seastar/build/debug/libseastar.so+0x7b18338
  /home/xemul/src/seastar/build/debug/libseastar.so+0x7ba4de5
  /home/xemul/src/seastar/build/debug/libseastar.so+0x7bf549d
  /home/xemul/src/seastar/build/debug/libseastar.so+0x7bf5525
  /lib64/libc.so.6+0x3dbaf
  /lib64/libc.so.6+0x8e883
  /lib64/libc.so.6+0x3dafd
  /lib64/libc.so.6+0x2687e
  /lib64/libc.so.6+0x2679a
  /lib64/libc.so.6+0x36186
  0x5beab8
  0x56032a
  0x67f589
  0x67051b
  /home/xemul/src/seastar/build/debug/libseastar.so+0x7b7afa8
  /home/xemul/src/seastar/build/debug/libseastar.so+0x7b82445
  /home/xemul/src/seastar/build/debug/libseastar.so+0x7b88269
  /home/xemul/src/seastar/build/debug/libseastar.so+0x7b84044
  /home/xemul/src/seastar/build/debug/libseastar.so+0x76f6fbd
  /home/xemul/src/seastar/build/debug/libseastar.so+0x76f4772
  /home/xemul/src/seastar/build/debug/libseastar_testing.so+0x18ff81
  /home/xemul/src/seastar/build/debug/libseastar_testing.so+0x19b484
  /home/xemul/src/seastar/build/debug/libseastar_testing.so+0x199985
  /home/xemul/src/seastar/build/debug/libseastar_testing.so+0x1979ce
  /home/xemul/src/seastar/build/debug/libseastar.so+0x77342d0
  /home/xemul/src/seastar/build/debug/libseastar.so+0x794cdf5
  /lib64/libc.so.6+0x8c946
  /lib64/libc.so.6+0x11285f

the backtrace decoded


__assert_fail at ??:?
seastar::internal::awaiter<true, void>::await_resume() at /home/xemul/src/seastar/include/seastar/core/coroutine.hh:184 (discriminator 1)
fibonacci_sequence(fibonacci_sequence<buffered_container>(seastar::coroutine::experimental::buffer_size_t, unsigned int)::_Z18fibonacci_sequenceI18buffered_containerEN7seastar9coroutine12experimental9generatorIiT_EENS3_13buffer_size_tEj.Frame*) [clone .actor] at /home/xemul/src/seastar/tests/unit/coroutines_test.cc:758 (discriminator 1)
std::__n4861::coroutine_handle<seastar::coroutine::experimental::internal::generator_buffered_promise<int, buffered_container> >::resume() const at /usr/include/c++/13/coroutine:240
seastar::coroutine::experimental::internal::generator_buffered_promise<int, buffered_container>::run_and_dispose() at /home/xemul/src/seastar/include/seastar/coroutine/generator.hh:230 (discriminator 4)
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at /home/xemul/src/seastar/src/core/reactor.cc:2661 (discriminator 1)
seastar::reactor::run_some_tasks() at /home/xemul/src/seastar/src/core/reactor.cc:3124 (discriminator 2)
seastar::reactor::do_run() at /home/xemul/src/seastar/src/core/reactor.cc:3293 (discriminator 1)
seastar::reactor::run() at /home/xemul/src/seastar/src/core/reactor.cc:3176 (discriminator 1)
seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at /home/xemul/src/seastar/src/core/app-template.cc:276 (discriminator 2)
xemul commented 11 months ago

And when running the non-fallback generator test in a loop, I got this use-after-free at around the 200th iteration

=================================================================
==21738==ERROR: AddressSanitizer: heap-use-after-free on address 0x61300001be90 at pc 0x7f1c9257af2a bp 0x7ffc5b174b50 sp 0x7ffc5b174b48
READ of size 8 at 0x61300001be90 thread T0
Reactor stalled for 35 ms on shard 0. Backtrace: 0x69bb0 0x7cfb385 0x7cc9e9f 0x7b18182 0x7b359cb 0x7b30263 0x7b30521 0x7b36d68 0x3dbaf 0x116cce 0x116d0e 0x116d0e 0x116d0e 0x1293f2 0x129ab9 0x775d3ef 0x129d31 0x116904 0x116a25 0x103409 0x1059fc 0xfe170 0xfe6b9 0x34ac0 0xe2fcf 0xe2562 0xe36eb 0x7b7af29 0x7b82445 0x7b88269 0x7b84044 0x76f6fbd 0x76f4772 0x76f4c6a 0x424e21 0x27b89 0x27c4a 0x41f634
Reactor stalled for 93 ms on shard 0. Backtrace: 0x69bb0 0x7cfb385 0x7cc9e9f 0x7b18182 0x7b359cb 0x7b30263 0x7b30521 0x7b36d68 0x3dbaf 0x122799 0x127208 0x128ebe 0x129ab9 0x775d3ef 0x129d31 0x116904 0x116a25 0x103409 0x1059fc 0xfe170 0xfe6b9 0x34ac0 0xe2fcf 0xe2562 0xe36eb 0x7b7af29 0x7b82445 0x7b88269 0x7b84044 0x76f6fbd 0x76f4772 0x76f4c6a 0x424e21 0x27b89 0x27c4a 0x41f634
Reactor stalled for 202 ms on shard 0. Backtrace: 0x69bb0 0x7cfb385 0x7cc9e9f 0x7b18182 0x7b359cb 0x7b30263 0x7b30521 0x7b36d68 0x3dbaf 0x1108cf 0x11357e 0x113a9b 0x113b67 0x113a9b 0x115188 0x11592e 0x103409 0x1059fc 0xfe170 0xfe6b9 0x34ac0 0xe2fcf 0xe2562 0xe36eb 0x7b7af29 0x7b82445 0x7b88269 0x7b84044 0x76f6fbd 0x76f4772 0x76f4c6a 0x424e21 0x27b89 0x27c4a 0x41f634
    #0 0x7f1c9257af29 in seastar::reactor::run_tasks(seastar::reactor::task_queue&) /home/xemul/src/seastar/src/core/reactor.cc:2661
    #1 0x7f1c92582445 in seastar::reactor::run_some_tasks() /home/xemul/src/seastar/src/core/reactor.cc:3124
    #2 0x7f1c92588269 in seastar::reactor::do_run() /home/xemul/src/seastar/src/core/reactor.cc:3293
    #3 0x7f1c92584044 in seastar::reactor::run() /home/xemul/src/seastar/src/core/reactor.cc:3176
    #4 0x7f1c920f6fbd in seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) /home/xemul/src/seastar/src/core/app-template.cc:276
    #5 0x7f1c920f4772 in seastar::app_template::run(int, char**, std::function<seastar::future<int> ()>&&) /home/xemul/src/seastar/src/core/app-template.cc:167
    #6 0x7f1c920f4c6a in seastar::app_template::run(int, char**, std::function<seastar::future<void> ()>&&) /home/xemul/src/seastar/src/core/app-template.cc:182
    #7 0x424e21 in main /home/xemul/src/seastar/tests/unit/directory_test.cc:151
    #8 0x7f1c89e49b89 in __libc_start_call_main (/lib64/libc.so.6+0x27b89) (BuildId: 7026fe8c129a523e07856d7c96306663ceab6e24)
    #9 0x7f1c89e49c4a in __libc_start_main_alias_2 (/lib64/libc.so.6+0x27c4a) (BuildId: 7026fe8c129a523e07856d7c96306663ceab6e24)
    #10 0x41f634 in _start (/home/xemul/src/seastar/build/debug/tests/unit/directory_test+0x41f634) (BuildId: 4fbbf34b0f6f681d9653e942595ccd71097e0cb6)

0x61300001be90 is located 16 bytes inside of 336-byte region [0x61300001be80,0x61300001bfd0)
freed by thread T0 here:
    #0 0x7f1c9b4da868 in operator delete(void*) (/lib64/libasan.so.8+0xda868) (BuildId: 542ad02088f38edfdba9d4bfa465b2299f512d3e)
    #1 0x7f1c91e5adbe in make_list_directory_generator /home/xemul/src/seastar/src/core/file.cc:422
    #2 0x7f1c9201d25b in std::__n4861::coroutine_handle<seastar::coroutine::experimental::internal::generator_buffered_promise<seastar::directory_entry, seastar::dir_entry_buffer> >::resume() const /usr/include/c++/13/coroutine:240
    #3 0x7f1c9201a598 in seastar::coroutine::experimental::internal::generator_buffered_promise<seastar::directory_entry, seastar::dir_entry_buffer>::run_and_dispose() (/home/xemul/src/seastar/build/debug/libseastar.so+0x761a598) (BuildId: 01c7d2a5433d6b71f9f622f998057db9334bb48e)
    #4 0x7f1c9257afa8 in seastar::reactor::run_tasks(seastar::reactor::task_queue&) /home/xemul/src/seastar/src/core/reactor.cc:2661
    #5 0x7f1c92582445 in seastar::reactor::run_some_tasks() /home/xemul/src/seastar/src/core/reactor.cc:3124
    #6 0x7f1c92588269 in seastar::reactor::do_run() /home/xemul/src/seastar/src/core/reactor.cc:3293
    #7 0x7f1c92584044 in seastar::reactor::run() /home/xemul/src/seastar/src/core/reactor.cc:3176
    #8 0x7f1c920f6fbd in seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) /home/xemul/src/seastar/src/core/app-template.cc:276
    #9 0x7f1c920f4772 in seastar::app_template::run(int, char**, std::function<seastar::future<int> ()>&&) /home/xemul/src/seastar/src/core/app-template.cc:167
    #10 0x7f1c920f4c6a in seastar::app_template::run(int, char**, std::function<seastar::future<void> ()>&&) /home/xemul/src/seastar/src/core/app-template.cc:182
    #11 0x424e21 in main /home/xemul/src/seastar/tests/unit/directory_test.cc:151
    #12 0x7f1c89e49b89 in __libc_start_call_main (/lib64/libc.so.6+0x27b89) (BuildId: 7026fe8c129a523e07856d7c96306663ceab6e24)
    #13 0x7f1c89e49c4a in __libc_start_main_alias_2 (/lib64/libc.so.6+0x27c4a) (BuildId: 7026fe8c129a523e07856d7c96306663ceab6e24)
    #14 0x41f634 in _start (/home/xemul/src/seastar/build/debug/tests/unit/directory_test+0x41f634) (BuildId: 4fbbf34b0f6f681d9653e942595ccd71097e0cb6)

previously allocated by thread T0 here:
    #0 0x7f1c9b4d9e28 in operator new(unsigned long) (/lib64/libasan.so.8+0xd9e28) (BuildId: 542ad02088f38edfdba9d4bfa465b2299f512d3e)
    #1 0x7f1c91e580d3 in make_list_directory_generator /home/xemul/src/seastar/src/core/file.cc:422
    #2 0x7f1c91e5b463 in seastar::posix_file_impl::experimental_list_directory() /home/xemul/src/seastar/src/core/file.cc:427
    #3 0x7f1c91e73ab9 in seastar::file::experimental_list_directory() /home/xemul/src/seastar/src/core/file.cc:1166
    #4 0x42034a in lister_generator_test /home/xemul/src/seastar/tests/unit/directory_test.cc:94
    #5 0x41fb68 in lister_generator_test(seastar::file) /home/xemul/src/seastar/tests/unit/directory_test.cc:93
    #6 0x4238ef in lister_generator_test /home/xemul/src/seastar/tests/unit/directory_test.cc:142
    #7 0x435f2d in std::__n4861::coroutine_handle<seastar::internal::coroutine_traits_base<void>::promise_type>::resume() const /usr/include/c++/13/coroutine:240
    #8 0x42a3c5 in seastar::internal::coroutine_traits_base<void>::promise_type::run_and_dispose() /home/xemul/src/seastar/include/seastar/core/coroutine.hh:125
    #9 0x7f1c9257afa8 in seastar::reactor::run_tasks(seastar::reactor::task_queue&) /home/xemul/src/seastar/src/core/reactor.cc:2661
    #10 0x7f1c92582445 in seastar::reactor::run_some_tasks() /home/xemul/src/seastar/src/core/reactor.cc:3124
    #11 0x7f1c92588269 in seastar::reactor::do_run() /home/xemul/src/seastar/src/core/reactor.cc:3293
    #12 0x7f1c92584044 in seastar::reactor::run() /home/xemul/src/seastar/src/core/reactor.cc:3176
    #13 0x7f1c920f6fbd in seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) /home/xemul/src/seastar/src/core/app-template.cc:276
    #14 0x7f1c920f4772 in seastar::app_template::run(int, char**, std::function<seastar::future<int> ()>&&) /home/xemul/src/seastar/src/core/app-template.cc:167
    #15 0x7f1c920f4c6a in seastar::app_template::run(int, char**, std::function<seastar::future<void> ()>&&) /home/xemul/src/seastar/src/core/app-template.cc:182
    #16 0x424e21 in main /home/xemul/src/seastar/tests/unit/directory_test.cc:151
    #17 0x7f1c89e49b89 in __libc_start_call_main (/lib64/libc.so.6+0x27b89) (BuildId: 7026fe8c129a523e07856d7c96306663ceab6e24)
    #18 0x7f1c89e49c4a in __libc_start_main_alias_2 (/lib64/libc.so.6+0x27c4a) (BuildId: 7026fe8c129a523e07856d7c96306663ceab6e24)
    #19 0x41f634 in _start (/home/xemul/src/seastar/build/debug/tests/unit/directory_test+0x41f634) (BuildId: 4fbbf34b0f6f681d9653e942595ccd71097e0cb6)

SUMMARY: AddressSanitizer: heap-use-after-free /home/xemul/src/seastar/src/core/reactor.cc:2661 in seastar::reactor::run_tasks(seastar::reactor::task_queue&)
Shadow bytes around the buggy address:
  0x61300001bc00: fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa
  0x61300001bc80: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x61300001bd00: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x61300001bd80: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x61300001be00: fd fd fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x61300001be80: fd fd[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x61300001bf00: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x61300001bf80: fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa
  0x61300001c000: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x61300001c080: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x61300001c100: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==21738==ABORTING
Aborting on shard 0.
Backtrace:
  /lib64/libasan.so.8+0x69bb0
  /home/xemul/src/seastar/build/debug/libseastar.so+0x7cfb082
  /home/xemul/src/seastar/build/debug/libseastar.so+0x7cc9ae9
  /home/xemul/src/seastar/build/debug/libseastar.so+0x7b1810b
  /home/xemul/src/seastar/build/debug/libseastar.so+0x7b18338
  /home/xemul/src/seastar/build/debug/libseastar.so+0x7ba4de5
  /home/xemul/src/seastar/build/debug/libseastar.so+0x7bf549d
  /home/xemul/src/seastar/build/debug/libseastar.so+0x7bf5525
  /lib64/libc.so.6+0x3dbaf
  /lib64/libc.so.6+0x8e883
  /lib64/libc.so.6+0x3dafd
  /lib64/libc.so.6+0x2687e
  /lib64/libasan.so.8+0xf7efe
  /lib64/libasan.so.8+0x1073f0
  /lib64/libasan.so.8+0xe2f70
  /lib64/libasan.so.8+0xe2562
  /lib64/libasan.so.8+0xe36eb
  /home/xemul/src/seastar/build/debug/libseastar.so+0x7b7af29
  /home/xemul/src/seastar/build/debug/libseastar.so+0x7b82445
  /home/xemul/src/seastar/build/debug/libseastar.so+0x7b88269
  /home/xemul/src/seastar/build/debug/libseastar.so+0x7b84044
  /home/xemul/src/seastar/build/debug/libseastar.so+0x76f6fbd
  /home/xemul/src/seastar/build/debug/libseastar.so+0x76f4772
  /home/xemul/src/seastar/build/debug/libseastar.so+0x76f4c6a
  0x424e21
  /lib64/libc.so.6+0x27b89
  /lib64/libc.so.6+0x27c4a
  0x41f634
Aborted (core dumped)

(this patch applied)

@@ -124,15 +126,19 @@ class test_file_impl : public file_impl {
 };

 future<> lister_generator_test() {
-    fmt::print("--- Generator lister test ---\n");
-    auto f = co_await engine().open_directory(".");
-    co_await lister_generator_test(std::move(f));
-
-    fmt::print("--- Generator fallback test ---\n");
-    auto lf = co_await engine().open_directory(".");
-    auto tf = ::seastar::make_shared<test_file_impl>(std::move(lf));
-    auto f2 = file(std::move(tf));
-    co_await lister_generator_test(std::move(f2));
+    for (int i = 0; i < 1024; i++) {
+#if 1
+        fmt::print("--- Generator lister test {} ---\n", i);
+        auto f = co_await engine().open_directory(".");
+        co_await lister_generator_test(std::move(f));
+#else
+        fmt::print("--- Generator fallback test {} ---\n", i);
+        auto lf = co_await engine().open_directory(".");
+        auto tf = ::seastar::make_shared<test_file_impl>(std::move(lf));
+        auto f2 = file(std::move(tf));
+        co_await lister_generator_test(std::move(f2));
+#endif
+    }
 }
 #else
 future<> lister_generator_test() {
tchaikov commented 11 months ago

hi Pavel, thank you for sharing. i think the 2nd one -- the "use-after-free" one is the first issue described in this issue's description.

xemul commented 11 months ago

@tchaikov, no, these use-after-frees are a bit different -- the former one is from the fallback generator code, the latter from the native generator

tchaikov commented 11 months ago

they are basically the same. the outer coroutine / task is implemented as a promise. it is destroyed right after return_void(), but before that it somehow gets scheduled by its sub-coroutine (the one that calls co_yield).