Looking for stuck/deadlocked fibers could be easier

michoecho commented 1 month ago

In ScyllaDB, it happens from time to time that some startup/shutdown/maintenance fiber gets stuck. The first step to debugging a problem like this is finding out what fiber got stuck and where, and this usually involves repeatedly adding debug logs and hoping that the bug reproduces again. But that's time-consuming, and not guaranteed to succeed. It would be better to be able to find the stuck fiber in the coredump.

It's already possible to find chains of awaiting seastar::tasks in the coredump, but AFAIK it's not possible to guess what points in the source code they correspond to. For coroutines, we can find the relevant coroutine, but not point out the relevant co_await inside it. For then continuations, we can try to guess the relevant function based on the type of the continuation, but it's hard (if possible at all) to do that reliably.

So maybe we could, for example, capture the program counter (rip) at the call sites of then (and its variants) and co_await, and put it into the awaiting task (continuation/coroutine)? Combined with debug info, this should allow us to see the source locations of the awaiting continuation points.

And it should be cheap enough. (Probably.)

kbr-scylla commented 1 month ago

Sounds like a good idea. Would potentially save us a lot of time in multiple investigations.

cc @avikivity @denesb WDYT?

michoecho commented 1 month ago

Maybe we could even put some timestamps into the continuation, to weed out the relevant tasks from coredumps more easily. Copying lowres_clock::_now into the task during its creation should be pretty cheap too. (But of course it's yet another 8 bytes to pay.)

Both things could be hidden behind ifdefs so that users who don't need them don't pay for them.

raphaelsc commented 1 month ago

Maybe we could even put some timestamps into the continuation, to weed out the relevant tasks from coredumps more easily. Copying lowres_clock::_now into the task during its creation should be pretty cheap too. (But of course it's yet another 8 bytes to pay.)

Both things could be hidden behind ifdefs so that users who don't need them don't pay for them.

But such users think they don't need it, until the problem is hit :-) Like you said, the problem might be rare, and we'll wish we had this enabled every time the problem is hit.

michoecho commented 1 month ago

So maybe we could, for example, capture the program counter (rip) at the call sites of then

That could probably be implemented like so

diff --git a/include/seastar/core/future.hh b/include/seastar/core/future.hh
index 9e0b2061..c0fcf63c 100644
--- a/include/seastar/core/future.hh
+++ b/include/seastar/core/future.hh
@@ -43,6 +43,13 @@
 #include <seastar/util/std-compat.hh>
 #include <seastar/util/modules.hh>

+[[gnu::always_inline]]
+inline uintptr_t getrip() {
+    uintptr_t rip;
+    asm("leaq 0(%%rip), %0":"=r"(rip));
+    return rip;
+}
+
 namespace seastar {

 struct nested_exception : public std::exception {
@@ -1274,13 +1281,14 @@ class [[nodiscard]] future : private internal::future_base {
         future_base::schedule(tws, &tws->_state);
     }
     template <typename Pr, typename Func, typename Wrapper>
-    void schedule(Pr&& pr, Func&& func, Wrapper&& wrapper) noexcept {
+    void schedule(Pr&& pr, Func&& func, Wrapper&& wrapper, uintptr_t rip) noexcept {
         // If this new throws a std::bad_alloc there is nothing that
         // can be done about it. The corresponding future is not ready
         // and we cannot break the chain. Since this function is
         // noexcept, it will call std::terminate if new throws.
         memory::scoped_critical_alloc_section _;
         auto tws = new continuation<Pr, Func, Wrapper, T>(std::move(pr), std::move(func), std::move(wrapper));
+        tws->_static_id = rip;
         // In a debug build we schedule ready futures, but not in
         // other build modes.
 #ifdef SEASTAR_DEBUG
@@ -1422,9 +1430,9 @@ class [[nodiscard]] future : private internal::future_base {
     requires std::invocable<Func, T>
                  || (std::same_as<void, T> && std::invocable<Func>)
     Result
-    then(Func&& func) noexcept {
+    then(Func&& func, uintptr_t rip = getrip()) noexcept {
 #ifndef SEASTAR_TYPE_ERASE_MORE
-        return then_impl(std::move(func));
+        return then_impl(std::move(func), rip);
 #else
         using func_type = typename internal::future_result<Func, T>::func_type;
         noncopyable_function<func_type> ncf;
@@ -1434,7 +1442,7 @@ class [[nodiscard]] future : private internal::future_base {
                 return futurize_invoke(func, std::forward<decltype(args)>(args)...);
             });
         }
-        return then_impl(std::move(ncf));
+        return then_impl(std::move(ncf), rip);
 #endif
     }

@@ -1471,7 +1479,7 @@ class [[nodiscard]] future : private internal::future_base {

     // Keep this simple so that Named Return Value Optimization is used.
     template <typename Func, typename Result>
-    Result then_impl_nrvo(Func&& func) noexcept {
+    Result then_impl_nrvo(Func&& func, uintptr_t rip) noexcept {
         using futurator = futurize<internal::future_result_t<Func, T>>;
         typename futurator::type fut(future_for_get_promise_marker{});
         using pr_type = decltype(fut.get_promise());
@@ -1486,13 +1494,13 @@ class [[nodiscard]] future : private internal::future_base {
                     return internal::future_invoke(func, std::move(state).get_value());
                 });
             }
-        });
+        }, rip);
         return fut;
     }

     template <typename Func, typename Result = futurize_t<internal::future_result_t<Func, T>>>
     Result
-    then_impl(Func&& func) noexcept {
+    then_impl(Func&& func, uintptr_t rip) noexcept {
 #ifndef SEASTAR_DEBUG
         using futurator = futurize<internal::future_result_t<Func, T>>;
         if (failed()) {
@@ -1501,7 +1509,7 @@ class [[nodiscard]] future : private internal::future_base {
             return futurator::invoke(std::forward<Func>(func), get_available_state_ref().take_value());
         }
 #endif
-        return then_impl_nrvo<Func, Result>(std::forward<Func>(func));
+        return then_impl_nrvo<Func, Result>(std::forward<Func>(func), rip);
     }

 public:
diff --git a/include/seastar/core/task.hh b/include/seastar/core/task.hh
index 8dbfb789..eba69cb1 100644
--- a/include/seastar/core/task.hh
+++ b/include/seastar/core/task.hh
@@ -60,6 +60,7 @@ class task {
 #else
     void make_backtrace() noexcept {}
 #endif
+    uintptr_t _static_id = 0;
 };

 inline

michoecho commented 1 month ago

Having some code pointers in the task could also make printing tasktraces in runtime more practical. Today, they look like this:

2024-07-28T17:47:27.102+00:00 elasticity-test-ubuntu-db-node-02c49414-6     !INFO | scylla[6631]:  [shard 3:comp] compaction - [Compact keyspace1.standard1 0a781120-4d09-11ef-979a-076ed5b4cf3b] Compacting of 6 sstables interrupted due to: sstables::compaction_stopped_exception (Compaction for keyspace1/standard1 was
stopped due to: table removal), at 0x1de2dee 0x1dd5ad0 0x1820ef8 0x222ccbf 0x2239f3f 0x202f3cd
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<sstables::compaction_result>, seastar::async<sstables::compaction::run(std::unique_ptr<sstables::compaction, std::default_delete<sstables::compaction> >)::$_0>(seastar::thread_attributes, sstables::compaction::run(std::unique_ptr<sstables::compaction, std::default_delete<sstables::compaction> >)::$_0&&)::{lambda()#2}, seastar::future<void>::then_impl_nrvo<seastar::async<sstables::compaction::run(std::unique_ptr<sstables::compaction, std::default_delete<sstables::compaction> >)::$_0>(seastar::thread_attributes, sstables::compaction::run(std::unique_ptr<sstables::compaction, std::default_delete<sstables::compaction> >)::$_0&&)::{lambda()#2}, seastar::future<sstables::compaction_result> >(sstables::compaction::run(std::unique_ptr<sstables::compaction, std::default_delete<sstables::compaction> >)::$_0&&)::{lambda(seastar::internal::promise_base_with_type<sstables::compaction_result>&&, seastar::async<sstables::compaction::run(std::unique_ptr<sstables::compaction, std::default_delete<sstables::compaction> >)::$_0>(seastar::thread_attributes, auto:1&&, (auto:2&&)...)::{lambda()#2}&, seastar::future_state<seastar::internal::monostate>&&)#1}, void>
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<sstables::compaction_result>, seastar::future<sstables::compaction_result>::finally_body<seastar::async<sstables::compaction::run(std::unique_ptr<sstables::compaction, std::default_delete<sstables::compaction> >)::$_0>(seastar::thread_attributes, sstables::compaction::run(std::unique_ptr<sstables::compaction, std::default_delete<sstables::compaction> >)::$_0&&)::{lambda()#3}, false>, seastar::future<sstables::compaction_result>::then_wrapped_nrvo<seastar::future<sstables::compaction_result>, seastar::future<sstables::compaction_result>::finally_body<seastar::async<sstables::compaction::run(std::unique_ptr<sstables::compaction, std::default_delete<sstables::compaction> >)::$_0>(seastar::thread_attributes, sstables::compaction::run(std::unique_ptr<sstables::compaction, std::default_delete<sstables::compaction> >)::$_0&&)::{lambda()#3}, false> >(seastar::future<sstables::compaction_result>::finally_body<seastar::async<sstables::compaction::run(std::unique_ptr<sstables::compaction, std::default_delete<sstables::compaction> >)::$_0>(seastar::thread_attributes, sstables::compaction::run(std::unique_ptr<sstables::compaction, std::default_delete<sstables::compaction> >)::$_0&&)::{lambda()#3}, false>&&)::{lambda(seastar::internal::promise_base_with_type<sstables::compaction_result>&&, seastar::future<sstables::compaction_result>::finally_body<seastar::async<sstables::compaction::run(std::unique_ptr<sstables::compaction, std::default_delete<sstables::compaction> >)::$_0>(seastar::thread_attributes, auto:1&&, (auto:2&&)...)::{lambda()#3}, false>&, seastar::future_state<sstables::compaction_result>&&)#1}, sstables::compaction_result>
   --------
   seastar::internal::coroutine_traits_base<sstables::compaction_result>::promise_type
   --------
   seastar::internal::coroutine_traits_base<std::optional<sstables::compaction_stats> >::promise_type
   --------
   seastar::shared_future<std::optional<sstables::compaction_stats> >::shared_state

If you spend some time parsing those type names by hand, you can probably eventually figure out which lambdas in the source code they refer to. But it's not easy. I avoid doing that if I can. Perhaps there exists some tool that makes parsing those type names easier, but I don't know it. (And for coroutines we actually don't print any information aside from the return type, but that's something that can be fixed on its own).

denesb commented 1 month ago

No matter how much time you spend trying to decipher those lambda names, you won't be able to make sense of one which is big enough to fill your screen. And we have those.

I'm all for adding more metadata to the task objects. I'd love if we could clearly identify a task-chain (continuation-chain or green thread) by some id in memory, as well as walk through all the task objects that are part of it.

I think finding deadlocked tasks automatically is not achievable. But a command which would list all green threads on a shard, similar to how info threads works, would be immensely helpful. Currently, one has to resort to scylla task-histogram -a^1, locating individual task objects with scylla find^1 and finally unleashing scylla fiber^1 on some of the task objects. This is very clunky and takes a lot of time.

scylladb / seastar

Looking for stuck/deadlocked fibers could be easier #2381