scylladb / seastar

High performance server-side application framework
http://seastar.io
Apache License 2.0
8.3k stars 1.54k forks source link

exceptions are not scalable #73

Open nyh opened 8 years ago

nyh commented 8 years ago

This is an issue to think about - it's not urgent, and I don't any idea how we can solve it.

Seastar goes out of its way not to use any locks or even atomic operations, because these are not scalable as the number of cores grow. In particularly, we have our own single-thread version of std::shared_ptr and std::string because the standard ones use atomic operations because they can be shared across threads.

One unscalable thing we're left with is exception handling: std::exception_ptr uses atomic operations (just like std::shared_ptr). But more worryingly, throwing an exception appears to be taking global locks while doing stack unwinding (see for example http://stackoverflow.com/questions/26257343/does-stack-unwinding-really-require-locks) which means one thread throwing an exception can block another thread which is also trying to throw an exception. And blocking is really bad on Seastar's single-thread-per-core design.

Obviously, the best solution is to use exceptions as little as possible. But when your sever is handling 1 million requests per second, you need to be really careful to avoid any possibility of exceptions in the course of request handling. Note that exceptions are known to be slow - that is fine. What is not fine is that an exception on one thread can block other threads on a machine with many cores.

I don't know if we can ever solve this issue without modifying/overriding libgcc, but the minimum we should do is to document this issue and warn against using exceptions too much in Seastar.

Another idea worth looking into is whether we can implement a future's exception state without actually throwing exceptions: In a lot of Seastar code, we do not throw an exception, but rather return a make_exception_future<...>(). Commit 44e35a4fb7e61e6ab628c236c030688b0a7f6038 prevent a bunch of wasteful rethrows of this store exception, but we still have two problems: 1) make_exception_future internally throws an exception to build a std::exception_ptr, and 2) code which uses then_wrapped() usually rethrows the exception when calling get(). Is there a way to support exceptional futures without the overheads of actual exception handling?

nyh commented 8 years ago

Looking at gcc's source code, libstdc++-v3/libsupc++/*, it seems not difficult in theory to take a abject, and instead of throwing it (which calls the __cxa_throw() code) and then catching it and calling std::current_exception() - is should be feasable to create an exception_ptr object directly. This might, however, need some modifications to libgcc because some things (like the std::exception_ptr::exception_ptr(void*) constructor) are private.

nyh commented 8 years ago

Some very partial progress in the situation:

  1. In commit 20bf03b930c3900bb897c24419451bdca476b3ce, Gleb found and fixed additional cases where we catch-and-rethrow an exception, where all we really needed to do was to pass along the existing exception_ptr.
  2. In https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68297 I started a discussion on making gcc's implementation of std::make_exception() (used by our make_exception_future()) work without throwing an exception at all. It appears possible to do, but will require some work inside libstdc++.
unknownzerx commented 8 years ago

In continuation-passing style, try catch can be manually implemented by maintaining two continuations -- one for normal flow and one for exceptional flow.

IMHO with promise-future, the continuation is still explicitly expressed (it might be called 'continuation-chaining style'?). We could avoid using C++ try catch at all.

nyh commented 8 years ago

I created https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71744 I'm not yet sure if the fix will involve only gcc, or also (or only) glibc (e.g., might require changes to dl_iterate_phdr?). So maybe I put this report in the wrong bug tracker.

nyh commented 7 years ago

Thanks to @gleb-cloudius, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68297 was solved in gcc 7, and std::make_exception_ptr() will not involve throwing an actual exception.

Gleb, can you please summarize the state of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71744, i.e., the attempt to allow concurrent exception throwing? I see there was a lot of activity on that issue, but don't understand what was the conclusion. Note that even though the above fix, and several others mentioned in comments above, reduced the amount of exception throwing - some still remains so it would be nice to make that scalable as well.

gleb-cloudius commented 7 years ago

There is no conclusion. I think there is an understanding of the problem and willingness to address it, but not at all cost. It involves coordination between gcc and glibc and hence needs much more time dedicated to it. I proposed a solution that requires adding a new ABI function that in gllibc that has to be used by newer gcc during unwind. There are comment on the function implementation itself (how it achieves parallelism) and on adding a new ABI function as opposite of doing something with symbol versioning (not sure how difference in locking behaviour can be addressed by symbol versioning, but then I did not looked enough into it). That's more or less the state.

-- Gleb.

nyh commented 6 years ago

In #399, @gleb-cloudius explains the current state of this issue.

"There are two locks that are taken on exception path. One was eliminated in gcc7 (by https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=240193) for another we have a local workaround protected by NO_EXCEPTION_HACK. NO_EXCEPTION_HACK should not be set unless you try to compile libstdc++ or libgcc statically, so workaround should be enabled by default. 464f5e3 has much more detailed explanation of the whole ordeal. Note that the workaround assumes that no libraries are loaded dynamically after application starts (it is possible to support that too, just requires more coding)."

Basically Seastar does not have this bug any longer because one lock was eliminated by gcc 7 (so switch to gcc 7!) and a second lock we work around by reimplementing dl_iterate_phdr() ourselves in core/exception_hacks.cc.