Enforce correct destructor ordering

timmaxw commented 9 years ago

In our code, we have many classes which will automatically call a callback at any time until they are destroyed. Often these classes are used as members of other classes, and the callback accesses other members of the class. Sometimes the callback spawns coroutines which access other members of the class. The safe way to put this together is as follows:

class my_class_t {
public:
    my_class_t() : subs(std::bind(&my_class_t::callback1, this)) { }
    void callback1() {
        ASSERT_FINITE_CORO_WAITING;
        foo.nonblocking_foo();
        coro_t::spawn_sometime(std::bind(&my_class_t::callback2, this, drainer.lock()));
    }
    void callback2(auto_drainer_t::lock_t keepalive) {
        try {
            foo.nonblocking_foo();
            bar.blocking_bar(keepalive.get_drain_signal());
        } catch (const interrupted_exc_t &) {
            /* ignore */
        }
    }
    foo_t foo;
    bar_t bar;
    auto_drainer_t drainer;
    subscription_t subs;
};

When the destructor is run, first subs is destroyed, which prevents calls to callback1(). Then drainer is destroyed, which blocks until all instances of callback2() have finished. This ensures that callback1() and callback2() cannot access foo and bar when they are being destroyed.

The problem is that people often put this class together in the wrong order. Sometimes they forget the auto_drainer_t (see #3552). Other times they put drainer after subs, which will lead to crashes if callback1() runs during or after the drainer destructor. Or they may put foo and bar after drainer or after subs, which means that their destructors may be called while callback1() and callback2() can still access them. This is a source of subtle bugs.

The following solutions have been proposed:

Improve the documentation. We could put a big loud warning comment at the top of auto_drainer.hpp, and be sure to educate new engineers about these issues. But documentation is fragile and nobody reads it.
We could make auto_drainer_t have a non-trivial constructor, to encourage people to think about constructor order. One option is that auto_drainer_t should take an initializer-list of void*, so it would be initialized as drainer(&foo, &bar); this would symbolically indicate that foo and bar must be constructed before drainer. Another option is to make it take a dummy type called something like remember_to_consider_destructor_order_t. But these only encourage correct behavior rather than enforcing it.
We could use static analysis to find violations. We would write a list of "subscription-like classes" (semilattice and directory subscriptions, repeating_timer_t, mailbox_t, etc.) and write a static analyzer script to enforce that regular members come first, then the auto_drainer_t if any, then subscription-like classes. We could use Clang's Python bindings, or gcc-xml.
We could force auto_drainer_t and subscription-like classes to be explicitly destroyed. Basically, auto_drainer_t and subscription_t would have stop() methods, which would be equivalent to their current destructors; if the destructor was called and stop() had not been called, then they would crash in debug mode. This would force the coder to give my_class_t an explicit destructor that calls subs.stop(); drainer.stop(). Because these would necessarily run before the foo and bar destructors, this solution automatically enforces that foo and bar can never be accessed during destruction. It's still possible to call drainer.stop() before subs.stop(), but it encourages people to think about it.

danielmewes commented 9 years ago

I like the "We could make auto_drainer_t have a non-trivial constructor" option because it's not much work (especially the remember_to_consider_destructor_order_t variant) and would likely be a noticeable improvement already.

The explicit stop() seems slightly more effective. I'm a little bit concerned that it might become annoying to use though.

Static analysis seems to give the strongest guarantees. We could even consider enforcing that all class members come before an auto drainer, unless they're somehow added to a white list. This solution seems to be very work-intensive though, and I'm not convinced that this would be work well spent.

mlucy commented 9 years ago

I don't think the remember_to_consider_destructor_order_t buys us much; auto_drainer_t already tells me that (although we could make the name scarier I suppose).

I asked this before, I think, but I forgot the answer: why can't we just make the constructors for the subscription-like classes take a reference to an auto drainer (which they promptly ignore)? That way you'd know that an auto drainer had already been constructed when you construct them, which is reasonably strong evidence that the auto drainer will be destroyed second.

timmaxw commented 9 years ago

The advantage of remember_to_consider_destructor_order_t is that it gives the auto_drainer_t a non-trivial constructor, so it must appear in the class's initializer list. The idea is that this would make people think a bit harder. Also, this proposal is mostly for the benefit of new engineers and outside contributors who might not automatically think about destructor order when they see auto_drainer_t.

The problem with making subscription-like classes take a reference to an auto_drainer_t is that it often makes sense to use a subscription-like class without an auto drainer. You only need an auto_drainer_t if you're planning to spawn coroutines from your subscription callback.

mlucy commented 9 years ago

What if we made the subscription-like class take a pointer to an auto-drainer, or NULL if it doesn't need an auto drainer, and if the subscription-like class receives NULL then it does ASSERT_NO_CORO_WAITING before calling its callback to ensure that it actually didn't need the auto drainer? (Or a boost optional if you don't want people passing NULL pointers out like candy.)

timmaxw commented 9 years ago

Most subscription-like classes' callbacks already do ASSERT_NO_CORO_WAITING. The problem isn't that the callback will block; it's that the callback will spawn a coroutine.

mlucy commented 9 years ago

Er, right. Could we add something similar to ASSERT_NO_CORO_WAITING which errors on spawn_sometime as well as spawn_now_dangerously?

timmaxw commented 9 years ago

That would produce a lot of false positives, because it's possible for a callback to unknowingly spawn a coroutine on the other side of an abstraction boundary. For example, it's reasonable for a callback to modify the value of a watchable_variable_t, since modifying the value of a watchable_variable_t is guaranteed not to block. But maybe a completely different object is watching that watchable_variable_t, and in response to the change, it spawns a new coroutine governed by its own auto_drainer_t.

rethinkdb / rethinkdb

Enforce correct destructor ordering #3554