Cannot do checks inside a thread

Initially reported here.

We currently rely on thread-local state to link the checks to the currently-running test. This state is set by the registry when starting the test, and users cannot set it. Therefore, if a test is launching a thread and trying to run checks inside the thread, the test state is not set and check functions terminate.

TEST_CASE("foo") {
    std::size_t i = 0;
    std::jthread t([&] {
        CHECK(i == 0); // std::terminate called
    });
}

If we want to allow this, we need to somehow fill in the test state in the child thread. Ideally this would be automatic, for example: if the test state is empty, get the state of the parent thread, repeat until a state is found (and terminate if no parent thread). But I don't think this is possible, because standard threads have no concept of a "parent" thread.

Also, consider this example of a basic thread pool, in which the worker threads are lazily created when the first task is pushed to the pool:

class ThreadPool {
    std::vector<std::jthread> threads;
    std::vector<std::function<void()>> tasks;

public:
    template<typename F>
    void run(F&& func) {
        if (threads.empty()) {
            for (std::size_t i = 0; i < 8; ++i) {
                threads.emplace_back([&]() { /* process tasks */ });
            }
        }

        tasks.push_back(std::forward<F>(func));
    }
} pool;

TEST_CASE("test 1") {
    pool.run([]() {
        CHECK(1 == 1);
    });
}

TEST_CASE("test 2") {
    pool.run([]() {
        CHECK(1 == 1);
    });
}

The worker threads are created in "test 1", and thus could inherit the test state from "test 1". So far so good. But then consider the following problems:

The thread pool had a long queue to process, so it only gets around to running the task after "test 1" was finished; there's no running test at that point, so we should be careful not to use dangling state from "test 1" (correct answer here would be to terminate, as is done today).
"test 1" finishes and "test 2" starts; the worker threads created by "test 1" are still alive, but the correct test state to use is now that of "test 2". How can we notify the threads of that?
Consider further that "test 1" and "test 2" may be run in parallel in the future, so there's no way to predict which of the two tests is going to initialise the worker threads, and if each test is pushing a number of tasks on the pool, the correct test state to use could have to go back and forth between "test 1" and "test 2" for the same worker thread.

I need to think about this some more, but currently I don't think there's a way to get this to work automatically (and do the correct, unsurprising thing). My thinking at the moment is that this would require an explicit forwarding of the test state, something like:

TEST_CASE("test 1") {
    auto& test_state = snitch::impl::get_current_test();
    pool.run([&]() {
        snitch::impl::set_current_test(&test_state);
        CHECK(1 == 1);
    });
}

This code compiles today (though it uses the internal snitch API; so don't rely on it). But that doesn't solve problem 1 above (dangling test state pointer).

Maybe I should first read how other test frameworks approach this.

From what I can piece together:

In Catch2 the equivalent to our get_current_test() is called getResultCapture(), which in turn calls getCurrentContext().getResultCapture(). This current context is a static class member (lazily initialised), hence it is shared by all threads. This assumes there's only one test running at any given time, which is a design decision of Catch2. So, in Catch2, parallel test execution is simply not supported.

Same in doctest; there is a global ContextState (g_cs); parallel test execution is also not supported.

Same for GoogleTest with UnitTest::GetInstance() which returns a function static variable; parallel test execution is also not supported.

In all these testing frameworks the recommended approach is to use sharding (i.e., run a different subset of the tests in separate processes). This solves the problem, though it is not without downsides (more overhead in process launch, inconvenient when running tests by hand on the command line -- probably requires a runner script to wrap the launches and collate the results).

boost-ut has a parallel test runner example. The test runner is configured as a global variable ut::cfg, which is accessed to process all events. Issue is, I don't see how the assertions are linked to the currently running test; it seems they are not. The default reporter is fully sequential, and expects events to be generated in the order test_begin, assertion_fail, test_end; hence it relies on event ordering to identify the test that triggered the assertion. In the parallel test runner examples, the events are processed by the runner and forwarded to the default reporter. This won't work for parallel reporting, unless I'm missing something :thinking: I don't think I am.

The solution adopted in Catch2/doctest/GoogleTest (global test instance) solves the problem of having to propagate the currently running test to child threads. However, it still has issues with asynchronous checks.

In the examples below*, all of these frameworks will report an assertion in the wrong test (the check inside the thread is reported in the second test case instead of the first), and will segfault if an assertion is reported in a child thread when no test is running (e.g., if commenting out the second test case).

Catch / doctest:

#include "catch2/catch_all.hpp"
#include <thread>

TEST_CASE("testing 1") {
    new std::thread([] {
        std::this_thread::sleep_for(std::chrono::seconds(1));
        CHECK(1 == 2);
    });
}

TEST_CASE("testing 2") {
    std::this_thread::sleep_for(std::chrono::seconds(2));
}

GoogleTest:

#include "gtest/gtest.h"
#include <thread>

TEST(TestSuite, Testing1) {
    new std::thread([] {
        std::this_thread::sleep_for(std::chrono::seconds(1));
        ASSERT_EQ(1, 2);
    });
}

TEST(TestSuite, Testing2) {
    std::this_thread::sleep_for(std::chrono::seconds(2));
}

*: please forgive the obvious memory leak, this is only to make the example concise.

In my opinion, both of these cases should be considered as errors. When no test is running, a segfault is unfortunate but at least signals that something isn't right; terminate with a message would be better. But incorrectly reporting the assertion in the wrong test is quite bad (the test that actually fails is marked as "passed").

The only way to properly handle this would be if the code running in the thread "knows" which test triggered it (so it can report the failure in the right test), and for the test to wait until the code is finished executing before reporting the test end (with final pass/fail status). This means we'd need to do more than just pass a raw pointer/reference around; it needs to be a shared pointer:

TEST_CASE("test 1") {
    // Get the currently running test as a 'std::shared_ptr<...>', and
    // copy this shared pointer to prolong its lifetime while the job isn't complete
    pool.run([test_state = snitch::get_shared_current_test()] {
        // Set the current test in this thread, using a scope-guard object to automatically unset it.
        auto scoped_test_override = snitch::override_current_test(*test_state);
        CHECK(1 == 1);
        // 'scoped_test_override' goes out of scope; resets the current test to nullptr in this thread.
        // Lambda is destroyed, last shared pointer copy is destroyed: test is ended.
    });
}

Then in the snitch test runner, we'd need to wait until there's no remaining reference to the current test before considering it completed.

I worry a bit about the overhead this would introduce, but it may not be that bad: although all CHECK* and REQUIRE* function need to access the current test, they don't need to use a shared pointer; they can keep using the current interface (which uses raw pointers to the thread-local state, hence is fast). The only overheads would be:

The current test state is allocated as a shared pointer rather than on the stack. This defeats one of the design goals of snitch, which is to avoid heap allocations. It may be OK since the allocation happens before the test runs. But when tests run in parallel, that allocation could happen while another test is running, which would have observable effects on the tested code. We could probably solve that by using a custom allocator, unless there's a way to implement a std::shared_ptr-like interface (ref count and observable destruction) without the heap, which is doubtful. But maybe we don't need the full shared_ptr interface; a raw pointer to a stack object and a single atomic ref count would be sufficient (we don't need weak references, release(), make_shared(), etc.). edit: see below, this can be solved without a heap allocation indeed.
At the end of each test run, snitch would have to do a busy wait on the destruction of the test state. That's probably pretty cheap, but still an overhead on top of what is done today.

It's also worth noting that REQUIRE(...) checks, which are meant to immediately stop the test on failure, are inherently difficult to use within a separate thread. It would require a try/catch in the thread to catch and handle the abort exception, and it would then need to propagate the abort message to the parent thread. That's possible, but not easy. The simplest might be to stick to the default C++ behavior for unhandled exceptions in threads, which simply terminates. This isn't as nice as a REQUIRE(...) outside a thread, which only stops the current test. But it's the same behavior one would get with exceptions disabled (REQUIRE(...) terminates immediately).

Another problem: what about CAPTURE and SECTION? With the current architecture, these are stored alongside the test state, and would be shared by all threads pointing to that test. This is probably wrong, I think we'd want the captures and sections to remain thread-local all the time. Otherwise, a check executed in a child thread would be reported with whatever section and captures are currently active in the parent thread, which probably doesn't make sense.

This means that, following the "shared state" example above, we'd need to share part of test state and copy the rest. For example:

TEST_CASE("test") {
    SECTION("running in pool") {
        // Get a new test state, sharing the test ID of the current test, but with 
        // a copy of the current section/capture data, and store this alongside the job
        // to prolong the test lifetime while the job isn't complete.
        pool.run([test_state = snitch::get_shared_current_test()] {
            // Set the current test in this thread, using a scope-guard object to automatically unset it.
            auto scoped_test_override = snitch::override_current_test(*test_state);
            SECTION("some section") {
                CHECK(1 == 2);
            }
            // 'scoped_test_override' goes out of scope; resets the current test to nullptr in this thread.
            // Lambda is destroyed, last shared pointer copy is destroyed: test is ended.
        });
    }

    SECTION("running sequentially") {
        CHECK(1 == 2);
    }

I'd expect the CHECK(...) inside the thread to be reported as being in sections "running in pool" and "some section".

But maybe we don't need the full shared_ptr interface; a raw pointer to a stack object and a single atomic ref count would be sufficient (we don't need weak references, release(), make_shared(), etc.).

This could work:

#include <atomic>

namespace snitch {
using ref_count_t = std::uint_fast16_t; // we should make that configurable

template<typename T>
class shared_ref;

template<typename T>
class ref_counted {
public:
    T data;

    ref_counted() = default;

    template<typename... Args>
    explicit ref_counted(Args&&... args) : data(std::forward<Args>(args)...) {}

    ref_counted(const ref_counted&) = delete;
    ref_counted(ref_counted&&)      = delete;

    ref_counted& operator=(const ref_counted&) = delete;
    ref_counted& operator=(ref_counted&&)      = delete;

    ~ref_counted() {
        wait_until_exclusive();
    }

private:
    std::atomic<ref_count_t> refs = 1;

    void push_ref() {
        if (refs.load() == std::numeric_limits<ref_count_t>::max()) {
            assertion_failed("max reference count exceeded");
        }

        refs += 1;
    }

    void pop_ref() noexcept {
        if (refs.load() == 0) {
            terminate_with("negative reference count; bug");
        }

        refs -= 1;
        refs.notify_one(); // C++20; would have to check compiler support
    }

    template<typename U>
    friend class shared_ref;

public:
    shared_ref<T> get_ref();

    void wait_until_exclusive() const noexcept {
        while (true) {
            const auto current_ref = refs.load();
            if (current_ref <= 1) {
                break;
            }

            refs.wait(current_ref); // C++20; would have to check compiler support
        }
    }
};

template<typename T>
class shared_ref {
    ref_counted<T>* ptr = nullptr;

    template<typename U>
    friend class ref_counted;

    explicit shared_ref(ref_counted<T>* p) noexcept : ptr(p) {}

public:
    shared_ref() = default;

    shared_ref(const shared_ref& other) noexcept :
        ptr(other.ptr != nullptr ? other.ptr->get_ref().ptr : nullptr) {}

    shared_ref(shared_ref&& other) noexcept : ptr(other.ptr) {
        other.ptr = nullptr;
    }

    ~shared_ref() {
        reset();
    }

    shared_ref& operator=(const shared_ref& other) noexcept {
        reset();
        ptr = other.ptr != nullptr ? other.ptr->get_ref() : nullptr;
        return *this;
    }

    shared_ref& operator=(shared_ref&& other) noexcept {
        reset();
        std::swap(ptr, other.ptr);
        return *this;
    }

    void reset() noexcept {
        if (ptr != nullptr) {
            ptr->pop_ref();
            ptr = nullptr;
        }
    }

    T* operator->() const {
        if (ptr == nullptr) {
            assertion_failed("accessing null reference");
        }

        return &ptr->data;
    }

    T& operator*() const {
        if (ptr == nullptr) {
            assertion_failed("accessing null reference");
        }

        return ptr->data;
    }
};

template<typename T>
shared_ref<T> ref_counted<T>::get_ref() {
    push_ref();
    return shared_ref<T>(this);
}
} // namespace snitch

Usage:

ref_counted<int> parent(42); creates a new ref-counted object on the stack. No heap allocation.
parent.get_ref() creates a new reference, bumping the ref count by 1.
When that reference goes out of scope or is reset(), the ref count goes down by 1.
parent.wait_until_exclusive(); waits until it holds the only reference to the object. At that point the object is safe to destroy. This is automatically called in the destructor of parent, to ensure no other ref exists before the object is destroyed.

snitch-org / snitch

Cannot do checks inside a thread #188