Cancel propagation broken for nested IAsyncAction after #1228

alvinhochun commented 1 year ago

Continuing the discussion in #1242

@sylveon It seems your change in #1228 did break cancel propagation, which shows up in the async_propagate_cancel test inside Check([] { return ActionAction(10); }); sporadically, because it only fails if async.Cancel() is called after the first call to co_await ActionAction(wait, depth - 1); has already gone through await_transform (base_coroutine_foundation.h):

https://github.com/microsoft/cppwinrt/blob/4a5acf64457c82dbf1ce6f2e81bc0bdc1adbe51e/strings/base_coroutine_foundation.h#L603-L621

Line 611-618 is responsible for propagating the cancellation to the child coroutine. The condition and cast here is:

if constexpr (std::is_convertible_v<std::remove_reference_t<decltype(expression)>&, enable_await_cancellation&>) {
    //[...]
    static_cast<enable_await_cancellation&>(expression).set_cancellable_promise(&m_cancellable);

But before #1228, the check inside notify_awaiter was:

if constexpr (std::is_convertible_v<std::remove_reference_t<decltype(awaitable)>&, enable_await_cancellation&>) {
    // [...]
    static_cast<enable_await_cancellation&>(awaitable).set_cancellable_promise(promise /* same as &m_cancellable */);

awaitable was declared as:

decltype(get_awaiter(std::declval<T&&>())) awaitable
    /* effectively: */ = get_awaiter(static_cast<T&&>(expression));

get_awaiter makes use of operator co_await to convert the awaitables. The following applies:

https://github.com/microsoft/cppwinrt/blob/4a5acf64457c82dbf1ce6f2e81bc0bdc1adbe51e/strings/base_coroutine_foundation.h#L251-L272

Because this conversion does not happen anymore after #1228, the cancellation is not propagated properly, therefore the child coroutines don't get cancelled, and depending on how late async.Canecl() is called, the completion handler may not be invoked. My change in https://github.com/alvinhochun/cppwinrt/commit/c34c31ab9eac6c371aa0c90ffbbcf03913148069 triggers this reliably.

I can confirm that the failure is gone after reverting #1228 and these two changes.

CC @kennykerr @oldnewthing

sylveon commented 1 year ago

Thanks for the investigation. It looks like the root cause is that await_transform is called before operator co_await is in the compiler-generated code, while previously the constructor of notify_awaiter manually called operator co_await (through get_awaiter) before checking for cancellation support. See [expr.await]/3. get_awaiter was probably introduced to workaround this limitation for suspension hooks, and then cancellation propagation built upon it.

One fix would be to "inline" await_adapter directly in consume_Windows_Foundation_IAsyncInfo<D>, so that way the operator co_await function is no longer needed. But this wouldn't fix it for third party types & adapters, for example awaitable_event will probably be broken. (Since await_transform would get the raw awaitable_event, then notice that it doesn't derive from enable_await_cancellation, and wouldn't enable propagation).

One could also theorically move the responsability of enabling cancellation propagation to the await_suspend function of the awaiter. They can take an std::coroutine_handle<T> (instead of the type erased version) and call set_cancellable_promise(handle->promise()) on themselves (they would need to make sure T is a cppwinrt promise type, or we could update set_cancellable_promise to do that). We would need to make this is source breaking (for example, by renaming enable_await_cancellation) to prevent silently making cancellable third-party code stop working. I can have a go at this, if this solution is picked.

Otherwise, we would probably need to introduce the hacky get_awaiter machinery back (which was the main goal for getting rid of the notification hooks) :(

alvinhochun commented 1 year ago

Okay. I am not familiar with this enough so I think I'll leave this to you or someone else to try to fix.

kennykerr commented 1 year ago

Ideally, we can keep things relatively simple and not reintroduce the awaiter/finder nastiness. 😬

sylveon commented 1 year ago

Unless there is a potential solution I'm not aware of, we'd have to do breaking changes to fix this without reintroducing get_awaiter: by renaming enable_await_cancellation (we could do these changes without renaming, but it would lead to code silently not being registered for cancellation, and source breaking > runtime breaking) and requiring the awaiter to register for cancellation in await_suspend, instead of having the promise do it in await_tranform.

Since await_suspend can get access to the promise type, and is called by the compiler after resolving the awaiter, it will work without get_awaiter.

From a cursory search of all code on GitHub, I only found one use of enable_await_cancellation, and the change would only happen once they upgrade the cppwinrt nuget anyways (or to the eventual windows SDK release with this new cppwinrt version).

kennykerr commented 1 year ago

Probably want to wait for @oldnewthing to weigh in as this is his feature. I don't know how important this is. I can imagine nobody noticing if this stopped working. 😜

oldnewthing commented 1 year ago

I searched in all the Microsoft repos I could find, and couldn't find anyone who uses enable_await_cancellation. So I'm okay with introducing a breaking change there. (We do use enable_propagation so we should try not to break that.)

sylveon commented 1 year ago

Didn't plan on breaking enable_propagation either :)

Will take a look this weekend.

sylveon commented 1 year ago

Here's a minimum repro of the issue, which I will use as a basis for a unit test:

#include <Windows.h>
#include <winrt/Windows.Foundation.h>
#include <chrono>
#include <thread>

struct awaitable_event
{
    void set() const noexcept
    {
        SetEvent(os_handle());
    }

    auto operator co_await() const noexcept
    {
        return winrt::resume_on_signal(os_handle());
    }

    HANDLE os_handle() const noexcept
    {
        return handle.get();
    }

private:
    winrt::handle handle{
      winrt::check_pointer(CreateEvent(nullptr,
      /* manual reset */ true, /* initial state */ false,
      nullptr)) };
} never_signaled, cancelled;

winrt::Windows::Foundation::IAsyncAction func()
{
    co_await winrt::resume_background();

    auto cancel = co_await winrt::get_cancellation_token();
    cancel.enable_propagation();

    // need to use awaitable_event otherwise await_transform will
    // pick up resume_on_signal's awaiter, making it work as expected
    try
    {
        co_await never_signaled;
    }
    catch (winrt::hresult_canceled const&)
    {
        cancelled.set();
    }
}

int main()
{
    auto test = func();
    std::this_thread::sleep_for(std::chrono::seconds(1));
    test.Cancel();
    WaitForSingleObject(cancelled.os_handle(), INFINITE); // deadlock since #1228
}

microsoft / cppwinrt

Cancel propagation broken for nested IAsyncAction after #1228 #1243