Closed alvinhochun closed 1 year ago
Thanks for the investigation. It looks like the root cause is that await_transform
is called before operator co_await
is in the compiler-generated code, while previously the constructor of notify_awaiter
manually called operator co_await
(through get_awaiter
) before checking for cancellation support. See [expr.await]/3. get_awaiter
was probably introduced to workaround this limitation for suspension hooks, and then cancellation propagation built upon it.
One fix would be to "inline" await_adapter
directly in consume_Windows_Foundation_IAsyncInfo<D>
, so that way the operator co_await
function is no longer needed. But this wouldn't fix it for third party types & adapters, for example awaitable_event
will probably be broken. (Since await_transform
would get the raw awaitable_event
, then notice that it doesn't derive from enable_await_cancellation
, and wouldn't enable propagation).
One could also theorically move the responsability of enabling cancellation propagation to the await_suspend
function of the awaiter. They can take an std::coroutine_handle<T>
(instead of the type erased version) and call set_cancellable_promise(handle->promise())
on themselves (they would need to make sure T
is a cppwinrt promise type, or we could update set_cancellable_promise
to do that). We would need to make this is source breaking (for example, by renaming enable_await_cancellation
) to prevent silently making cancellable third-party code stop working. I can have a go at this, if this solution is picked.
Otherwise, we would probably need to introduce the hacky get_awaiter
machinery back (which was the main goal for getting rid of the notification hooks) :(
Okay. I am not familiar with this enough so I think I'll leave this to you or someone else to try to fix.
Ideally, we can keep things relatively simple and not reintroduce the awaiter/finder nastiness. 😬
Unless there is a potential solution I'm not aware of, we'd have to do breaking changes to fix this without reintroducing get_awaiter
: by renaming enable_await_cancellation
(we could do these changes without renaming, but it would lead to code silently not being registered for cancellation, and source breaking > runtime breaking) and requiring the awaiter to register for cancellation in await_suspend
, instead of having the promise do it in await_tranform
.
Since await_suspend
can get access to the promise type, and is called by the compiler after resolving the awaiter, it will work without get_awaiter
.
From a cursory search of all code on GitHub, I only found one use of enable_await_cancellation, and the change would only happen once they upgrade the cppwinrt nuget anyways (or to the eventual windows SDK release with this new cppwinrt version).
Probably want to wait for @oldnewthing to weigh in as this is his feature. I don't know how important this is. I can imagine nobody noticing if this stopped working. 😜
I searched in all the Microsoft repos I could find, and couldn't find anyone who uses enable_await_cancellation
. So I'm okay with introducing a breaking change there. (We do use enable_propagation
so we should try not to break that.)
Didn't plan on breaking enable_propagation either :)
Will take a look this weekend.
Here's a minimum repro of the issue, which I will use as a basis for a unit test:
#include <Windows.h>
#include <winrt/Windows.Foundation.h>
#include <chrono>
#include <thread>
struct awaitable_event
{
void set() const noexcept
{
SetEvent(os_handle());
}
auto operator co_await() const noexcept
{
return winrt::resume_on_signal(os_handle());
}
HANDLE os_handle() const noexcept
{
return handle.get();
}
private:
winrt::handle handle{
winrt::check_pointer(CreateEvent(nullptr,
/* manual reset */ true, /* initial state */ false,
nullptr)) };
} never_signaled, cancelled;
winrt::Windows::Foundation::IAsyncAction func()
{
co_await winrt::resume_background();
auto cancel = co_await winrt::get_cancellation_token();
cancel.enable_propagation();
// need to use awaitable_event otherwise await_transform will
// pick up resume_on_signal's awaiter, making it work as expected
try
{
co_await never_signaled;
}
catch (winrt::hresult_canceled const&)
{
cancelled.set();
}
}
int main()
{
auto test = func();
std::this_thread::sleep_for(std::chrono::seconds(1));
test.Cancel();
WaitForSingleObject(cancelled.os_handle(), INFINITE); // deadlock since #1228
}
Continuing the discussion in #1242
@sylveon It seems your change in #1228 did break cancel propagation, which shows up in the async_propagate_cancel test inside
Check([] { return ActionAction(10); });
sporadically, because it only fails ifasync.Cancel()
is called after the first call toco_await ActionAction(wait, depth - 1);
has already gone throughawait_transform
(base_coroutine_foundation.h
):https://github.com/microsoft/cppwinrt/blob/4a5acf64457c82dbf1ce6f2e81bc0bdc1adbe51e/strings/base_coroutine_foundation.h#L603-L621
Line 611-618 is responsible for propagating the cancellation to the child coroutine. The condition and cast here is:
But before #1228, the check inside
notify_awaiter
was:awaitable
was declared as:get_awaiter
makes use ofoperator co_await
to convert the awaitables. The following applies:https://github.com/microsoft/cppwinrt/blob/4a5acf64457c82dbf1ce6f2e81bc0bdc1adbe51e/strings/base_coroutine_foundation.h#L251-L272
Because this conversion does not happen anymore after #1228, the cancellation is not propagated properly, therefore the child coroutines don't get cancelled, and depending on how late
async.Canecl()
is called, the completion handler may not be invoked. My change in https://github.com/alvinhochun/cppwinrt/commit/c34c31ab9eac6c371aa0c90ffbbcf03913148069 triggers this reliably.I can confirm that the failure is gone after reverting #1228 and these two changes.
CC @kennykerr @oldnewthing