Open lewissbaker opened 6 years ago
I had also in x64 (debug and release) from time to time such an error. The expected thread was not the one on which the coroutine resumed. I could identify the source of the trouble to be this change: https://github.com/lewissbaker/cppcoro/commit/316ce6c7dd40c1e78f65f60d886db1eb8cc570dd#diff-54dfdc33313e0a90b0e5d173a2d22e46 The schedule_on coroutine is returning a task. The task::awaitable_base::await_suspend may return false if the target thread is fast enough and sets the promise continuation before it. Thus it is resuming directly on the calling thread, instead to be suspended and resumed on the target thread. To fix this, you could add a policy to the task or make schedule_on return a different type (ex task_resume_deferred) that implements the awaitable_base::await_suspend as:
void await_suspend(std::experimental::coroutine_handle<> awaiter) noexcept
{
// NOTE: This is the difference to regular task, it may not resume directly from here.
m_coroutine.promise().try_set_continuation(detail::continuation{ awaiter });
m_coroutine.resume();
}
I have a commit where I added a resume_policy to the task, what do you think? I have tested also with a separate task_resume_deferred type returned from the schedule_on, but it does not compose so well.
@tavi-cacina Thanks for the investigation and for the proposed patch!
The schedule_on
operator only specifies that the coroutine will start execution on the specified scheduler and not that it will complete and resume the awaiting on the specified scheduler.
When this test was first implemented the task<T>
implementation actually guaranteed to resume the awaiting coroutine on the same thread that the task's coroutine completed on. With the current implementation, as you correctly point out, it's possible for the task's coroutine to be scheduled onto another thread and run to completion before the caller is able to register the continuation and so the awaiting coroutine is resumed immediately on the thread executing the co_await
rather than on the thread the task completed on.
This test is currently relying on the old behaviour of task
to resume the awaiting coroutine on the same thread that it completed on.
For now we could just remove the second CHECK
in this test as it's not really testing the schedule_on
operator but rather that the task
resumes the awaiting coroutine on the thread that it completed on (which is currently not the case).
https://github.com/lewissbaker/cppcoro/blob/2492c0782911cdf7fe803642040efff3faa3c28a/test/scheduling_operator_tests.cpp#L33-L40
The correct solution here is to take advantage of the new variant of await_suspend()
that allows returning a coroutine_handle
to resume with a guaranteed tail-call (aka symmetric transfer) that can avoid the unbounded stack-growth for mutually recursive coroutine resumption.
There is an experimental implementation of this version of task
in the tailcall
branch. I did not merge this change into master
at the time as MSVC did not yet support the ofawait_suspend
overload. However I believe that MSVC now supports this feature so perhaps it is time to merge this change in?
You are right, I was confused because I played with a custom Scheduler
and expected to be after co_await context.schedule(someTask)
in the other context. The signature was task schedule(task)
and that caused this behavior. I saw then this issue here, thought that is the same. I am glad that there is no need to change the task
for this :-)
About the tailcall
I do not have the insight into it. Do everything you think is an improvement!
Update: MSVC still does not support symmetric transfer (at least for 2017 Update 9). I am still yet to test against 2019 builds but I have not heard that thes
Note that I have since merged some of the changes to task
from the tailcall
branch that allows it to use symmetric-transfer to avoid these problems under clang builds.
See PR #91.
There have been some random test-failures on the CI servers.
eg. This test run failed with:
There are a couple of working theories here:
schedule_on()
operator itself - the implementation of this is a fairly thin wrapper overco_await scheduler.schedule()