Open yfinkelstein opened 5 years ago

On Mac OS X, using clang 7, the test in the headline crashes in an apparent infinite recursion. In lldb:
Process 98425 stopped
* thread #2, stop reason = EXC_BAD_ACCESS (code=2, address=0x700001fa6fd0)
frame #0: 0x000000010028c708 cppcoro_tests`(anonymous namespace)::local::get_resumable_waiter_count(state=<unavailable>) at async_auto_reset_event.cpp:33
30 }
31
32 constexpr std::uint32_t get_resumable_waiter_count(std::uint64_t state)
-> 33 {
34 return std::min(get_set_count(state), get_waiter_count(state));
35 }
36 }
Target 0: (cppcoro_tests) stopped.
Given the recursion depth, it's probably not important where exactly it crashes, though.
The crash happens in both debug and release builds. If the thread pool size at the top of the test is changed from 3 to 1, i.e.

cppcoro::static_thread_pool tp{ 1 };

then the crash no longer happens.
Yes, looking at the test it is a potential recursion problem. The call to event.set() resumes the awaiting coroutine inline, inside the call to set(). This is both a flaw in the test and a flaw in the design of the async_manual_reset_event class.
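To make the mechanism concrete, here is a hypothetical sketch (names made up, not the actual test code) of the kind of ping-pong that blows the stack: each waiter's body runs inside the previous waiter's call to set(), so every iteration adds frames instead of unwinding.

// Hypothetical illustration of the recursion, not the cppcoro test itself.
// Several instances of this coroutine run concurrently on the thread pool.
task<void> ping_pong(async_auto_reset_event& event, int iterations)
{
    for (int i = 0; i < iterations; ++i)
    {
        co_await event;  // resumed inline, inside another coroutine's set()
        event.set();     // in turn resumes the next waiter inline, so the
                         // stack grows by one set()/resume pair per iteration
    }
}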
One solution is to require the waiting coroutine to provide a scheduler to use to reschedule the resumption if the event was not signalled when it was awaited. Eg.

co_await event.wait(someScheduler);

Another solution would be to return a value that indicates whether the co_await completed synchronously, and let the awaiting coroutine decide what to do if it completed asynchronously.
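For illustration, a rough approximation of the first option can already be built by composition from the library's existing pieces (the helper name wait_on is made up; unlike a real wait(scheduler), it pays for a reschedule even when the event was already set at the time of the await):

// Hypothetical helper, not part of cppcoro: after the event fires, always
// hop onto the scheduler before running any more of the caller's code, so
// set() never executes the caller's body inline on its own stack.
task<void> wait_on(async_auto_reset_event& event, static_thread_pool& scheduler)
{
    co_await event;                 // may be resumed inline inside set()
    co_await scheduler.schedule();  // bounce off the set() caller's stack
}

// Usage in a waiting coroutine:
//   co_await wait_on(event, tp);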
> provide a scheduler

I have long wondered how this library can exist without a notion of a scheduler like the one needed for stackful coroutines, e.g. Boost.Fiber's (https://www.boost.org/doc/libs/1_69_0/libs/fiber/doc/html/fiber/scheduling.html). When suspending a coroutine after issuing a non-blocking read, for instance, and then waking it up in the I/O event polling loop, there needs to be some sort of I/O scheduler sitting in between: one that maps the FD that received the I/O to the coroutine handle, but that also schedules the coroutine to run in the I/O-scheduler sense of the word. That is, it may have some notion of coroutine priority, a rate limit per FD, an ASIO-style strand per FD, or whatever. It's clear that these concerns need to be factored out, and a coroutine scheduler is a good idea; a minimal sketch of such a layer follows.
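A minimal sketch of that kind of layer (not cppcoro code; epoll is used for concreteness, kqueue would play the same role on Mac OS X, and the FD-to-handle map and resume policy are the assumptions being illustrated):

#include <coroutine>
#include <unordered_map>
#include <sys/epoll.h>

class io_loop
{
public:
    io_loop() : epfd_(epoll_create1(0)) {}

    // Called from an awaiter's await_suspend(): park `h` until `fd` is readable.
    // (A real implementation would handle re-arming and error returns.)
    void wait_readable(int fd, std::coroutine_handle<> h)
    {
        waiters_[fd] = h;
        epoll_event ev{};
        ev.events = EPOLLIN | EPOLLONESHOT;
        ev.data.fd = fd;
        epoll_ctl(epfd_, EPOLL_CTL_ADD, fd, &ev);
    }

    // One turn of the polling loop: map each ready FD back to its coroutine.
    void run_once()
    {
        epoll_event events[16];
        int n = epoll_wait(epfd_, events, 16, /*timeout*/ -1);
        for (int i = 0; i < n; ++i)
        {
            int fd = events[i].data.fd;
            auto h = waiters_[fd];
            waiters_.erase(fd);
            // This is exactly the policy point discussed above: instead of
            // resuming inline, a richer scheduler could enqueue `h` by
            // priority, rate-limit the FD, or serialize it on a strand.
            h.resume();
        }
    }

private:
    int epfd_;
    std::unordered_map<int, std::coroutine_handle<>> waiters_;
};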
There is already a notion of a scheduler in this library (see the Scheduler and TimedScheduler concepts and the io_service and static_thread_pool implementations). I can see a need for something like a prioritised or deadline scheduler, however.
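For reference, a rough rendering of those two concepts as C++20 concepts (an illustration only; cppcoro predates language-level concepts and documents them informally, and schedule_after() is inferred from io_service's interface):

#include <chrono>

template<typename S>
concept Scheduler = requires(S s)
{
    // co_await s.schedule() resumes the awaiting coroutine on the
    // scheduler's execution context.
    s.schedule();
};

template<typename S>
concept TimedScheduler = Scheduler<S> && requires(S s)
{
    // Additionally supports delayed resumption, as io_service does.
    s.schedule_after(std::chrono::milliseconds(1));
};

// static_thread_pool models Scheduler; io_service models TimedScheduler.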
By default a coroutine is resumed inline on the thread that dequeues the I/O completion event. If you want to avoid performing processing inline on the I/O thread then you can reschedule onto another scheduler (possibly a prioritised scheduler) after the coroutine is resumed.
eg.

task<void> do_something(socket s, static_thread_pool& tp)
{
    // Wait for data to be received.
    char buffer[100];
    co_await s.recv(buffer, sizeof(buffer));

    // Resumed on the I/O thread here.

    // Schedule processing of the data onto the thread pool.
    co_await tp.schedule();

    // Now on the thread pool - process the data.
    process(buffer);
}
Something similar could be done with the async_auto_reset_event tests, eg. by manually rescheduling onto the thread pool if the operation completes asynchronously (if it completes synchronously then the coroutine resumes inline and there is no recursion to worry about):
task<void> example(async_auto_reset_event& event, static_thread_pool& tp)
{
    completion_type completion = co_await event;
    if (completion == completion_type::asynchronous)
    {
        // We were resumed from some other coroutine calling event.set().
        // Reschedule ourselves onto the thread pool so we don't
        // recursively call event.set().
        co_await tp.schedule();
    }

    // do work

    event.set();
}
This approach means that async_auto_reset_event doesn't need to take a dependency on schedulers, but it still gives the consumer enough information to decide whether or not it needs to reschedule to avoid recursion. However, this design also puts more of the complexity onto the consumer, whereas making a scheduler a mandatory parameter of a .wait() method leads the caller into better patterns, at the cost of some flexibility.