Closed attila0x2A closed 6 years ago
The main mechanism that is used to ensure the OVERLAPPED structure does not get released is by storing the OVERLAPPED structure as a member of the file_read_operation
and file_write_operation
awaitable objects.
The I/O operation is not started until the operation object is awaited. It is the responsibility of the awaiting coroutine to ensure the operation object being awaited lives at least until the operation completes. Usually the operation object is stored as a local variable or temporary within the coroutine frame and so you get the correct lifetime by default.
If the operation does not complete synchronously then the awaiting coroutine is suspended until the operation completes. All local variables in the coroutine frame are kept alive until the coroutine is resumed. The awaiting coroutine is only resumed when the I/O completion event for that operation is posted to the I/O completion port and is subsequently processed by an I/O thread.
If an operation is executed with an associated cancellation_token
then the operation object will subscribe to cancellation notification by constructing a cancellation_registration
object as a member of the operation object. This cancellation callback is unsubscribed when the operation completes.
When a thread calls request_cancellation()
on a corresponding cancellation_source
then this thread will execute the cancellation callback for the operation which will then call CancelIoEx
. If the OS responds to this request for cancellation before the operation completes then an I/O completion event will be posted to the I/O completion port with an error code that indicates the operation was cancelled. Otherwise, if the operation completes before the cancellation request can be processed by the OS then the operation will complete normally.
There is a potential race here between the operation completing on one thread and cancellation being requested concurrently on another thread. We don't want the cancellation callback to pass a destructed OVERLAPPED structure in a call to CancelIoEx()
. To guarantee this, we ensure that the cancellation callback is unsubscribed before returning control to the resumed coroutine which might subsequently destroy the operation object (and thus the OVERLAPPED structure). In the case that the cancellation callback is already executing on another thread, the unsubscribe process will block until after the callback has finished executing. This should ensure that the OVERLAPPED structure should remain valid until after the CancelIoEx()
call returns.
I hope that helps.
@Uran198 Is there any behaviour you are seeing that leads you to believe that there are some potential lifetime issues in the code?
I have seen some tests fail while performing file I/O, but only on MSVC x86 optimised builds - so I've been putting those failures down to bad optimised codegen by MSVC for now.
I have not seen any strange behavior, I was just looking through the code and this struck me as strange.
If the OS responds to this request for cancellation before the operation completes then an I/O completion event will be posted to the I/O completion port with an error code that indicates the operation was cancelled. Otherwise, if the operation completes before the cancellation request can be processed by the OS then the operation will complete normally.
If I understand correctly how asynchronous operations work, then after returning from CancelIoEx
you do not have a guarantee that operations got canceled and overlapped structure is no longer in use. So you can get into a scenario where you call CancelIoEx
, then you destruct file_read_operation
object, and only now OS completes the cancellation of the I/O marking it as canceled. The issue here is that OS has a reference to the overlapped, so it will try writing to the memory which has already been released.
Based on the article I sent earlier can't you run into the following situation?
file_read_operation op()
co_await op | -> | I/O begins
-- | -- | --
op.cancel() {
CancelIoEx | -> | I/O cancellation submitted to device driver
return
}
<op is destroyed>
| | Device driver was busy reading from the hard drive
Device driver receives the cancellation
Device driver abandons the rest of the read operation
Device driver reports that I/O has been canceled
I/O subsystem writes STATUS_CANCELED to OVERLAPPED structure
I/O subsystem queues the completion function (if applicable)
I/O subsystem signals the completion event (if applicable)
I/O operation is now complete
Also the documentation mentions that the overlapped structure is no longer in use only after the operation completes, not after the call to CancelIo
or CancelIoEx
When canceling asynchronous I/O, do not reuse overlapped structures with targeted cancellation. Once the I/O operation completes (either successfully or with a canceled status) then the overlapped structure is no longer in use by the system and can be reused.
The cancel()
method on the operation class is private and is only called by the internals of the win32_overlapped_operation_cancellable<OPERATION>
base-class in response to a request for cancellation via the cancellation_token
. API users cannot call .cancel()
on the operation object directly - this prevents .cancel()
from being called on a potentially destructed object.
The op
object in your example will not be destructed until the after the co_await op
statement completes. This will only happen once the I/O completion event for that operation is posted to the I/O completion port and has been processed by some thread. i.e. after the "I/O subsystem signals the completion event" line in your timeline.
The combination of the mechanics of win32_overlapped_operation_cancellable
ensuring that cancel()
is only called while the operation object is still alive and the co_await op
mechanics ensuring that the awaiting coroutine is not resumed until an I/O completion event is received and this in-turn ensuring that the operation object being awaited is not destroyed until after the statement containing the co_await op
expression completes should mean that the OVERLAPPED
structure lives until after the I/O operation completes.
@Uran198 Let me know if you still have concerns.
@lewissbaker Thank you for the explanation! I'm sorry I got confused and assumed that when you request cancellation with cancel_source
it cancels right away.
But now, I cannot figure out what is the correct way to cancel and release all the resources?
For example, if I create several tasks that do I/O, pass cancellation_token
to them and join them using when_all
with a task that calls io_service::process_events
. Then, from another thread, I want to stop all the processing and release all the resources taken by coroutines. To do this, I should call cancellation_source::request_cancellation
, wait while all of them are actually canceled and released by io_service
, and then stop io_service
, right? So is there a way that I can tell by using the library when all the cancel events were processed and it's safe to stop io_service
?
Another example is if I want to process a lot of I/O requests and then terminate. So again I create tasks and join then with when_all
with a task that calls io_service::process_events
. Is there a way I can tell that all the I/O operations are finished and I can destroy io_service
?
You can use a combination of sync_wait
, when_all
and on_scope_exit
to hook up running the event loop to run until a task completes.
For example, see https://github.com/lewissbaker/cppcoro/blob/862e99d8173ecb62a39abfd2b543de66bad98adc/test/io_service_tests.cpp#L129-L139
This top-level task can in turn run as many I/O operations as you like. The task
will complete only once all I/O operations have completed. As operations are not actually started until you await them, you can guarantee that a task will only complete once all operations it is composed of have completed.
See also #30 for wrapping this boiler-plate up in a helper function. (Edit: Fixed issue number)
Thanks a lot @lewissbaker for helping me better understand a library and how to work with it!
You probably wanted to mention this issue https://github.com/lewissbaker/cppcoro/issues/30.
While handling cancel for file_read_operation and file_write_operation we should make sure that
Overlapped
will not get released before the I/O operation completes. Here is a good explanation of why it is important.So in addition to calling
CancelIoEx
it is probably required to either callGetOverlappedResult
withbWait=TRUE
or add an event to theOverlapped
and wait on it. I'm not completely sure if it's best to useGetOverlappedResult
in this case because of the warning from the documentationSorry if I missed the mechanism of how you're making sure that
Overlapped
structure does not get released before I/O completes.