microsoft / msquic

Cross-platform, C implementation of the IETF QUIC protocol, exposed to C, C++, C# and Rust.
MIT License
4.05k stars 530 forks source link

Race in QuicTestClientDisconnect on cleanup #1100

Closed ThadHouse closed 3 years ago

ThadHouse commented 3 years ago

In QuicTestClientDisconnect, at the end of the test we call WaitForShutdownComplete() on the client

https://github.com/microsoft/msquic/blob/main/src/test/lib/DataTest.cpp#L694

In the connection itself, AutoDelete is set to true, which causes the shutdown callback to delete the connection.

https://github.com/microsoft/msquic/blob/main/src/test/lib/TestConnection.cpp#L772

This happens just a few lines after QuicEventSet(EventShutdownComplete) is called. The deletion deletes the event, while its still possible the other thread has not started back up, and is still being waited on. If this happens, the event deletion fails, which triggers an assertion that the event did not uninitialize properly.

This seems to only be an issue on Linux, as Windows does seem to behave if an event is deleted while being waited on, as likely this is a kernel counted object.

https://dev.azure.com/ms/msquic/_build/results?buildId=127592&view=results

mxms0 commented 3 years ago

I was also seeing this issue on macOS :)

ThadHouse commented 3 years ago

I suspected, since they use the same platform API and assertions to handle events. Its rare for now, but we'll have to look into how to fix it properly.

nibanks commented 3 years ago

I suspected we can simply remove the call to WaitForShutdownComplete here. While it is theoretically making sure we complete the shutdown in a timely manner, it's never found a "shutdown took too long bug". It either works or it deadlocked. Either way, when we close the registration it will internally (in MsQuic) do the same wait anyways, but forever.