msys2 / msys2-runtime

Our friendly fork of Cygwin 💖 https://cygwin.org 💖 see the wiki for details
https://github.com/msys2/msys2-runtime/wiki

cygthread: suspend thread before terminating. #234

Closed. jeremyd2019 closed this 6 days ago

jeremyd2019 commented 6 days ago

This addresses an extremely difficult-to-debug deadlock that occurs when running under emulation on ARM64.

A relatively easy way to trigger this bug is to call fork(), then within the child process immediately call another fork() and then exit() the intermediate process.
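For reference, the fork pattern described above can be sketched in plain C. This is only a minimal illustration, not the test case used for this PR: the function name `double_fork_once` and the pipe used to confirm that the grandchild actually ran are assumptions added for the example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that immediately forks again and exits.
 * Returns 0 if the intermediate process exited cleanly and the grandchild
 * reported in through the pipe, -1 otherwise. */
int double_fork_once(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t child = fork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (child == 0) {
        /* Intermediate process: fork again right away, then exit. */
        close(fds[0]);
        pid_t grandchild = fork();
        if (grandchild == 0) {
            /* Grandchild: tell the original parent that we are alive. */
            char ok = 'x';
            ssize_t n = write(fds[1], &ok, 1);
            _exit(n == 1 ? 0 : 1);
        }
        _exit(grandchild < 0 ? 1 : 0);
    }

    /* Original parent: reap the intermediate process, then wait for the
     * grandchild's byte (read returns 0 if the grandchild never wrote). */
    close(fds[1]);
    int status = 0;
    if (waitpid(child, &status, 0) != child) {
        close(fds[0]);
        return -1;
    }
    char buf = 0;
    ssize_t n = read(fds[0], &buf, 1);
    close(fds[0]);

    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return (n == 1 && buf == 'x') ? 0 : -1;
}
```

Calling `double_fork_once()` in a loop from `main()` in an emulated environment on ARM64 is one way to try to provoke the hang. On an unaffected system each call should simply return 0.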

At this stage there appears to be a "code emulation" lock on the wait thread. If the thread is terminated too early, that lock remains held with no thread left to release it, and nothing moves forward.

It seems that a SuspendThread() combined with a GetThreadContext() (to force the thread to actually be suspended, for more details see https://devblogs.microsoft.com/oldnewthing/20150205-00/?p=44743) makes sure the thread is "booted" from emulation before it is suspended.

Hopefully this means it won't be holding any locks or otherwise leave emulation in a bad state when the thread is terminated.

Also, attempt to use CancelSynchronousIo() (as seen in flock.cc) to avoid the need for TerminateThread() altogether. This doesn't always work, however, so it was not a complete fix for the deadlock issue.

Addresses: https://cygwin.com/pipermail/cygwin-developers/2024-May/012694.html

Fixes msys2/msys2-autobuild#62, fixes #228 (and some other issues scattered about I don't remember off-hand)

Awaiting review on cygwin-patches mailing list: https://inbox.sourceware.org/cygwin-patches/2c68d6fe-5493-b7e0-6335-de5a68d3cd3f@jdrake.com/T/#u

lazka commented 6 days ago

-> https://github.com/msys2/MSYS2-packages/pull/5007

lazka commented 6 days ago

This sadly results in lots of errors being printed (Windows 11, build 26100):

user@desktop MSYS ~
$ pacman -Su
      0 [waitproc] pacman 1159 proc_waiter: error on read of child wait pipe 0x58C, Win32 error 995
      0 [waitproc] pacman 1167 proc_waiter: error on read of child wait pipe 0x590, Win32 error 995
      0 [waitproc] pacman 1169 proc_waiter: error on read of child wait pipe 0x5C0, Win32 error 995
      0 [waitproc] pacman 1171 proc_waiter: error on read of child wait pipe 0x5E4, Win32 error 995
      0 [waitproc] pacman 1173 proc_waiter: error on read of child wait pipe 0x5F0, Win32 error 995
      0 [waitproc] pacman 1175 proc_waiter: error on read of child wait pipe 0x5E4, Win32 error 995
      0 [waitproc] pacman 1177 proc_waiter: error on read of child wait pipe 0x5F4, Win32 error 995
      0 [waitproc] pacman 1179 proc_waiter: error on read of child wait pipe 0x5EC, Win32 error 995
      0 [waitproc] pacman 1181 proc_waiter: error on read of child wait pipe 0x5F4, Win32 error 995
      0 [waitproc] pacman 1183 proc_waiter: error on read of child wait pipe 0xB0, Win32 error 995
      0 [waitproc] pacman 1187 proc_waiter: error on read of child wait pipe 0x5EC, Win32 error 995
      0 [waitproc] pacman 1189 proc_waiter: error on read of child wait pipe 0x5E4, Win32 error 995
      0 [waitproc] pacman 1193 proc_waiter: error on read of child wait pipe 0x5F4, Win32 error 995
      0 [waitproc] pacman 1195 proc_waiter: error on read of child wait pipe 0x5F4, Win32 error 995
:: Starting core system upgrade...
 there is nothing to do
:: Starting full system upgrade...
 there is nothing to do

jeremyd2019 commented 6 days ago

D'oh. That error is expected due to CancelSynchronousIo(): Win32 error 995 is ERROR_OPERATION_ABORTED.

C:\>net helpmsg 995

The I/O operation has been aborted because of either a thread exit or an application request.

I wonder why I didn't see that when I tested this on ARM64.

jeremyd2019 commented 5 days ago

I think that should have only shown up when strace is enabled.