on Windows, `waitForProcess` hangs forever on certain `git` commands

JakobBruenker commented 1 year ago

Encountered in this cabal issue: https://github.com/haskell/cabal/issues/8688

Running the following script will successfully execute the first process, but then will get stuck indefinitely while waiting for the second process to finish. The program cannot be killed with Ctr+C, but can be killed from the task manager.

If the process is killed and the git fetch command is run manually, the program will succeed if it's run again.

In WSL, the script ran successfully.

Reproduction steps:

Run the following script with cabal run <filename>:

{- cabal:
build-depends: base >= 4.16 && < 5
             , process == 1.6.16.0
-}
import Data.Foldable
import System.Process

processes :: [CreateProcess]
processes =
  [ CreateProcess {cmdspec = RawCommand "git" ["clone","--no-checkout","https://github.com/rtyley/small-test-repo"], cwd = Nothing, env = Nothing, std_in = Inherit, std_out = Inherit, std_err = Inherit, close_fds = False, create_group = False, delegate_ctlc = True, detach_console = False, create_new_console = False, new_session = False, child_group = Nothing, child_user = Nothing, use_process_jobs = True}
  , CreateProcess {cmdspec = RawCommand "git" ["fetch"], cwd = Just "./small-test-repo", env = Nothing, std_in = Inherit, std_out = Inherit, std_err = Inherit, close_fds = False, create_group = False, delegate_ctlc = True, detach_console = False, create_new_console = False, new_session = False, child_group = Nothing, child_user = Nothing, use_process_jobs = True}
  ]

main :: IO ()
main = do
  for_ processes $ \cp -> withCreateProcess cp $ \_ _ _ p -> do
    waitForProcess p >>= print

Environment:

Windows 11
ghc 9.4.4
git version 2.38.1.windows.1
process 1.6.16.0

snoyberg commented 1 year ago

Pinging @Mistuke if you have any thoughts on this.

Mistuke commented 1 year ago

Sure I can take a look. Just to check, is this msys2 git or native windows git?

I'm currently looking at a similar issue in cabal where it's looking like the finalizers aren't run, so the resource is never released and the child process becomes zombie.

I have been unable to reproduce it myself so hopefully I can with this.

JakobBruenker commented 1 year ago

@Mistuke I think native? But I'm not sure what the proper way to find out is

JakobBruenker commented 1 year ago

If you can't reproduce it and I can help in any other way, let me know

Mistuke commented 1 year ago

@Mistuke I think native? But I'm not sure what the proper way to find out is

Ah basically did you install it through pacman or the git website, or some other means?

JakobBruenker commented 1 year ago

@Mistuke I appear to have installed it via https://gitforwindows.org/

Mistuke commented 1 year ago

Ah great, that's native then. I'll dig in in a bit, will let you know if I need a system call trace. :)

Mistuke commented 1 year ago

Ok, I have been able to reproduce it, and I think that

main :: IO ()
main = do
  for_ processes $ \cp -> withCreateProcess cp $ \_ _ _ p -> do
    waitForProcess p >>= print

is... an interesting dichotomy.. when using use_process_jobs = True and calling waitForProcess inside withCreateProcess we wait while distinguishing between the process "exited" and the kernel having released all resources https://github.com/haskell/process/blob/v1.6.16.0/cbits/win32/runProcess.c#L620 The reason for this is that we want to know that the kernel has already flushed all the I/O data we might want to read back in. This avoid the racy conditions where the program hasn't flushed the file yet when we try to read it...

However the NT kernel only releases the process when the last HANDLE to it has been closed, but we're holding on to the last handle in p. This results in a race condition. If the Kernel has released the object, we're OK, if it hasn't, we'll wait indefinitely on https://github.com/haskell/process/blob/v1.6.16.0/cbits/win32/runProcess.c#L635 as we're waiting on ourselves.

However this use-case is not unreasonable, so question is how to support it. @snoyberg would you agree that the semantics of waitForProcess doesn't guarantee that the handle is still usable after the function returns?

That is to say, I think I'm allowed to close the process HANDLE in this call, as if you've waited for the process to terminate there can be no expectation that the handle is valid past this point.. Currently we close them when the finalizers run on GC or termination.

snoyberg commented 1 year ago

I don't think that's true in general. I'm not as familiar with the Windows process API (though I'm not a stranger to it), but it's a pretty common thing in my experience to, for example:

Start a process
Fork a background thread to read data from the process and do something with it
In the main thread, wait for the process to exit

Mistuke commented 1 year ago

That scenario should be fine though, what I'm asking is essentially, that after the process has exited, so after the main thread returns from waiting for the process exit, whether there's any expectation that the PID is still valid. I would have expected no since the program terminated. Of course you can still drain the data handles, but with the PID only getting the exit code seems valid.

i.e. pidfd_open is invalid after that point no?

snoyberg commented 1 year ago

Then I misunderstood, sorry. I believe what you're saying is correct. My understanding of how the package operates is that once it sees that a PID has closed, it never uses it again. This is based on the Unix contracts around waitforpid, but I think the same logic applies on the Windows side.

Mistuke commented 1 year ago

Sorry I have not forgotten about this. I have a fix for it but wanted to do some more verification in GHC before posting it. I've been busy with work but have set aside some time Saturday to finish checking and post the patch.

Mistuke commented 1 year ago

https://github.com/haskell/process/pull/277 Fixes this.

haskell / process

on Windows, `waitForProcess` hangs forever on certain `git` commands #273

Reproduction steps:

Environment: