Closed JakobBruenker closed 1 year ago
Pinging @Mistuke if you have any thoughts on this.
Sure I can take a look. Just to check, is this msys2 git or native windows git?
I'm currently looking at a similar issue in cabal where it looks like the finalizers aren't run, so the resource is never released and the child process becomes a zombie.
I have been unable to reproduce it myself so hopefully I can with this.
@Mistuke I think native? But I'm not sure what the proper way to find out is
If you can't reproduce it and I can help in any other way, let me know
Ah, basically: did you install it through pacman, or the git website, or some other means?
@Mistuke I appear to have installed it via https://gitforwindows.org/
Ah great, that's native then. I'll dig in in a bit, will let you know if I need a system call trace. :)
Ok, I have been able to reproduce it, and I think that
    main :: IO ()
    main = do
      for_ processes $ \cp -> withCreateProcess cp $ \_ _ _ p -> do
        waitForProcess p >>= print
is... an interesting dichotomy. When using use_process_jobs = True and calling waitForProcess inside withCreateProcess, we wait while distinguishing between the process having "exited" and the kernel having released all of its resources: https://github.com/haskell/process/blob/v1.6.16.0/cbits/win32/runProcess.c#L620 The reason for this is that we want to know that the kernel has already flushed all the I/O data we might want to read back in. This avoids the race where the program hasn't flushed the file yet when we try to read it.
However, the NT kernel only releases the process when the last HANDLE to it has been closed, but we're holding on to the last handle in p. This results in a race condition: if the kernel has released the object, we're OK; if it hasn't, we'll wait indefinitely on https://github.com/haskell/process/blob/v1.6.16.0/cbits/win32/runProcess.c#L635 as we're waiting on ourselves.
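For anyone who wants to try the pattern described above in isolation, here is a minimal self-contained sketch. The commands in processes are hypothetical stand-ins (the commands from the original report are not reproduced here), and runAndWait and runAll are names introduced only for this illustration. As far as I know, use_process_jobs only has an effect on Windows, so this would not be expected to hang elsewhere.

```haskell
import Data.Foldable (for_)
import System.Exit (ExitCode (..))
import System.Process
  ( CreateProcess (use_process_jobs)
  , shell
  , waitForProcess
  , withCreateProcess
  )

-- Hypothetical stand-ins for the commands in the original report.
processes :: [CreateProcess]
processes =
  [ (shell cmd) { use_process_jobs = True }
  | cmd <- ["echo first", "echo second"]
  ]

-- Wait for one child while withCreateProcess still holds its ProcessHandle.
-- This is the combination discussed above: the wait happens before the
-- bracket has had a chance to clean up the handle.
runAndWait :: CreateProcess -> IO ExitCode
runAndWait cp = withCreateProcess cp $ \_ _ _ p -> waitForProcess p

-- Run every process in turn and print its exit code.
runAll :: IO ()
runAll = for_ processes $ \cp -> runAndWait cp >>= print
```

Calling runAll from main gives roughly the shape of the snippet above, with each child waited on inside its own withCreateProcess.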
However, this use case is not unreasonable, so the question is how to support it. @snoyberg would you agree that the semantics of waitForProcess don't guarantee that the handle is still usable after the function returns?
That is to say, I think I'm allowed to close the process HANDLE in this call, since once you've waited for the process to terminate there can be no expectation that the handle is valid past this point. Currently we close them when the finalizers run on GC or termination.
I don't think that's true in general. I'm not as familiar with the Windows process API (though I'm not a stranger to it), but it's a pretty common thing in my experience to, for example:
That scenario should be fine though. What I'm asking is essentially whether, after the process has exited (so after the main thread returns from waiting for the process exit), there's any expectation that the PID is still valid. I would have expected not, since the program has terminated. Of course you can still drain the data handles, but with the PID, only getting the exit code seems valid.
i.e. pidfd_open is invalid after that point, no?
Then I misunderstood, sorry. I believe what you're saying is correct. My understanding of how the package operates is that once it sees that a PID has closed, it never uses it again. This is based on the Unix contracts around waitpid, but I think the same logic applies on the Windows side.
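To make the "only the exit code stays valid" point concrete, here is a small sketch using the public System.Process API. waitTwice is a name made up for this example. My understanding, which I have not verified on every platform, is that once waitForProcess has returned, a second call just hands back the remembered exit code, and getPid reports Nothing because the handle is already closed.

```haskell
import Data.Maybe (isNothing)
import System.Exit (ExitCode (..))
import System.Process (getPid, shell, waitForProcess, withCreateProcess)

-- Wait on the same ProcessHandle twice and check whether a PID is still
-- available in between. Returns (first wait, second wait, PID is gone).
waitTwice :: String -> IO (ExitCode, ExitCode, Bool)
waitTwice cmd = withCreateProcess (shell cmd) $ \_ _ _ p -> do
  first <- waitForProcess p
  -- Assumption: after a completed wait the handle no longer exposes a PID.
  pidGone <- isNothing <$> getPid p
  second <- waitForProcess p
  pure (first, second, pidGone)
```

For example, waitTwice "exit 3" should yield the same exit code for both waits, which is the contract being relied on here: after the wait, the exit code is the only thing callers can still ask for.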
Sorry I have not forgotten about this. I have a fix for it but wanted to do some more verification in GHC before posting it. I've been busy with work but have set aside some time Saturday to finish checking and post the patch.
https://github.com/haskell/process/pull/277 Fixes this.
Encountered in this cabal issue: https://github.com/haskell/cabal/issues/8688
Running the following script will successfully execute the first process, but will then get stuck indefinitely while waiting for the second process to finish. The program cannot be killed with Ctrl+C, but can be killed from the task manager.
If the process is killed and the git fetch command is run manually, the program will succeed if it's run again. In WSL, the script ran successfully.
Reproduction steps:
Run the following script with cabal run <filename>:
Environment: