Open weissi opened 1 year ago
process_bug_repro.swift (in zip for GitHub: process_bug_repro.zip)
import Foundation
import Dispatch

// Every 5 seconds, runs `ps` for the given pids so the stuck state is visible.
func makePSLoop(interestingPids: [CInt]) -> DispatchSourceTimer {
    let q = DispatchQueue(label: "offload")
    let timer = DispatchSource.makeTimerSource(queue: q)
    timer.setEventHandler {
        let p = Process()
        p.executableURL = URL(fileURLWithPath: "/bin/ps")
        let args = ["uw"] + interestingPids.flatMap { ["-p", "\($0)"] }
        print("[in parent: \(getpid())] WEIRD (THIS IS THE BUG), still waiting at \(Date()). Running ps \(args.joined(separator: " "))")
        p.arguments = args
        try? p.run()
        p.waitUntilExit()
    }
    timer.schedule(deadline: .now() + 5, repeating: 5)
    return timer
}

// The child is a shell that starts a long-sleeping "childs child" and waits for it.
let p = Process()
p.executableURL = URL(fileURLWithPath: "/bin/sh")
p.arguments = [
    "-c",
    """
    echo "[in child: $$] start subprocess 'childs child'"
    /bin/sh -c 'echo "[in childs child: $$] start"; sleep 12345678; echo "[in childs child: $$] done"' &
    child_child_pid=$!
    echo "[in child: $$] waiting for childs child (with pid $child_child_pid)"
    wait
    echo "[in child: $$] done"
    """
]
print("[in parent: \(getpid())] start subprocess 'child'")
fflush(stdout)
try p.run()
print("[in parent: \(getpid())] waiting 1 second (for child with pid \(p.processIdentifier))")
fflush(stdout)
sleep(1)

// Kill only the child; the childs child keeps running.
print("[in parent: \(getpid())] kill SIGKILL child with pid \(p.processIdentifier)")
let err = kill(p.processIdentifier, SIGKILL)
print("[in parent: \(getpid())] kill \(err == 0 ? "successful" : "failed (\(errno))")")
print("[in parent: \(getpid())] waiting for child with pid \(p.processIdentifier) to exit")
fflush(stdout)

// The child is dead at this point, so waitUntilExit() should return immediately.
let printPSLoop = makePSLoop(interestingPids: [getpid(), p.processIdentifier, p.processIdentifier + 1])
printPSLoop.resume()
p.waitUntilExit()
print("[in parent: \(getpid())] done")
fflush(stdout)
printPSLoop.cancel()
Still happens in 6.0 with swift-foundation.
Description
Foundation.Process on Linux uses a trick (that doesn't actually work...) to detect whether the child process has exited: it inherits one end of a socketpair into the child and expects that socket to be closed when the child exits. In simple scenarios that holds, but UNIX by default inherits all file descriptors into child processes. That means if the sub process itself spawns another process, the special socket is also inherited into the child's child. That's a huge issue, because the parent process can then no longer detect that the child has died: the child's child still holds that file descriptor open...
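To illustrate why the inherited descriptor hides the child's exit, here is a minimal single-process sketch. It is not Foundation's actual code and makes a few assumptions: a `pipe` stands in for the socketpair, `dup` stands in for the copy of the descriptor that the child's child inherits, and the helper names `becomesReady` and `parentSeesExit` are made up for this example.

```swift
import Foundation

/// Returns true if `fd` becomes readable or hung up within `timeoutMs` milliseconds.
func becomesReady(_ fd: Int32, timeoutMs: Int32) -> Bool {
    var pfd = pollfd(fd: fd, events: Int16(POLLIN), revents: 0)
    return poll(&pfd, 1, timeoutMs) > 0
}

/// Hypothetical helper: simulates the child exiting and reports whether the
/// parent's end of the pipe notices, with or without a surviving "childs child".
func parentSeesExit(grandchildStillRunning: Bool) -> Bool {
    var fds: [Int32] = [-1, -1]
    precondition(pipe(&fds) == 0, "pipe() failed")
    let parentEnd = fds[0]   // the end the parent watches
    let childEnd = fds[1]    // the end inherited into the child

    // Stand-in for the childs child inheriting the same descriptor.
    let grandchildCopy = dup(childEnd)
    precondition(grandchildCopy >= 0, "dup() failed")

    // The child is killed: its copy of the descriptor is closed.
    close(childEnd)
    if !grandchildStillRunning {
        close(grandchildCopy)
    }

    // Nothing is ever written, so "ready" can only mean hang-up (all copies closed).
    let seen = becomesReady(parentEnd, timeoutMs: 200)

    if grandchildStillRunning {
        close(grandchildCopy)
    }
    close(parentEnd)
    return seen
}

print(parentSeesExit(grandchildStillRunning: false)) // true
print(parentSeesExit(grandchildStillRunning: true))  // false
```

As long as any copy of the write end stays open, the parent's end never reports a hang-up, which matches the hang in `waitUntilExit()` described here.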
Attached, please find a reproduction which does the following:
The parent process spawns a /bin/sh as its child process. That child process spawns another process (childs child) which does sleep 12345678, which is a very very long sleep. After one second, parent kills child with SIGKILL, which means that child now immediately exits. Then, the parent calls process.waitUntilExit(), which should immediately return (because the child is dead). Alas, Foundation.Process does not realise that child is dead because that special socketpair is also inherited into childs child (and further sub processes)...
Expected behaviour (observed on Darwin)
Actual behaviour (observed on Linux, Swift 5.8)
Fix
Instead of using this special socketpair which has two issues:
To fix both of these, Foundation.Process should use either pidfd_open or signalfd on SIGCHLD to get an epoll-able signal when the child process dies.