Do not hang in communicate if spawned process forks

stripe / subprocess

A port of Python's subprocess module to Ruby

MIT License

208 stars 17 forks source link

Do not hang in communicate if spawned process forks #39

Closed andrew-stripe closed 7 years ago

andrew-stripe commented 7 years ago

Currently we only exit the communicate loop when we get an EOF from stdout and stderr. If the child process forks before exiting, the fork will inherit those file descriptors and communicate will hang until the child exits. I believe it's preferable to drain stdout/stderr after the process exits and then break out of the communicate loop.

Note that Open3 similarly hangs waiting for EOF. There is a risk that existing software depends on this behavior and that output could be non-deterministic if the forked grandchild is writing to stdout/stderr. Do you think that's acceptable?

r? @nelhage-stripe

andrew-stripe commented 7 years ago

My other major concern with this PR is that it could re-introduce a race like the one I fixed in #23.

nelhage-stripe commented 7 years ago

I'm slightly FUDful of this change since I think the old behavior is a relatively standard choice for these libraries, and even though it's a bit of a sharp edge, there's some merit in consistency…

I'll give it a bit of thought, though.

andrew-stripe commented 7 years ago

I suppose in my immediate use case I can work around this by checking process.status in the block I pass to communicate. A bit more implementation dependent than I like but maybe fine (assumes that we never yield to the block after calling poll but before draining stdout/stderr).

andrew-stripe commented 7 years ago

Hmm, that hack only works if something is read from stdout/stderr when self_read is read. I could update communicate to yield to the block any time ready_r is non-empty instead.

nelhage-stripe commented 7 years ago

OOC, is it possible for your use case for the forked child to close these FDs? That's certainly the "cleanest" solution in some sense.

andrew-stripe commented 7 years ago

Here's a minimal repro of the problem I'm running into with ssh:

host = <YOUR HOST>
require 'subprocess'

Dir.mktmpdir do |tmp|
  ctrl = File.join(tmp, 'ctrl')
  puts "Kill mux with: ssh -S #{ctrl} -O exit #{host}"
  cmd = ['ssh']
  cmd += %W{-o ControlMaster=auto -o ControlPersist=1m -o ControlPath=#{ctrl}}
  cmd << host
  cmd += %w{echo foo}
  Subprocess.popen(cmd, stdout: Subprocess::PIPE, stderr: Subprocess::PIPE) do |pr|
    p pr.communicate
  end
end

This hangs until you kill the forked mux process. Interestingly it only hangs if you set both stdout and stderr to Subprocess::PIPE.

nelhage-stripe commented 7 years ago

Ugh. This really feels like an ssh bug to me, and in fact, it's been fixed upstream in SSH: https://bugzilla.mindrot.org/show_bug.cgi?id=1988

But that's not useful to solving your problem, and I think on reflection I'm ok with changing the communicate contract to be "wait until the process exits" not "until the process exits AND EOF on all pipes"

I'll stare harder at this patch and see if I spot any races.

nelhage-stripe commented 7 years ago

I guess one concern is that this could regress if we were accidentally relying on the old behavior somewhere; e.g. a process that forks a child and exits, and then the child prints something and exits. That seems unlikely, but not impossible.

I think I'm provisionally :+1: on this, though.

lgtm