Comparison with ClonableChild crate

faern commented 7 years ago

Hi,

Just saw this crate, and notice that it is virtually doing the same as my clonablechild crate. Just wanting to check in if there is any difference that I miss?

Since it's a very isolated and small problem it should more or less only be one correct solution I think. Thus we could maybe combine forces? If your library does stuff better I will remove mine.

I also put the standard Child in a Mutex. But where you seem to use a special procedure to wait I use the built in wait. And where you use the built in kill method I have my own kill implementation for when the Mutex is already taken.

I probably have the reused pid race condition you talk about in your README.

oconnor663 commented 7 years ago

Oh cool! Agreed that they're very similar use cases. Some scattered thoughts along the lines you mentioned:

I think you do have the wait-vs-kill race that shared_child talks about in its README, the same one that motivated the standard lib to put &mut on those methods. I think you either have to call libc::waitid or handle SIGCHLD to work around that race. But it's a pretty wacky race condition, so I could see some projects being ok with the idea that it'll probably never happen. (I think it's only really likely if you've got ~32k processes running at the same time, so that any free PID is likely to get reused immediately. But then you probably have bigger problems with not being able to spawn children at all :p)
That said, I think you have a bigger PID reuse issue. It looks like your kill function can send a kill signal even if the process was waited on a long time ago. In the branch where it doesn't get the mutex, it looks like it doesn't check whether the process is still alive. (In the branch where it calls Child::kill, that method will check for you though, as long as you keep using Child::wait to do the waiting.) That's much more likely to go wrong than the race condition above, though I think you can fix it by just setting a flag after wait succeeds, and checking it before you kill. (shared_child keeps the Exited state for this reason.)
- This one's probably more likely to cause a "PID doesn't exist" error. I think on Windows you're going to get a "handle has been closed" error or something like that, because you store the child's handle by value rather than calling DuplicateHandle, and I think the original is going to get closed after wait succeeds, kind of like a file descriptor would?
A similar issue looks like it can come up if you call kill twice from two different threads at the same time. The second kill might fail to acquire the lock (because the first kill is holding it for a split second), and assume it conflicting with a wait, when in fact it's not. Sending two signals in parallel might be ok, but it seems like it's probably not what you meant to happen. shared_child decided to use Condvar instead of try_lock, to avoid precisely this sort of issue where a simultaneous kill is indistinguishable from a blocking wait.
I don't think you're distinguishing between the WouldBlock case and the Poisoned case when you call try_lock. The latter should probably result in a panic?
It looks like your stdin/stdout/stderr methods take &mut self, but I don't think they need to?

oconnor663 commented 7 years ago

Actually I'm not sure setting a "this child has already exited flag" will be good enough in your API. I could call new with a child that's already been waited on, and the CloneableChild would then be in the wrong state. Triggering the custom kill path (like by calling kill on two threads at once) would then always try to kill the PID, even if there was normally bookkeeping in place to prevent that.

I just shipped some changes to always use Child::wait and Child::kill under the hood, to avoid a similar issue after the child is returned from my into_inner. (Though I still avoid accepting children from the caller -- instead I spawn them myself.)

faern commented 7 years ago

Thanks for the detailed explanation. At first I thought about patching up clonablechild as well, but I realized that was not the best solution. The best IMO would be to not even have two crates solve the same, very small, problem. So I yanked all versions of clonablechild and added a deprecation notice in the README.

Just out of curiosity, you never saw clonablechild before publishing this, or what made you create a new crate instead of improving the existing one?

oconnor663 commented 7 years ago

Sadly I never saw it. There was some discussion about the problem on the try_wait tracking issue, and I guess I assumed that if anyone on that thread knew about an existing crate they would've mentioned it there.

oconnor663 commented 7 years ago

Will close for now, but let me know if anything changes. I'd be happy to work together to port fixes.

oconnor663 / shared_child.rs

Comparison with ClonableChild crate #9