Add signal possibility on unix

faern commented 7 years ago

Closes #12

A proposal for how to solve #12.

Adds the convenience method terminate available everywhere
Adds a trait on unix platforms that allows sending of arbitrary signals to the child.

oconnor663 commented 7 years ago

Thanks for doing all this work! Some design thoughts. I have a few worries about exposing a terminate method:

It won't be super clear whether people should use kill or terminate in the most common case. (Though there might actually be some debate about what the right answer is here >.< )
The behavior of terminate is hard to describe to folks not familiar with signals, because there are so many exceptions. In particular, it can't guarantee that the child will actually exit in a reasonable amount of time. I think the correct way to use terminate in most cases is to call it, give the child a second to quit, and then kill it if it hasn't quit by then. But I expect the vast majority of callers will actually use terminate by itself, even when they shouldn't.
Similarly, the biggest appeal of terminate is that it gives delicate apps a chance to clean themselves up and save state before exiting. But even that isn't guaranteed, because of Windows. If we expect people to reach for terminate specifically when they care about cleanup, this might be a portability footgun.

So I'm leaning towards only providing the signal function, and (as you have it now) only on Unix. It covers all the same use cases, but it makes the caveats above much more explicit. What do you think?

Side question, what are the benefits of using an extension trait vs just a #[cfg(not(windows))] gated method? I know the stdlib only uses traits, but I don't know why.

faern commented 7 years ago

I agree with your concern about confusion over terminate. It would make much more sense if there was a cleaner way to terminate processes on Windows as well. From what I have heard, one can send a ctrl+c equivalent event to a process and hope it responds to that. What do you think about such a solution? There is no guarantee that a process will honor that, but the same holds for SIGTERM on unix so...

Yes, depending on the use case, one probably want to call terminate, have a timeout, then call kill if the process is still running.

I can remove the terminate method if we don't feel it's justified at the moment, just adding the possibility to signal on unix gives the same power to the user, but with some extra code on their side.

About Ext trait vs conditional method: In stdlib all those extension traits have paths like std::something::unix::FooExt making it clear they are for unix only. Also in the docs the platform specific functions will be grouped together at the bottom under the trait impl section. Personally I feel it would be much more confusing if they suddenly disappeared from the normal list of methods on some platforms. With that solution one would have to clearly write in the docs for the method that it only exists on unix. With this solution you would get an import error when trying to import the trait on Windows.

faern commented 7 years ago

To enable the terminate -> timeout -> kill workflow it might be useful to expose a wait_timeout method as well. I'm not sure if there is a system call that allows that to be implemented in a non-polling manner on unix. The subprocess crate seems to use polling on unix and win32::WaitForSingleObject on Windows.

A combination of these could be to expose a terminate(Duration) that does the above, but all built in. The state after calling that would be guaranteed to be the same as after a call to kill, only difference is that the process would be given a duration for graceful exit first.

faern commented 7 years ago

We could create another SharedChildExt in the windows module as well that would expose a generate_console_ctrl_event(CtrlEvent) that would make use of kernel32::GenerateConsoleCtrlEvent to send either ctrl+c or ctrl+break.

With these two ext traits we could have the common terminate (that would use either ctrl+c or SIGTERM). Thoughts on this?

EDIT: Ok, that was not as straight forward as I had hoped. Sending those events is messy.

oconnor663 commented 7 years ago

In stdlib all those extension traits have paths like std::something::unix::FooExt making it clear they are for unix only.

Got it, that makes sense.

kernel32::GenerateConsoleCtrlEvent

Very interesting, I didn't know about that one. Do you know whether there are any gotchas? (Like, if I create a simple GUI app with all the defaults in Visual Studio or whatever, is it going to exit when it gets the console event? Or does it have to add special handling or be in console mode or anything like that?)

We could also go with a signals on Unix only + ctrl event on Windows only approach, especially if the behavior of the two was different enough to be confusing. Though I think child handling is a little easier on Windows in general, because you don't have the same race conditions, so we wouldn't necessarily be adding much there besides a safe wrapper for the win32 functions.

To enable the terminate -> timeout -> kill workflow it might be useful to expose a wait_timeout method as well.

I looked into this sort of thing in the very beginning, before I knew about libc::waitid at all, as another way to avoid the PID race. As far as I can tell, there's no native support on Unix for anything in between "block forever" and "don't block at all", and everyone implements this by looping with sleeps. It's kind of sad that SIGCHLD exists and solves this problem, but library code can't use it. (Windows does support this timeout though. Honestly I respect the Windows kernel a lot more now having done this project.)

This sleep loop runs into one of those weird-edge-cases-that-make-me-feel-icky though: You might expect wait_timeout(huge_duration) to be equivalent to wait(), but it isn't. They're similar at first, but after time passes and the sleeps get longer, there can be a big delay between when your child process exits and when you actually wake up to check it again. A lot of callers probably don't care about that difference, but if your child process is serving requests or something, it might be really bad to leave it dead for ten seconds before you notice it crashed.

So kind of like with terminate, I feel a little skittish about exposing wait_timeout as a simple-and-correct-sounding method. Especially since the most common use case is probably better implemented by what this library already supports: wait in one thread and then sleep once and kill in another thread.

faern commented 7 years ago

No I don't have any more info or experience with GenerateConsoleCtrlEvent. After reading the different answers on https://stackoverflow.com/questions/813086/can-i-send-a-ctrl-c-sigint-to-an-application-on-windows I feel it's very messy. One has to attach the console of another process to our process and then generate the event etc. I know very little about process management in Windows and I don't have a very good environment for testing it.

How about this PR focuses on stabilizing the unix signal extension only and merge that. Then we can start experimenting on Windows and see what we come up with.

faern commented 7 years ago

Btw. Any reason the library uses #[cfg(not(windows))] instead of #[cfg(unix)]?

#[cfg(unix)]
pub mod unix;

Would feel more natural. But I did not want to introduce that without asking, since you did not use it before.

oconnor663 commented 7 years ago

The only reason is that I didn't know what would happen on weirdo platforms like Redox, and I figured we might as well say "if it isn't Windows, try the Unix thing". Is there any downside to doing it that way?

oconnor663 commented 7 years ago

This looks great. Just tried it out by hand. Merging now.

oconnor663 / shared_child.rs

Add signal possibility on unix #13