saleyn / erlexec

Execute and control OS processes from Erlang/OTP
https://hexdocs.pm/erlexec/readme.html
Other
541 stars 141 forks source link

Fix for broken stop_child #12

Closed davidw closed 11 years ago

davidw commented 11 years ago

There are three important bits here:

davidw commented 11 years ago

Hrm... pull request has improved things, but not all is right. When I do application:stop(exec) it does not clean up properly.

davidw commented 11 years ago

Thinking out load: terminate() should wait on a message from the c++ app...

davidw commented 11 years ago

https://github.com/davidw/erlexec/tree/application_stop_fix

This branch has a few more fixes that seem to stop things cleanly when we do application:stop(exec). One bit that I was unsure of is where I just put a sleep(1) - maybe that's not the right way of doing things, but it seems to work for me.

Here's the motivation behind a lot of this work:

I'm managing some external processes, and init:stop gets called, which in turn stops exec, but some of the applications that get stopped catch SIGTERM and take a few seconds to shut down (or have issues and require SIGKILL). When those processes are shutting down, they'd still like to log stuff which they do via Erlang (they're really open_ports that are managed by exec). So it'd be nice if Erlang and the logger application would wait until exec has finished shutting down, which this code seems to accomplish.

I will be testing it more on Monday...

saleyn commented 11 years ago

This pull request looks good except for fprintf() calls which are terminated by "\n" rather than "\r\n". The later is needed so that when the port program prints debug info it shows up properly in console window (otherwise new lines are continued without carriage return on some platforms).

saleyn commented 11 years ago

Also regarding the application_stop_fix branch - the unconditional use of sleep(1) is a problem in case when there is a stuck process that can't be killed for some reason. Say you called a manage(OsPid) on a pid that has different user's ownership, and can't be killed. In that case the port program would get stuck. Prior solution relied on a deadline given to exec to terminate. I believe the original approach is better. Perhaps the default deadline needs to be increased?

davidw commented 11 years ago

Ok, this one is fixed up. I will open a separate pull request for the other branch so we can hash out the details there.

saleyn commented 11 years ago

Thanks for your contribution!