treasure-data / serverengine

A framework to implement robust multiprocess servers like Unicorn
Apache License 2.0
759 stars 86 forks source link

Removing heartbeat pipe from ProcessManager #65

Open frsyuki opened 8 years ago

frsyuki commented 8 years ago

ProcessManager sends heartbeat messages to child processes so that it can kill a child process when the process doesn't respond due to too high GC overhead, deadlock, etc. It was relatively easy to keep the code maintained but because ServerEngine started supporting Windows since #50, having heartbeat doesn't seem worth to make the code a lot more complicated.

Idea here is to remove heartbeat support from ProcessManager. ProcessManager doesn't create a set of pipes to child processes. Thus it doesn't need Windows-specific code (ServerEngine.windows? and Fcntl) any more.

A downside is that the parent process can't notice finish (or crash) of child processes using IO.select. Instead, the parent process needs to use waitpid2 for each process. This means that the parent process takes slightly longer time to notice and restart a child process, or needs another synchronization mechanism (e.g. Thread per child process).

Tensho commented 6 years ago

I guess this is relevant vice versa – when the parent process dies, worker processes should be notified about it. For example, in Unicorn and Puma there is EOF sign appears in the pipe between parent and child processes, that OS triggers during the parent process killing. I don't know much about Windows inter-process communication support, but as far as the need for communication exists in one direction (child –> parent ) it's reasonable to think about another direction (parent –> child) too.

I'm concerned about the orphaned children in case of the parent death. It's critical for non-supervised (supervisor: false) server, like in sneakers. Monitoring tool is able to spy only on the parent process due to one pidfile and launches new server process with a new set of worker processes without caring about orphaned.

Cross Reference: "Pidfiles per Sneakers Process"

Tensho commented 6 years ago

@frsyuki, Does it make sense?