transientskp / aartfaac-control

AARTFAAC control scripts

Better detection of crashed processes and possible restart #29

Closed: hsuyeep closed this issue 8 years ago

hsuyeep commented 9 years ago

The control system currently does not monitor the state of the processes it starts. It addresses them only when a new observation needs to start, at which point the older processes are given a 0 STOP command before being given a 0 START command. Reliability should be improved, possibly via a process monitoring system.

One option is for each cmdclient to probe the processes it controls and maintain a status that acontrol periodically queries via a heartbeat. On a missed heartbeat, acontrol can check whether the observation is still ongoing and, if so, restart the affected processes.
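A rough sketch of the bookkeeping acontrol would need for the heartbeat option, not taken from the existing code. The names `HeartbeatMonitor`, `beat`, `missed` and `clients_to_restart`, the client names, and the timeout value are all hypothetical; how the heartbeat actually travels between cmdclient and acontrol is left out.

```python
import time


class HeartbeatMonitor:
    """Hypothetical helper: remembers when each cmdclient last reported in."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout      # seconds of silence tolerated per client
        self.clock = clock          # injectable so the logic can be tested
        self.last_seen = {}         # client name -> time of last heartbeat

    def beat(self, client):
        """Record a heartbeat reply from a cmdclient."""
        self.last_seen[client] = self.clock()

    def missed(self):
        """Return the clients whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(
            name for name, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


def clients_to_restart(monitor, observation_ongoing):
    """Only restart silent clients while an observation is still running."""
    if not observation_ongoing:
        return []
    return monitor.missed()
```

acontrol would call `beat()` whenever a cmdclient answers a status query, and periodically pass the monitor to `clients_to_restart()` together with its own knowledge of whether the observation is ongoing.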

Another option is for a process to signal its cmdclient when it dies; the cmdclient then forwards this to the control system, which can take appropriate action, such as restarting the process.
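For the second option, one possible way for a cmdclient to learn that its process has died is to wait on the child in a background thread and invoke a callback with the exit code. This is only a sketch using the standard library; `watch_process` and the `on_exit` callback are hypothetical names, and forwarding the event to acontrol is assumed to happen inside the callback.

```python
import subprocess
import threading


def watch_process(argv, on_exit):
    """Start argv and call on_exit(returncode) once the process terminates.

    on_exit is a hypothetical hook; a cmdclient could use it to forward
    the event to the control system.
    """
    proc = subprocess.Popen(argv)

    def _wait():
        # Blocks in the background thread until the child exits.
        returncode = proc.wait()
        on_exit(returncode)

    watcher = threading.Thread(target=_wait, daemon=True)
    watcher.start()
    return proc, watcher
```

Compared with the heartbeat approach, this reports a crash as soon as it happens, but it cannot detect a process that is hung rather than dead, so the two options could complement each other.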