pgiri / dispy

Distributed and Parallel Computing Framework with / for Python
https://dispy.org

dispy-4.10 dies from SIGHUP, SIGINT #154

Open UnitedMarsupials-zz opened 6 years ago

UnitedMarsupials-zz commented 6 years ago

I use Ansible to ensure that dispynode is running on a plethora of nodes. Unfortunately, Ansible has a nasty habit of killing any processes started during the run -- see ansible/ansible#33410.

With 4.9.1 I was able to work around this by using nohup:

        nohup $exec --daemon --debug < /dev/null >> $logfile 2>&1 &

Unfortunately, the trick no longer works with 4.10, and the dispynode process dies as soon as Ansible finishes running:

2018-10-03 10:16:29 dispynode - dispynode version: 4.10.0, PID: 21247
2018-10-03 10:16:29 pycos - version 4.8.3 with epoll I/O notifier
2018-10-03 10:16:30 dispynode - "r01001n00" serving 48 cpus
2018-10-03 10:16:30 dispynode - TCP server at 10.109.51.131:51348
2018-10-03 10:16:30 dispynode - dispynode received signal 1
2018-10-03 10:16:31 dispynode - dispynode received signal 2
2018-10-03 10:16:31 pycos - terminating task !timer_proc/54383752 (daemon)
2018-10-03 10:16:31 pycos - terminating task !tcp_server/54396088 (daemon)
2018-10-03 10:16:31 pycos - terminating task !udp_server/54404120 (daemon)
2018-10-03 10:16:31 pycos - pycos terminated

Ideally, the use of --daemon would cause the program to properly disconnect itself from the tty -- doing what daemon(3) does on Linux and BSD -- and become a proper daemon.
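
For reference, what daemon(3) does can be sketched in Python roughly as follows. This is only a minimal sketch, not dispynode's actual --daemon implementation; the function name `daemonize` and its `nochdir`/`noclose` parameters are hypothetical and simply mirror the arguments of the C function.

```python
import os


def daemonize(nochdir=False, noclose=False):
    """Detach from the controlling terminal, roughly like daemon(3).

    Hypothetical helper for illustration; not part of dispy.
    """
    # Fork so the continuing process is not a process-group leader,
    # which is required for setsid() to succeed.
    if os.fork() > 0:
        os._exit(0)  # the original process exits immediately

    # Start a new session; the process no longer has a controlling tty.
    os.setsid()

    if not nochdir:
        os.chdir("/")

    if not noclose:
        # Point stdin, stdout and stderr at /dev/null.
        devnull = os.open(os.devnull, os.O_RDWR)
        for fd in (0, 1, 2):
            os.dup2(devnull, fd)
        if devnull > 2:
            os.close(devnull)
```

A process detached this way is no longer tied to the session of the shell that started it, so a hangup on that terminal is not delivered to it.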

Less ideally, some way of telling it to ignore certain signals would be needed... I had to revert to 4.9.1 because of this problem :(
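
Ignoring certain signals could look something like the sketch below. `ignore_signals` is a hypothetical helper, not a dispy option or API; it uses only the standard `signal` module, and the default choice of SIGHUP and SIGINT is an assumption based on the two signals seen in the log above.

```python
import signal


def ignore_signals(signums=(signal.SIGHUP, signal.SIGINT)):
    """Ignore each listed signal and return the previous handlers.

    Hypothetical helper for illustration; not part of dispy.
    Must be called from the main thread, as signal.signal() requires.
    """
    previous = {}
    for signum in signums:
        # signal.signal() returns the handler that was in place before.
        previous[signum] = signal.signal(signum, signal.SIG_IGN)
    return previous
```

After this call, a SIGHUP or SIGINT delivered to the process is discarded instead of triggering a shutdown.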

pgiri commented 6 years ago

Sorry about that. In 4.10.0 I added signal handling for SIGQUIT and SIGHUP. With the fix just committed, dispynode doesn't handle those signals if it is started with the --daemon option.
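
The behavior described here might be sketched as follows, assuming that "doesn't handle" means no handler is installed in daemon mode. `install_handlers` and its arguments are hypothetical and are not taken from dispynode's source. The sketch additionally skips any signal that is already ignored at startup; this is a common convention rather than something the fix is known to do, and it keeps nohup effective, since nohup works by setting SIGHUP to ignored before running the command.

```python
import signal


def install_handlers(handler, daemon,
                     signums=(signal.SIGHUP, signal.SIGQUIT)):
    """Install `handler` for each signal unless running as a daemon.

    Hypothetical helper for illustration; not dispynode's code.
    Returns the list of signals for which the handler was installed.
    """
    installed = []
    if daemon:
        # Daemon mode: leave signal dispositions untouched.
        return installed
    for signum in signums:
        # Respect a disposition of "ignore" inherited from the parent,
        # for example one set by nohup.
        if signal.getsignal(signum) == signal.SIG_IGN:
            continue
        signal.signal(signum, handler)
        installed.append(signum)
    return installed
```

Installing a handler unconditionally replaces an inherited "ignore" disposition, which would explain why a nohup wrapper that worked with 4.9.1 stopped protecting the process once handlers were added.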