Closed by nsoblath 9 years ago
Here's an example that works:
^C2015/06/03 16:33:08 [hornet] termination requested...
2015/06/03 16:33:08 [hornet] stopping 11 threads
2015/06/03 16:33:08 [worker] stopping on interrupt.
2015/06/03 16:33:08 [worker 1.1] no work remaining. total of 0 jobs processed.
2015/06/03 16:33:08 [worker] stopping on interrupt.
2015/06/03 16:33:08 [worker 2.1] no work remaining. total of 0 jobs processed.
2015/06/03 16:33:08 [worker] stopping on interrupt.
2015/06/03 16:33:08 [worker 3.1] no work remaining. total of 0 jobs processed.
2015/06/03 16:33:08 [worker] stopping on interrupt.
2015/06/03 16:33:08 [worker 4.1] no work remaining. total of 0 jobs processed.
2015/06/03 16:33:08 [classifier] stopping on interrupt.
2015/06/03 16:33:08 [classifier] finished.
2015/06/03 16:33:08 [amqp sender] stopping on interrupt.
2015/06/03 16:33:08 [amqp sender] finished.
2015/06/03 16:33:08 [watcher] stopping on interrupt.
2015/06/03 16:33:08 [mover] stopping on interrupt.
2015/06/03 16:33:08 [mover] finished.
2015/06/03 16:33:08 [worker] stopping on interrupt.
2015/06/03 16:33:08 [worker 0.2] no work remaining. total of 1 jobs processed.
2015/06/03 16:33:08 [shipper] stopping on interrupt.
2015/06/03 16:33:08 [shipper] finished.
2015/06/03 16:33:08 [scheduler] stopping on interrupt
2015/06/03 16:33:08 [amqp receiver] stopping on interrupt.
2015/06/03 16:33:08 [amqp receiver] finished.
2015/06/03 16:33:08 [hornet] All goroutines finished. terminating...
The hang seems to depend on whether the SIGINT interrupts the inotify system call or is caught by hornet. In the former case, hornet doesn't exit. In the latter case, it exits cleanly.
It looks like the defer statements for the watcher are never called. I don't know why that is. The behavior is the same whether the function returns from the case block or breaks out of the runLoop.
Fixed in commit b5d984f.
The problem turned out to be the sending of StopExecution to the controlQueue. If one of the threads had already quit, the send blocked, because there was one fewer thread left to receive from the channel. To fix this I used the select-with-default idiom, so the send is skipped instead of hanging when one or more of the threads is missing.
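For reference, here is a minimal sketch of the select-with-default idiom. It is not the code from the commit: `ControlMessage`, `broadcastStop`, and the channel capacity are hypothetical stand-ins, and only the names `StopExecution` and `controlQueue` come from this thread. A buffered channel of capacity 2 stands in for two threads that can still take a message while a third has already quit.

```go
package main

import "fmt"

// ControlMessage is a hypothetical stand-in for hornet's control message type.
type ControlMessage int

// StopExecution is named after the message mentioned in this thread.
const StopExecution ControlMessage = iota

// broadcastStop attempts one StopExecution send per expected thread and
// returns how many sends went through. A send that cannot proceed
// immediately is skipped instead of blocking the caller.
func broadcastStop(controlQueue chan<- ControlMessage, nThreads int) int {
	delivered := 0
	for i := 0; i < nThreads; i++ {
		select {
		case controlQueue <- StopExecution:
			delivered++
		default:
			// Nobody can take the message right now (for example, the
			// thread has already quit), so move on rather than hang.
		}
	}
	return delivered
}

func main() {
	// Capacity 2 models two threads still able to receive; the third is gone.
	controlQueue := make(chan ControlMessage, 2)
	fmt.Println(broadcastStop(controlQueue, 3)) // prints 2
}
```

One trade-off to keep in mind: on an unbuffered channel the default branch also fires when a live receiver just isn't waiting at that instant, so a message can be dropped for a thread that is still running.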
I also added a timeout to the pool.Wait call, so that shutdown still proceeds if a thread for some reason doesn't quit, or doesn't quit correctly and is never subtracted from the WaitGroup.
All threads appear to have stopped: