Open bdf9f371-8532-42b3-8092-c371d8a759bf opened 15 years ago
According to http://docs.python.org/dev/library/signal.html , if I set up a signal handler in the main thread, and then have the signal delivered to the process, then the signal handler will be called in the main thread. The attached Python script I've written, however, doesn't work that way: sometimes the signal is completely lost, and the signal handler is not called.
Here is how it should work. The code has two threads: the main thread and the subthread. There is also a signal handler installed. The main thread is running select.select(), waiting for a filehandle to become readable. Then the subthread sends a signal to the process. The signal handler writes a byte to the pipe. The select wakes up raising 'Interrupted system call' because of the signal.
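The attached script is not reproduced in this thread. As a rough sketch of one iteration of the scenario described above, something like the following could be used. It is written for Python 3.5+ rather than the 2.x versions discussed here, the function name `run_iteration` and the timeout are assumptions added for illustration, and SIGUSR1 is chosen because the output below shows signal number 10, which is SIGUSR1 on Linux x86_64.

```python
import os
import select
import signal
import threading


def run_iteration(timeout=5.0):
    """Run one iteration; return the byte read from the pipe, or None on timeout.

    Must be called from the main thread, because signal.signal() requires it.
    """
    read_fd, write_fd = os.pipe()

    def handler(signum, frame):
        # The Python-level handler writes one byte so the pipe becomes readable.
        os.write(write_fd, b"W")

    previous = signal.signal(signal.SIGUSR1, handler)
    # The subthread sends the signal to the whole process.
    sender = threading.Thread(target=os.kill,
                              args=(os.getpid(), signal.SIGUSR1))
    try:
        sender.start()
        # On Python 2 the select raised 'Interrupted system call' here.
        # On Python 3.5+ an interrupted select is retried automatically
        # after the handler has run, so it returns the readable pipe instead.
        # The timeout (an addition for this sketch) avoids hanging forever
        # if the handler is never called.
        readable, _, _ = select.select([read_fd], [], [], timeout)
        return os.read(read_fd, 1) if readable else None
    finally:
        sender.join()
        signal.signal(signal.SIGUSR1, previous)
        os.close(read_fd)
        os.close(write_fd)
```

A return value of `None` would correspond to the stuck iteration reported below: the select timed out without the handler having written to the pipe.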
I'm running Ubuntu Hardy on x86_64. With Python 2.4.5 and Python 2.5.2, sometimes the signal handler is not called, and the select continues waiting indefinitely. This is what I get on stdout in Python 2.4.5:
main pid=8555 --- 0 A B S T U handler arg1=10 arg2=\<frame object at 0x79ab40> select got="(4, 'Interrupted system call')" read str='W' --- 1 A B S T U
This means that iteration 0 completed successfully: the signal handler got called, and the select raised 'Interrupted system call'. However, iteration 1 got stuck: the signal handler was never called, and the select waited indefinitely.
The script seems to work in Python 2.4.3, but it hangs at around iteration 60000.
I'm seeing something very similar to this. In my case, I have a single-threaded program, and select fails to be interrupted by SIGCHLD. I'm still tracking down more details, so I'll report back if I find more information.
Sorry for the noise. It turns out that my problem was unrelated.
I think two things can trigger this problem, and both have to do with how signals are handled by the interpreter. Contrary to what you might expect, when a signal is received, its Python handler is _not_ called right away. Instead, signal_handler() in Modules/signalmodule.c is called. This C handler records the reception of the signal in a table and schedules the execution of the associated Python handler for later:
signal_handler(int sig_num)
{
    [...]
    Handlers[sig_num].tripped = 1;
    /* Set is_tripped after setting .tripped, as it gets
       cleared in PyErr_CheckSignals() before .tripped. */
    is_tripped = 1;
    Py_AddPendingCall(checksignals_witharg, NULL);
    [...]
}
checksignals_witharg() calls PyErr_CheckSignals(), which in turn calls the handler. The pending calls are checked periodically from the interpreter main loop in Python/ceval.c: when _Py_Ticker reaches 0, we check for pending calls and, if there are any, run them, which runs checksignals_witharg and hence the handler.
This is actually documented behaviour. Quoting the signal documentation: "Although Python signal handlers are called asynchronously as far as the Python user is concerned, they can only occur between the “atomic” instructions of the Python interpreter. This means that signals arriving during long calculations implemented purely in C (such as regular expression matches on large bodies of text) may be delayed for an arbitrary amount of time."
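The two-stage mechanism described above can be illustrated with a toy model in pure Python. This is only a sketch of the idea, not CPython's actual code: the class `ToySignalTable` and its method names are invented for illustration, with `low_level_handler` playing the role of the C-level signal_handler() and `check_signals` playing the role of PyErr_CheckSignals().

```python
class ToySignalTable:
    """Toy model of deferred signal handling (hypothetical, not CPython code)."""

    def __init__(self):
        self.handlers = {}      # signal number -> Python-level handler
        self.tripped = {}       # signal number -> "received but not handled yet"
        self.is_tripped = False

    def install(self, signum, handler):
        self.handlers[signum] = handler
        self.tripped[signum] = False

    def low_level_handler(self, signum):
        # Stage 1: only record that the signal arrived; run no user code.
        self.tripped[signum] = True
        self.is_tripped = True

    def check_signals(self):
        # Stage 2: called later from the "main loop"; runs the real handlers.
        if not self.is_tripped:
            return
        self.is_tripped = False
        for signum, was_tripped in self.tripped.items():
            if was_tripped:
                self.tripped[signum] = False
                self.handlers[signum](signum)


def demo_deferred_handler():
    log = []
    table = ToySignalTable()
    table.install(10, lambda signum: log.append("handler ran for signal %d" % signum))
    table.low_level_handler(10)   # the signal "arrives"
    log.append("after delivery")  # user handler has not run yet
    table.check_signals()         # the main loop gets around to it
    return log
```

In this model, anything that blocks between `low_level_handler` and the next `check_signals` call delays the user handler, which is the window the rest of this comment is concerned with.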
But there's a race. Imagine this happens:
This problem can also happen even if the signal is sent after select is called:
But this case is quite flaky, because the documentation warns you: "Some care must be taken if both signals and threads are used in the same program. The fundamental thing to remember in using signals and threads simultaneously is: always perform signal() operations in the main thread of execution. Any thread can perform an alarm(), getsignal(), pause(), setitimer() or getitimer(); only the main thread can set a new signal handler, and the main thread will be the only one to receive signals (this is enforced by the Python signal module, even if the underlying thread implementation supports sending signals to individual threads). This means that signals can’t be used as a means of inter-thread communication. Use locks instead."
Sending signals to a process with multiple threads is risky; you should use locks instead.
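As a minimal sketch of the lock-based alternative the documentation recommends, a `threading.Event` (which is built on a lock) can be used to notify the main thread from a subthread. The names `notify_with_event` and `worker` are assumptions made up for this example.

```python
import threading


def notify_with_event(timeout=5.0):
    """Let a subthread notify the main thread without using signals."""
    done = threading.Event()
    results = []

    def worker():
        results.append("work finished")
        done.set()  # wake up the waiting thread

    thread = threading.Thread(target=worker)
    thread.start()
    notified = done.wait(timeout)  # True if set() was called before the timeout
    thread.join()
    return notified, results
```

This does not help when the main thread has to block in select() at the same time; for that case a file descriptor is the more natural wake-up channel.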
Finally, I think that the documentation should be rephrased: "and the main thread will be the only one to receive signals (this is enforced by the Python signal module, even if the underlying thread implementation supports sending signals to individual threads)." It's false. What's guaranteed is that the signal handler will only be executed on behalf of the main thread, but any thread can _receive_ a signal.
And comments in Modules/signalmodule.c are misleading: "We still have the problem that in some implementations signals generated by the keyboard (e.g. SIGINT) are delivered to all threads (e.g. SGI), while in others (e.g. Solaris) such signals are delivered to one random thread (an intermediate possibility would be to deliver it to the main thread -- POSIX?). For now, we have a working implementation that works in all three cases -- the handler ignores signals if getpid() isn't the same as in the main thread. XXX This is a hack."
Sounds strange. If only a thread other than the main thread receives the signal and you ignore it, then it's lost, isn't it? Furthermore, under Linux 2.6 and NPTL, getpid() returns the main thread PID even from another thread.
Peers?
Thanks for the detailed analysis, Charles-François.
> Finally, I think that the documentation should be rephrased:
Yes, I think so.
> Furthermore, under Linux 2.6 and NPTL, getpid() returns the main thread PID even from another thread.
Yes, those threads belong to the same process.
But as mentioned, signals are a rather fragile inter-process communication device; just use a specific file descriptor. And if you still wanna use signals, there's set_wakeup_fd(): http://docs.python.org/library/signal.html#signal.set_wakeup_fd
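For reference, here is a rough sketch of how set_wakeup_fd() can be combined with select(), assuming Python 3.5+ on a POSIX system. The function name `wait_with_wakeup_fd`, the choice of SIGUSR1, and the timeout are assumptions for illustration. The low-level C handler writes the signal number to the wakeup file descriptor as soon as the signal arrives, so the byte is in the pipe whether the signal lands before or during the select() call.

```python
import os
import select
import signal
import threading


def wait_with_wakeup_fd(signum=signal.SIGUSR1, timeout=5.0):
    """Wait for a signal via a wakeup pipe; return the bytes read from it.

    Must be called from the main thread.
    """
    read_fd, write_fd = os.pipe()
    # The wakeup fd has to be non-blocking.
    os.set_blocking(write_fd, False)
    os.set_blocking(read_fd, False)
    # A Python-level handler must be installed, otherwise the default
    # action for the signal (process termination for SIGUSR1) applies.
    old_handler = signal.signal(signum, lambda s, f: None)
    old_fd = signal.set_wakeup_fd(write_fd)
    sender = threading.Thread(target=os.kill, args=(os.getpid(), signum))
    try:
        sender.start()
        readable, _, _ = select.select([read_fd], [], [], timeout)
        return os.read(read_fd, 16) if readable else b""
    finally:
        sender.join()
        signal.set_wakeup_fd(old_fd)
        signal.signal(signum, old_handler)
        os.close(read_fd)
        os.close(write_fd)
```

On Python 3 the byte written is the signal number itself, so the caller can tell which signal woke it up.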
Agree with Charles-François's second explanation. This makes it very hard to reliably handle signals -- basically everyone has to remember to use set_wakeup_fd, and most people don't. For example, gunicorn is likely vulnerable to this because it doesn't use set_wakeup_fd. I suspect most code using select + signals is wrong.
I've attached a patch which fixes the issue for select(), but not any other functions. If it's considered a good patch, I can work on the rest of the functions in the select module. (Also, tests for the details of the behavior.)
Also the patch is pretty hokey, so I'd appreciate feedback if it's going to go in. :)
Adding haypo since apparently he's been touching signals stuff a lot lately, maybe has some useful thoughts / review? :)
This was turned into a doc issue, with no patch forthcoming, but Devin has submitted a bugfix. Should this be turned back into a bug issue?
A workaround for handling signals reliably, which I have now tested successfully, is to execute everything within a subthread and let the main thread just join this subthread. Like:
import signal
import threading
signal.signal(MY_SIGNAL, signal_handler)
thread = threading.Thread(target=my_main_function)
thread.start()
thread.join()
Done this way, the main thread should always be able to handle signals, regardless of whether the subthread is stuck.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['interpreter-core', 'type-bug', 'library', 'docs']
title = 'signal handler never gets called'
updated_at =
user = 'https://bugs.python.org/pts'
```
bugs.python.org fields:
```python
activity =
actor = 'Patrick Fink'
assignee = 'docs@python'
closed = False
closed_date = None
closer = None
components = ['Documentation', 'Interpreter Core', 'Library (Lib)']
creation =
creator = 'pts'
dependencies = []
files = ['13138', '39489']
hgrepos = []
issue_num = 5315
keywords = ['patch']
message_count = 9.0
messages = ['82472', '100306', '100309', '102829', '102850', '244015', '244019', '246963', '314669']
nosy_count = 12.0
nosy_names = ['tim.peters', 'georg.brandl', 'terry.reedy', 'pitrou', 'amcnabb', 'vstinner', 'Devin Jeanpierre', 'pts', 's7v7nislands', 'neologix', 'Netto', 'Patrick Fink']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue5315'
versions = ['Python 2.7', 'Python 3.4', 'Python 3.5', 'Python 3.6']
```