python / cpython


signal handler never gets called #49565

Open bdf9f371-8532-42b3-8092-c371d8a759bf opened 15 years ago

bdf9f371-8532-42b3-8092-c371d8a759bf commented 15 years ago
BPO 5315
Nosy @tim-one, @birkenfeld, @terryjreedy, @pitrou, @vstinner, @ssbr
Files
  • tsig.py: Python script which triggers the strange behavior
  • select_select.diff
  Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    ```python
    assignee = None
    closed_at = None
    created_at =
    labels = ['interpreter-core', 'type-bug', 'library', 'docs']
    title = 'signal handler never gets called'
    updated_at =
    user = 'https://bugs.python.org/pts'
    ```

    bugs.python.org fields:

    ```python
    activity =
    actor = 'Patrick Fink'
    assignee = 'docs@python'
    closed = False
    closed_date = None
    closer = None
    components = ['Documentation', 'Interpreter Core', 'Library (Lib)']
    creation =
    creator = 'pts'
    dependencies = []
    files = ['13138', '39489']
    hgrepos = []
    issue_num = 5315
    keywords = ['patch']
    message_count = 9.0
    messages = ['82472', '100306', '100309', '102829', '102850', '244015', '244019', '246963', '314669']
    nosy_count = 12.0
    nosy_names = ['tim.peters', 'georg.brandl', 'terry.reedy', 'pitrou', 'amcnabb', 'vstinner', 'Devin Jeanpierre', 'pts', 's7v7nislands', 'neologix', 'Netto', 'Patrick Fink']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue5315'
    versions = ['Python 2.7', 'Python 3.4', 'Python 3.5', 'Python 3.6']
    ```

    bdf9f371-8532-42b3-8092-c371d8a759bf commented 15 years ago

    According to http://docs.python.org/dev/library/signal.html , if I set up a signal handler in the main thread, and then have the signal delivered to the process, then the signal handler will be called in the main thread. The attached Python script I've written, however, doesn't work that way: sometimes the signal is completely lost, and the signal handler is not called.

    Here is how it should work. The code has two threads: the main thread and the subthread. There is also a signal handler installed. The main thread is running select.select(), waiting for a filehandle to become readable. Then the subthread sends a signal to the process. The signal handler writes a byte to the pipe. The select wakes up raising 'Interrupted system call' because of the signal.

    I'm running Ubuntu Hardy on x86_64. With Python 2.4.5 and Python 2.5.2, sometimes the signal handler is not called, and the select continues waiting indefinitely. This is what I get on stdout in Python 2.4.5:

    main pid=8555 --- 0 A B S T U handler arg1=10 arg2=<frame object at 0x79ab40> select got="(4, 'Interrupted system call')" read str='W' --- 1 A B S T U

    This means that iteration 0 completed successfully: the signal handler got called, and the select raised 'Interrupted system call'. However, iteration 1 was stuck: the signal handler was never called, and the select waits indefinitely.

    The script seems to work in Python 2.4.3, but it hangs at around iteration 60000.

    b56278f9-0197-41e2-a7f9-2e9975d7873d commented 14 years ago

    I'm seeing something very similar to this. In my case, I have a single-threaded program, and select fails to be interrupted by SIGCHLD. I'm still tracking down more details, so I'll report back if I find more information.

    b56278f9-0197-41e2-a7f9-2e9975d7873d commented 14 years ago

    Sorry for the noise. It turns out that my problem was unrelated.

    79528080-9d85-4d18-8a2a-8b1f07640dd7 commented 14 years ago

    I think two things can trigger this problem; both have to do with how signals are handled by the interpreter. Contrary to what you may think, when a signal is received, its Python handler is _not_ called immediately. Instead, it's signal_handler() in Modules/signalmodule.c that's called. This handler records the reception of the signal in a table and schedules the execution of the associated Python handler for later:

    signal_handler(int sig_num)
    {
    [...]
                    Handlers[sig_num].tripped = 1;
                    /* Set is_tripped after setting .tripped, as it gets
                       cleared in PyErr_CheckSignals() before .tripped. */
                    is_tripped = 1;
                    Py_AddPendingCall(checksignals_witharg, NULL);
    [...]
    }

    checksignals_witharg() calls PyErr_CheckSignals(), which in turn calls the handler. The pending calls are checked periodically from the interpreter main loop, in Python/ceval.c: when _Py_Ticker reaches 0, we check for pending calls, and if there are any, we run them, hence checksignals_witharg() and then the handler. This is actually documented behaviour. Quoting the signal documentation: "Although Python signal handlers are called asynchronously as far as the Python user is concerned, they can only occur between the “atomic” instructions of the Python interpreter. This means that signals arriving during long calculations implemented purely in C (such as regular expression matches on large bodies of text) may be delayed for an arbitrary amount of time."
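The two-step mechanism described above can be sketched as a toy model in pure Python. This is not CPython's code: the class `PendingSignals` and its method names are made up for illustration, and the real bookkeeping lives in C in Modules/signalmodule.c. The sketch only shows the idea that the low-level handler records the signal and the Python-level handler runs later, when the interpreter checks for pending work.

```python
class PendingSignals:
    """Toy model of the tripped-flag bookkeeping (hypothetical, not CPython's code)."""

    def __init__(self):
        self.handlers = {}       # signum -> Python-level handler
        self.tripped = set()     # signals received but not yet handled
        self.is_tripped = False  # cheap "anything pending?" flag

    def c_level_handler(self, signum):
        # Models signal_handler(): only record the signal, run no Python code.
        self.tripped.add(signum)
        self.is_tripped = True

    def check_signals(self):
        # Models PyErr_CheckSignals(): called from the interpreter loop,
        # this is the point where the Python-level handlers actually run.
        if not self.is_tripped:
            return
        self.is_tripped = False
        for signum in sorted(self.tripped):
            self.tripped.discard(signum)
            handler = self.handlers.get(signum)
            if handler is not None:
                handler(signum)


log = []
state = PendingSignals()
state.handlers[10] = log.append

state.c_level_handler(10)   # the signal "arrives"
handled_immediately = bool(log)
state.check_signals()       # the interpreter loop gets around to it
```

In this model, if nothing ever calls `check_signals()` (for example because the main thread is blocked in a system call that was not interrupted), the recorded signal stays pending and the handler is never run.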

    But there's a race, imagine this happens:

    This problem can also happen even if the signal is sent after select is called:

    But this case is quite flaky, because the documentation warns you: "Some care must be taken if both signals and threads are used in the same program. The fundamental thing to remember in using signals and threads simultaneously is: always perform signal() operations in the main thread of execution. Any thread can perform an alarm(), getsignal(), pause(), setitimer() or getitimer(); only the main thread can set a new signal handler, and the main thread will be the only one to receive signals (this is enforced by the Python signal module, even if the underlying thread implementation supports sending signals to individual threads). This means that signals can’t be used as a means of inter-thread communication. Use locks instead."

    Sending signals to a process with multiple threads is risky; you should use locks instead.

    Finally, I think that the documentation should be rephrased: "and the main thread will be the only one to receive signals (this is enforced by the Python signal module, even if the underlying thread implementation supports sending signals to individual threads)." That's false. What's guaranteed is that the signal handler will only be executed on behalf of the main thread, but any thread can _receive_ a signal.

    The comments in Modules/signalmodule.c are also misleading:

    "We still have the problem that in some implementations signals generated by the keyboard (e.g. SIGINT) are delivered to all threads (e.g. SGI), while in others (e.g. Solaris) such signals are delivered to one random thread (an intermediate possibility would be to deliver it to the main thread -- POSIX?). For now, we have a working implementation that works in all three cases -- the handler ignores signals if getpid() isn't the same as in the main thread. XXX This is a hack."

    Sounds strange. If only a thread other than the main thread receives the signal and you ignore it, then it's lost, isn't it? Furthermore, under Linux 2.6 and NPTL, getpid() returns the main thread PID even from another thread.

    Peers ?

    pitrou commented 14 years ago

    Thanks for the detailed analysis, Charles-François.

    > Finally, I think that the documentation should be rephrased:

    Yes, I think so.

    > Furthermore, under Linux 2.6 and NPTL, getpid() returns the main thread PID even from another thread.

    Yes, those threads belong to the same process.

    But as mentioned, signals are a rather fragile inter-process communication device; just use a specific file descriptor. And if you still wanna use signals, there's set_wakeup_fd(): http://docs.python.org/library/signal.html#signal.set_wakeup_fd
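A minimal sketch of the set_wakeup_fd() approach, assuming a POSIX system and Python 3.5 or later. The helper name `wait_for_signal`, the choice of SIGUSR1, and the timeout are illustrative assumptions, not part of the issue. The idea is that the low-level handler writes a byte to the pipe no matter which thread the signal lands on, so a select() in the main thread that watches the read end wakes up.

```python
import os
import select
import signal
import threading


def wait_for_signal(signum=signal.SIGUSR1, timeout=5.0):
    """Block in select() until signum arrives or timeout expires (hypothetical helper)."""
    received = []

    def handler(sig, frame):
        received.append(sig)

    # Self-pipe: the C-level handler writes the signal number to write_fd.
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)
    os.set_blocking(write_fd, False)  # set_wakeup_fd() needs a non-blocking fd

    # Both calls below must be made from the main thread.
    old_handler = signal.signal(signum, handler)
    old_wakeup = signal.set_wakeup_fd(write_fd)
    try:
        # A helper thread sends the signal shortly after select() starts.
        sender = threading.Timer(0.1, os.kill, args=(os.getpid(), signum))
        sender.start()
        readable, _, _ = select.select([read_fd], [], [], timeout)
        sender.join()
        woke_up = bool(readable)
        if woke_up:
            os.read(read_fd, 64)  # drain the wakeup bytes
    finally:
        signal.set_wakeup_fd(old_wakeup)
        signal.signal(signum, old_handler)
        os.close(read_fd)
        os.close(write_fd)
    return woke_up, received
```

Calling `wait_for_signal()` returns a pair: whether select() saw the wakeup byte, and the list of signal numbers the Python handler was called with. The timeout is only a safety net so that a lost wakeup shows up as a `False` result instead of a hang.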

    aa295720-5a49-4ff0-b9cc-4061f453b2dc commented 9 years ago

    Agree with Charles-François's second explanation. This makes it very hard to reliably handle signals -- basically everyone has to remember to use set_wakeup_fd, and most people don't. For example, gunicorn is likely vulnerable to this because it doesn't use set_wakeup_fd. I suspect most code using select + signals is wrong.

    I've attached a patch which fixes the issue for select(), but not any other functions. If it's considered a good patch, I can work on the rest of the functions in the select module. (Also, tests for the details of the behavior.)

    Also the patch is pretty hokey, so I'd appreciate feedback if it's going to go in. :)

    aa295720-5a49-4ff0-b9cc-4061f453b2dc commented 9 years ago

    Adding haypo since apparently he's been touching signals stuff a lot lately, maybe has some useful thoughts / review? :)

    terryjreedy commented 9 years ago

    This was turned into a doc issue, with no patch forthcoming, but Devin has submitted a bugfix. Should this be turned back into a bug issue?

    e7b5866c-78b3-4bbe-83be-3ab2f38a160c commented 6 years ago

    A workaround for handling signals reliably, which I have now tested successfully, is to execute everything within a subthread and let the main thread just join this subthread. Like:

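    import signal
    import threading

    signal.signal(MY_SIGNAL, signal_handler)  # install the handler in the main thread
    thread = threading.Thread(target=my_main_function)
    thread.start()
    thread.join()  # the main thread only waits, so it stays free to run handlers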

    Doing it like this, the main thread should always listen to signals, regardless of whether the subthread is stuck.
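A self-contained sketch of this workaround, assuming a POSIX system and Python 3. The names `run_in_subthread` and `example_work`, the use of SIGUSR1, and the stop event are illustrative assumptions and do not come from the comment above. The workload signals its own process and then waits until the handler, running in the main thread, tells it to stop.

```python
import os
import signal
import threading


def run_in_subthread(work, signum=signal.SIGUSR1):
    """Run work(stop_event) in a subthread while the main thread handles signum."""
    seen = []
    stop_event = threading.Event()

    def handler(sig, frame):
        # Python-level handlers run in the main thread; tell the worker to finish.
        seen.append(sig)
        stop_event.set()

    previous = signal.signal(signum, handler)  # must be called from the main thread
    try:
        worker = threading.Thread(target=work, args=(stop_event,))
        worker.start()
        worker.join()  # the main thread only waits here
    finally:
        signal.signal(signum, previous)
    return seen


def example_work(stop_event):
    # Hypothetical workload: signal our own process, then wait to be stopped.
    os.kill(os.getpid(), signal.SIGUSR1)
    stop_event.wait(timeout=5.0)  # safety net in case the handler is delayed
```

`run_in_subthread(example_work)` returns the list of signal numbers the handler saw. The timeout in `example_work` is there because, as discussed earlier in this thread, a signal that lands on a thread other than the main one may not interrupt the main thread's wait right away.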