Originally reported in https://bugs.python.org/issue33329
This started to bother us in Fedora rawhide for various Python versions, so chances are something changed on the system level.
# python3.7 -m test.regrtest test_multiprocessing_fork
Run tests sequentially
0:00:00 load avg: 1.24 [1/1] test_multiprocessing_fork
/usr/lib64/python3.7/multiprocessing/semaphore_tracker.py:55: UserWarning: semaphore_tracker: process died unexpectedly, relaunching. Some semaphores might leak.
warnings.warn('semaphore_tracker: process died unexpectedly, '
Exception in thread Thread-26:
Traceback (most recent call last):
File "/usr/lib64/python3.7/threading.py", line 917, in _bootstrap_inner
self.run()
File "/usr/lib64/python3.7/threading.py", line 865, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib64/python3.7/multiprocessing/resource_sharer.py", line 139, in _serve
signal.pthread_sigmask(signal.SIG_BLOCK, range(1, signal.NSIG))
File "/usr/lib64/python3.7/signal.py", line 60, in pthread_sigmask
sigs_set = _signal.pthread_sigmask(how, mask)
ValueError: signal number 32 out of range
(hangs...)
^CProcess Process-184:
Traceback (most recent call last):
File "/usr/lib64/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/usr/lib64/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib64/python3.7/test/_test_multiprocessing.py", line 3328, in child_access
w = conn.recv()
File "/usr/lib64/python3.7/multiprocessing/connection.py", line 251, in recv
return _ForkingPickler.loads(buf.getbuffer())
File "/usr/lib64/python3.7/multiprocessing/connection.py", line 960, in rebuild_connection
fd = df.detach()
File "/usr/lib64/python3.7/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/usr/lib64/python3.7/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/usr/lib64/python3.7/multiprocessing/connection.py", line 498, in Client
answer_challenge(c, authkey)
File "/usr/lib64/python3.7/multiprocessing/connection.py", line 741, in answer_challenge
message = connection.recv_bytes(256) # reject large message
File "/usr/lib64/python3.7/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/usr/lib64/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/usr/lib64/python3.7/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
KeyboardInterrupt
Warning -- multiprocessing.process._dangling was modified by test_multiprocessing_fork
Before: <_weakrefset.WeakSet object at 0x7f2abd5f6d30>
After: <_weakrefset.WeakSet object at 0x7f2abd5f6128>
Warning -- threading._dangling was modified by test_multiprocessing_fork
Before: <_weakrefset.WeakSet object at 0x7f2abd5f66d8>
After: <_weakrefset.WeakSet object at 0x7f2abc835048>
Test suite interrupted by signal SIGINT. 1 test omitted: test_multiprocessing_fork
Total duration: 3 min 43 sec Tests result: INTERRUPTED
--------------------------------
# python3.7 -m test.regrtest test_multiprocessing_forkserver
Run tests sequentially
0:00:00 load avg: 1.00 [1/1] test_multiprocessing_forkserver
Exception in thread Thread-26:
Traceback (most recent call last):
File "/usr/lib64/python3.7/threading.py", line 917, in _bootstrap_inner
self.run()
File "/usr/lib64/python3.7/threading.py", line 865, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib64/python3.7/multiprocessing/resource_sharer.py", line 139, in _serve
signal.pthread_sigmask(signal.SIG_BLOCK, range(1, signal.NSIG))
File "/usr/lib64/python3.7/signal.py", line 60, in pthread_sigmask
sigs_set = _signal.pthread_sigmask(how, mask)
ValueError: signal number 32 out of range
(hangs...)
^CProcess Process-184:
Traceback (most recent call last):
...
KeyboardInterrupt
Warning -- multiprocessing.process._dangling was modified by test_multiprocessing_forkserver
Before: <_weakrefset.WeakSet object at 0x7f0323ecde10>
After: <_weakrefset.WeakSet object at 0x7f0323ecd208>
Warning -- threading._dangling was modified by test_multiprocessing_forkserver
Before: <_weakrefset.WeakSet object at 0x7f0323ecd7b8>
After: <_weakrefset.WeakSet object at 0x7f0323ecddd8>
Test suite interrupted by signal SIGINT. 1 test omitted: test_multiprocessing_forkserver
Total duration: 55 sec Tests result: INTERRUPTED
--------------------------------
# python3.7 -m test.regrtest test_multiprocessing_spawn
Run tests sequentially
0:00:00 load avg: 1.49 [1/1] test_multiprocessing_spawn
Exception in thread Thread-26:
Traceback (most recent call last):
File "/usr/lib64/python3.7/threading.py", line 917, in _bootstrap_inner
self.run()
File "/usr/lib64/python3.7/threading.py", line 865, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib64/python3.7/multiprocessing/resource_sharer.py", line 139, in _serve
signal.pthread_sigmask(signal.SIG_BLOCK, range(1, signal.NSIG))
File "/usr/lib64/python3.7/signal.py", line 60, in pthread_sigmask
sigs_set = _signal.pthread_sigmask(how, mask)
ValueError: signal number 32 out of range
(hangs...)
^CProcess Process-184:
Traceback (most recent call last):
...
KeyboardInterrupt
Warning -- multiprocessing.process._dangling was modified by test_multiprocessing_spawn
Before: <_weakrefset.WeakSet object at 0x7fb2134a9dd8>
After: <_weakrefset.WeakSet object at 0x7fb2134a91d0>
Warning -- threading._dangling was modified by test_multiprocessing_spawn
Before: <_weakrefset.WeakSet object at 0x7fb2134a9780>
After: <_weakrefset.WeakSet object at 0x7fb2134a9da0>
Test suite interrupted by signal SIGINT. 1 test omitted: test_multiprocessing_spawn
Total duration: 54 sec Tests result: INTERRUPTED
--------------------------------
Happens with all 3 tests on 3.4 to 3.7.
To test in Docker, one can do:
$ docker run -ti fedora:rawhide /bin/bash
# dnf update
# dnf install python37 # or python3-test for 3.6, or python35, python34
Note that without "dnf update", the tests work for me (for now), so I'm attaching a full package diff that "starts" this.
Perhaps you can compile and run this C program, before and after the system changes, and post the output:
#include <errno.h>
#include <signal.h>
#include <stdio.h>
int main(int argc, char** argv)
{
    sigset_t set;
    int i, ret;
    /* POSIX requires sigemptyset() or sigfillset() before any other use of a set */
    sigemptyset(&set);
    printf("NSIG = %d\n", NSIG);
    for (i = 1; i < NSIG; i++) {
        errno = 0;
        ret = sigaddset(&set, i);
        printf("sigaddset(%d) returned %d, errno =%d\n", i, ret, errno);
    }
    return 0;
}
(on Ubuntu 16.04, I get NSIG = 65, and all signal numbers work fine)
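For comparison, here is a rough Python-level counterpart of this probe. It is a sketch that assumes a Unix build where `signal.pthread_sigmask` exists, and the helper name `probe_signal_numbers` is hypothetical.

The helper tries to block each number in `range(1, signal.NSIG)` on its own. It records whether the `signal` module raises `ValueError` or only emits a warning, and it restores the previous mask after every attempt.

Going by this thread, an unpatched Python on the new glibc should list 32 and 33 under `rejected`. Newer Python versions may report them as warnings instead.

```python
import signal
import warnings


def probe_signal_numbers():
    """Classify the numbers 1..NSIG-1 by how pthread_sigmask() treats them.

    Hypothetical helper, not part of the standard library. Returns two lists:
    rejected: numbers for which pthread_sigmask() raised ValueError
    warned:   numbers for which it emitted a warning instead
    """
    rejected, warned = [], []
    for signum in range(1, signal.NSIG):
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            try:
                # Block only this signal; the previous mask is returned.
                old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, [signum])
            except ValueError:
                rejected.append(signum)
                continue
        if caught:
            warned.append(signum)
        # Restore the mask that was in effect before this attempt.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return rejected, warned
```

Because `pthread_sigmask()` only affects the calling thread, the probe does not disturb other threads.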
I'll try. In the meantime, I've checked, and it's the glibc update that makes the difference:
glibc-2.27.9000-7.fc29 -> glibc-2.27.9000-14.fc29
I'll see what was changed there and whether it was intentional or a bug in Fedora.
I was able to compile the program above after the system upgrade using your Docker image, and the following stands out:
sigaddset(31) returned 0, errno = 0
sigaddset(32) returned -1, errno = 22
sigaddset(33) returned -1, errno = 22
sigaddset(34) returned 0, errno = 0
However, I do not know how to compile it *before* the system upgrade ("dnf install gcc" seems to upgrade glibc).
glibc-2.27.9000-14.fc29:
NSIG = 65
sigaddset(32) returned -1, errno =22
sigaddset(33) returned -1, errno =22
"dnf install gcc" upgrades glibc, yes. OK, I'll dig into it a bit more, see why and where this change happened. Thanks for hints.
dnf install https://kojipkgs.fedoraproject.org/packages/glibc/2.27.9000/7.fc29/x86_64/glibc-headers-2.27.9000-7.fc29.x86_64.rpm https://kojipkgs.fedoraproject.org/packages/glibc/2.27.9000/7.fc29/x86_64/glibc-devel-2.27.9000-7.fc29.x86_64.rpm
dnf install gcc
NSIG = 65
sigaddset(1) returned 0, errno =0
... all ok
glibc-2.27.9000-13.fc29 ... all ok
glibc-2.27.9000-14.fc29 ... 32, 33 fail
glibc-2.27.9000-15.fc29 (latest built) ... 32, 33 fail
The change from 13 to 14 corresponds to these upstream commits:
d39c0a459ef32a41daac4840859bf304d931adab to 583a27d525ae189bdfaa6784021b92a9a1dae12e
Ok, this is due to: https://sourceware.org/bugzilla/show_bug.cgi?id=22391
Short explanation here: https://unix.stackexchange.com/a/155846
Miro, could you check whether 2.7 is affected too?
I cannot find a corresponding test in 2.7. Running the entire build (incl. tests) to see what happens.
Is there a command or two I could try instead? I'm afraid I don't understand how that test works (it seems a bit complicated).
Running the whole test suite sounds fine.
Python 2 testsuite runs fine.
Setting as release blocker as it impacts functionality of the multiprocessing module.
Should the fix be to exclude 32 and 33 from multiprocessing.resource_sharer:_serve?
if hasattr(signal, 'pthread_sigmask'):
    signal.pthread_sigmask(signal.SIG_BLOCK, range(1, signal.NSIG))
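Here is a sketch of what this proposal might look like; the constant and helper names are hypothetical.

It simply hard-codes glibc's two internal signal numbers. That is exactly the kind of platform-specific knowledge the reply below objects to: musl, for instance, reserves 32, 33 and 34.

```python
import signal

# Hypothetical constant: the glibc-internal signal numbers discussed here.
# Hard-coding them is glibc-specific (musl reserves 32, 33 and 34).
GLIBC_INTERNAL_SIGNALS = frozenset({32, 33})


def signals_to_block(excluded=GLIBC_INTERNAL_SIGNALS):
    """Every number in 1..NSIG-1 except the excluded ones (hypothetical)."""
    return [s for s in range(1, signal.NSIG) if s not in excluded]


if hasattr(signal, "pthread_sigmask"):
    # Block the filtered set as _serve() would, then restore the old mask so
    # that this demo leaves the calling thread unchanged.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, signals_to_block())
    signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
```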
We could do that, but I'd rather something not multiprocessing-specific. I think we have to ignore the return value of sigaddset() until bpo-33332 allows us to be smarter.
We could do that, but I'd rather something not multiprocessing-specific
... and also not Linux-specific.
Miro, can you give the following patch a try?
diff --git a/Modules/signalmodule.c b/Modules/signalmodule.c
index 7916160..ca76a20 100644
--- a/Modules/signalmodule.c
+++ b/Modules/signalmodule.c
@@ -819,7 +819,6 @@ iterable_to_sigset(PyObject *iterable, sigset_t *mask)
int result = -1;
PyObject *iterator, *item;
long signum;
- int err;
sigemptyset(mask);
@@ -841,11 +840,10 @@ iterable_to_sigset(PyObject *iterable, sigset_t *mask)
Py_DECREF(item);
if (signum == -1 && PyErr_Occurred())
goto error;
- if (0 < signum && signum < NSIG)
- err = sigaddset(mask, (int)signum);
- else
- err = 1;
- if (err) {
+ if (0 < signum && signum < NSIG) {
+ (void) sigaddset(mask, (int)signum);
+ }
+ else {
PyErr_Format(PyExc_ValueError,
"signal number %ld out of range", signum);
goto error;
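To make the behavior change in this patch explicit, here is a pure-Python model of the loop in `iterable_to_sigset()`. It is an illustration only; the real code is C in `Modules/signalmodule.c`.

`fake_sigaddset`, `NSIG = 65` and the reserved set `{32, 33}` are assumptions that stand in for a libc with BZ#22391 applied. With `patched=False`, a failing `sigaddset()` surfaces as the same "out of range" `ValueError` seen in the tracebacks. With `patched=True`, only the range check can raise.

```python
NSIG = 65            # assumption: the Linux value reported in this thread
RESERVED = {32, 33}  # assumption: numbers the new glibc refuses in sigaddset()


def fake_sigaddset(mask, signum):
    """Model of sigaddset(): -1 (EINVAL) for reserved numbers, else 0."""
    if signum in RESERVED:
        return -1
    mask.add(signum)
    return 0


def iterable_to_sigset(iterable, patched=True, sigaddset=fake_sigaddset):
    """Model of the C loop before (patched=False) and after the patch."""
    mask = set()  # plays the role of sigemptyset(mask)
    for signum in iterable:
        if not 0 < signum < NSIG:
            raise ValueError("signal number %d out of range" % signum)
        err = sigaddset(mask, signum)
        if err and not patched:
            # Old behavior: a sigaddset() failure is reported with the same
            # "out of range" message, as in the tracebacks above.
            raise ValueError("signal number %d out of range" % signum)
        # Patched behavior: the sigaddset() return value is ignored.
    return mask
```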
This indeed makes the tests pass again:
# python3.7 -m test.regrtest test_multiprocessing_fork test_multiprocessing_forkserver test_multiprocessing_spawn
Run tests sequentially
0:00:00 load avg: 1.02 [1/3] test_multiprocessing_fork
/usr/lib64/python3.7/multiprocessing/semaphore_tracker.py:55: UserWarning: semaphore_tracker: process died unexpectedly, relaunching. Some semaphores might leak.
warnings.warn('semaphore_tracker: process died unexpectedly, '
0:01:05 load avg: 1.00 [2/3] test_multiprocessing_forkserver -- test_multiprocessing_fork passed in 1 min 6 sec
0:02:16 load avg: 1.69 [3/3] test_multiprocessing_spawn -- test_multiprocessing_forkserver passed in 1 min 11 sec
test_multiprocessing_spawn passed in 1 min 23 sec
All 3 tests OK.
Total duration: 3 min 40 sec Tests result: SUCCESS
Why not export and use the canonical way of sigemptyset/sigfillset/sigaddset/sigdelset/sigismember instead of pushing for more potential non-conformant code? With glibc, sigfillset will correctly fill the whole signal set structure while removing the internally used signals. This is in line with what POSIX specifies [1], where it states that:
'either sigemptyset() or sigfillset() must be called prior to any other use of the signal set'
And more importantly:
'For example, blocking or ignoring an implementation-defined signal may have undesirable side-effects, whereas the default action for that signal is harmless. In such a case, it would be preferable for such a signal to be excluded from the signal set returned by sigfillset().'
Also keep in mind that, since this is an implementation detail, different libcs can use different internal signals. uClibc, for instance, uses the same 2 signals as glibc, whereas musl allocates signals 32, 33, and 34 for internal use (excluding them in sigfillset and failing with EINVAL in sigaddset).
Also keep in mind that POSIX [1] specifies that sigaddset *may* fail with EINVAL for unsupported signals, so a conforming implementation may accept them in sigaddset and still remove the internal signals in sigprocmask (uClibc, for instance).
Why not export and use the canonical way of sigemptyset/sigfillset/sigaddset/sigdelset/sigismember instead of pushing for more potential non-conformant code?
I agree this is the proper fix, and that's what I plan to do in Python 3.8. For Python 3.7 and earlier, though, we cannot add new features anymore, which is why I'm leaning towards a variant of the patch I showed above (which also minimizes the risk of regressions compared to introducing and using a new API).
One option would be to create a list of possibly defined signals and check whether the signal is on the list. For realtime signals, it is just a matter of checking whether SIGRTMIN <= signal <= SIGRTMAX.
The glibc-defined signals can be checked in tst-signal.c [1] or in the signal(7) man page. They should cover the usual ISO C, POSIX, and some Linux arch-specific signals, but you would still need to check whether other OSes define and use extra signals (another option would be to add this check only for Linux/glibc).
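Here is a minimal sketch of that suggestion. It assumes the caller supplies the set of known non-realtime signals, and the function name is hypothetical.

As a stand-in for a per-platform list such as the one in tst-signal.c, it reuses the names this Python's `signal` module happens to define. That is only a proxy. `SIGRTMIN` and `SIGRTMAX` are read with `getattr` because they are not available on every platform.

```python
import signal


def is_plausible_signal(signum, known, rtmin=None, rtmax=None):
    """Hypothetical whitelist check: a known signal, or SIGRTMIN..SIGRTMAX."""
    if signum in known:
        return True
    if rtmin is not None and rtmax is not None:
        return rtmin <= signum <= rtmax
    return False


# Proxy for a per-platform list: the non-realtime signal names that this
# Python's signal module happens to define (an assumption, not authoritative).
KNOWN = {int(s) for s in signal.Signals if not s.name.startswith("SIGRT")}
# Not every platform provides these attributes, hence getattr().
RTMIN = getattr(signal, "SIGRTMIN", None)
RTMAX = getattr(signal, "SIGRTMAX", None)

candidates = [s for s in range(1, signal.NSIG)
              if is_plausible_signal(s, KNOWN, RTMIN, RTMAX)]
```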
I don't think we want to be in the business of maintaining a cross-platform list of supported signal values. Python may be compiled on bizarre non-glibc systems (or even systems with a proprietary libc).
Yes, I am aware, but I can't see a truly portable way to provide the same functionality as 'sigfillset'. Ideally a libc implementation would return EINVAL on 'sigaddset' for invalid signals, but even for glibc this is not true (the very issue I fixed).
Right, but it seems that even if sigaddset() allowed you to "set" signals 32 and 33, that would be ignored by pthread_sigmask().
This is what I get here (Ubuntu 16.04, glibc 2.23):
>>> signal.pthread_sigmask(signal.SIG_BLOCK, range(1, 65))
set()
>>> signal.pthread_sigmask(signal.SIG_BLOCK, range(1, 65))
{<Signals.SIGHUP: 1>, <Signals.SIGINT: 2>, <Signals.SIGQUIT: 3>, <Signals.SIGILL: 4>, <Signals.SIGTRAP: 5>, <Signals.SIGABRT: 6>, <Signals.SIGBUS: 7>, <Signals.SIGFPE: 8>, <Signals.SIGUSR1: 10>, <Signals.SIGSEGV: 11>, <Signals.SIGUSR2: 12>, <Signals.SIGPIPE: 13>, <Signals.SIGALRM: 14>, <Signals.SIGTERM: 15>, 16, <Signals.SIGCHLD: 17>, <Signals.SIGCONT: 18>, <Signals.SIGTSTP: 20>, <Signals.SIGTTIN: 21>, <Signals.SIGTTOU: 22>, <Signals.SIGURG: 23>, <Signals.SIGXCPU: 24>, <Signals.SIGXFSZ: 25>, <Signals.SIGVTALRM: 26>, <Signals.SIGPROF: 27>, <Signals.SIGWINCH: 28>, <Signals.SIGIO: 29>, <Signals.SIGPWR: 30>, <Signals.SIGSYS: 31>, <Signals.SIGRTMIN: 34>, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, <Signals.SIGRTMAX: 64>}
Yes, this is the issue I referred to in my previous comment [1]. Unfortunately, it is only fixed on master (which will become 2.28).
Not sure what you mean. In the example above, I try to block signals 32 and 33 using pthread_sigmask(), but the pthread_sigmask() return value shows they weren't blocked, which is ok.
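One way to see this from Python: block a set of signals, read the resulting mask back, and compare. This assumes a version that accepts 32 and 33 (one with the fix from this issue). The helper `silently_unblocked` is hypothetical.

Judging from the REPL output above, on that Ubuntu system the result would presumably also contain 9 (SIGKILL) and 19 (SIGSTOP). The kernel never lets a thread block those two.

```python
import signal
import warnings


def silently_unblocked(requested):
    """Return the requested signal numbers that did not end up blocked.

    Hypothetical helper: blocks `requested`, reads the resulting mask back,
    then restores the original mask of the calling thread.
    """
    with warnings.catch_warnings():
        # Newer Pythons may warn about numbers such as 32 and 33; ignore here.
        warnings.simplefilter("ignore")
        old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, requested)
    try:
        # Blocking the empty set changes nothing and returns the current mask.
        current = signal.pthread_sigmask(signal.SIG_BLOCK, [])
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return sorted(set(requested) - {int(s) for s in current})
```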
What I mean is that on glibc master, which contains BZ#22391, 'sigaddset' will fail to add signals 32 and 33. But since you are now ignoring the 'sigaddset' return code, it does not matter.
New changeset 25038ecfb665bef641abf8cb61afff7505b0e008 by Antoine Pitrou in branch 'master': bpo-33329: Fix multiprocessing regression on newer glibcs (GH-6575) https://github.com/python/cpython/commit/25038ecfb665bef641abf8cb61afff7505b0e008
New changeset 75a3e3d5bc0be1ce41289b661e7c53039cf3d5ba by Antoine Pitrou (Miss Islington (bot)) in branch '3.7': bpo-33329: Fix multiprocessing regression on newer glibcs (GH-6575) (GH-6579) https://github.com/python/cpython/commit/75a3e3d5bc0be1ce41289b661e7c53039cf3d5ba
New changeset b0ca398cabd2d2ea2d66fa50b08e297a60388c75 by Antoine Pitrou in branch '3.6': [3.6] bpo-33329: Fix multiprocessing regression on newer glibcs (GH-6575) (GH-6582) https://github.com/python/cpython/commit/b0ca398cabd2d2ea2d66fa50b08e297a60388c75
Miro, this should be fixed now. Please reopen if it isn't.
The sigfillset() functionality will be exposed to Python users in bpo-33332.
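For reference, bpo-33332 ended up as `signal.valid_signals()` in Python 3.8. It is documented as returning the set of valid signal numbers on the platform, which can exclude signals reserved for internal use.

Below is a sketch of a "block everything" helper built on it, for POSIX systems. For older versions it falls back to the `range(1, signal.NSIG)` idiom from `resource_sharer._serve`. Both helper names are hypothetical.

```python
import signal


def all_blockable_signals():
    """Signal numbers to pass to pthread_sigmask() to block 'everything'.

    Hypothetical helper: uses signal.valid_signals() where available
    (Python 3.8+), else the range(1, NSIG) idiom from resource_sharer.
    """
    if hasattr(signal, "valid_signals"):
        return signal.valid_signals()
    return range(1, signal.NSIG)


def run_with_signals_blocked(func, *args, **kwargs):
    """Call func with (almost) all signals blocked in the calling thread."""
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, all_blockable_signals())
    try:
        return func(*args, **kwargs)
    finally:
        # Always restore the caller's original mask.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
```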
Antoine, thank you very much for swift communication and fix.
These issues also occur on Python 3.4 and 3.5. And I've now upgraded to Ubuntu 18.10, which I guess has the new version of glibc. The regression test suite for both 3.4 and 3.5 blocks forever on three tests (multiprocessing_fork, multiprocessing_forkserver, and multiprocessing_spawn). This affects my ability as RM to run the regression test suite, not to mention affecting the fundamental usability and trustability of the 3.4 and 3.5 releases.
Can I please get backports of this patch to 3.4 and 3.5? Note that I consider this a release blocker for 3.4 and 3.5. And I'd expected to tag 3.4.10rc1 and 3.5.7rc1 in about twenty-four hours. So I sure would appreciate a quick turnaround on this if folks are available--otherwise I'll have to slip the schedule. Sorry for the late notice!
I am not able to do PRs right now but here are the Fedora patches:
https://src.fedoraproject.org/rpms/python34/blob/master/f/00302-fix-multiprocessing-regression-on-newer-glibcs.patch
https://src.fedoraproject.org/rpms/python35/blob/master/f/00302-fix-multiprocessing-regression-on-newer-glibcs.patch
They seem nearly identical to the applied 3.6 patch.
New changeset 8ec1fd11f2d524859cfefae76458fcfd22decf65 by larryhastings (Cheryl Sabella) in branch '3.5': [3.5] bpo-33329: Fix multiprocessing regression on newer glibcs (GH-6575) (bpo-12144) https://github.com/python/cpython/commit/8ec1fd11f2d524859cfefae76458fcfd22decf65
New changeset 2226139aa2b69047cb54dbcfd79f5c2e36f98653 by larryhastings (Cheryl Sabella) in branch '3.4': [3.4] bpo-33329: Fix multiprocessing regression on newer glibcs (GH-6575) (bpo-12145) https://github.com/python/cpython/commit/2226139aa2b69047cb54dbcfd79f5c2e36f98653
Now fixed in 3.4 and 3.5. I can cut the RCs. Huzzah!
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = 'https://github.com/pitrou'
closed_at =
created_at =
labels = ['3.7', '3.8', 'type-bug', 'tests', 'release-blocker']
title = 'sigaddset() can fail on some signal numbers'
updated_at =
user = 'https://github.com/hroncok'
```
bugs.python.org fields:
```python
activity =
actor = 'ned.deily'
assignee = 'pitrou'
closed = True
closed_date =
closer = 'larry'
components = ['Tests']
creation =
creator = 'hroncok'
dependencies = []
files = ['47546']
hgrepos = []
issue_num = 33329
keywords = ['patch']
message_count = 39.0
messages = ['315590', '315594', '315595', '315596', '315597', '315598', '315599', '315600', '315601', '315602', '315604', '315605', '315609', '315610', '315611', '315612', '315613', '315614', '315615', '315616', '315617', '315618', '315626', '315654', '315659', '315660', '315661', '315662', '315667', '315672', '315674', '315675', '315676', '315677', '336968', '336969', '337053', '337055', '337056']
nosy_count = 8.0
nosy_names = ['pitrou', 'larry', 'ned.deily', 'njs', 'neologix', 'lukasz.langa', 'hroncok', 'azanella']
pr_nums = ['6575', '6579', '6580', '6582', '12144', '12145']
priority = 'release blocker'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue33329'
versions = ['Python 3.4', 'Python 3.5', 'Python 3.6', 'Python 3.7', 'Python 3.8']
```