Version: 045d5ed15305ccc44a0330e6f65f669998815598, redis pulled in via tox.ini/docker
Platform: Gentoo Linux (also affects macOS)
Description:
When using the spawn start method in Python multiprocessing, all tests in tests/test_multiprocessing.py fail with pickling errors. Initially this is because target= is a locally scoped function (apparently only module-level functions can be pickled). However, after resolving that immediate problem, it turns out that Redis objects have multiple serialization issues of their own.
The default start method on most Unix derivatives is fork, and Redis happens to work fine there because most of the constructed objects persist through the fork and don't have to be explicitly passed to children. However, the fork method tends to cause issues when the parent process also uses threads. Since CPython 3.8, the macOS target defaults to the spawn method instead, and we're experimenting with switching PyPy3.9 to it as well (since we're dealing with very hard thread/fork issues there too).
The spawn method requires Python to pass objects to the children explicitly, which in turn requires serializing (pickling) them. Unfortunately, it seems that the Redis class internally uses a number of types that aren't serializable.
Reproducer & backtraces:
The simplest way to reproduce the problem with any Python version is to force the spawn method explicitly from the code, e.g. using the following patch to conftest.py:
diff --git a/tests/conftest.py b/tests/conftest.py
index d9de876..99f82a4 100644
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -1,3 +1,4 @@
+import multiprocessing
import random
import time
from unittest.mock import Mock
@@ -12,6 +13,8 @@ from redis.connection import parse_url
from redis.exceptions import RedisClusterException
from redis.retry import Retry
+multiprocessing.set_start_method("spawn")
+
REDIS_INFO = {}
default_redis_url = "redis://localhost:6379/9"
default_redismod_url = "redis://localhost:36379"
Afterwards, running e.g. tests/test_multiprocessing.py::TestMultiprocessing::test_redis_client yields:
____________________ TestMultiprocessing.test_redis_client _____________________
self = <tests.test_multiprocessing.TestMultiprocessing object at 0x7f34745114c0>
r = Redis<ConnectionPool<Connection<host=localhost,port=6379,db=9>>>
def test_redis_client(self, r):
"A redis client created in a parent can also be used in a child"
assert r.ping() is True
def target(client):
assert client.ping() is True
del client
proc = multiprocessing.Process(target=target, args=(r,))
> proc.start()
tests/test_multiprocessing.py:164:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib/python3.9/multiprocessing/process.py:121: in start
self._popen = self._Popen(self)
/usr/lib/python3.9/multiprocessing/context.py:224: in _Popen
return _default_context.get_context().Process._Popen(process_obj)
/usr/lib/python3.9/multiprocessing/context.py:284: in _Popen
return Popen(process_obj)
/usr/lib/python3.9/multiprocessing/popen_spawn_posix.py:32: in __init__
super().__init__(process_obj)
/usr/lib/python3.9/multiprocessing/popen_fork.py:19: in __init__
self._launch(process_obj)
/usr/lib/python3.9/multiprocessing/popen_spawn_posix.py:47: in _launch
reduction.dump(process_obj, fp)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
obj = <Process name='Process-1' parent=241897 initial>
file = <_io.BytesIO object at 0x7f34745130e0>, protocol = None
def dump(obj, file, protocol=None):
'''Replacement for pickle.dump() using ForkingPickler.'''
> ForkingPickler(file, protocol).dump(obj)
E AttributeError: Can't pickle local object 'TestMultiprocessing.test_redis_client.<locals>.target'
/usr/lib/python3.9/multiprocessing/reduction.py:60: AttributeError
After moving def target to module level, further issues surface:
____________________ TestMultiprocessing.test_redis_client _____________________
self = <tests.test_multiprocessing.TestMultiprocessing object at 0x7f2c70840a90>
r = Redis<ConnectionPool<Connection<host=localhost,port=6379,db=9>>>
def test_redis_client(self, r):
"A redis client created in a parent can also be used in a child"
assert r.ping() is True
proc = multiprocessing.Process(target=target, args=(r,))
> proc.start()
tests/test_multiprocessing.py:160:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib/python3.9/multiprocessing/process.py:121: in start
self._popen = self._Popen(self)
/usr/lib/python3.9/multiprocessing/context.py:224: in _Popen
return _default_context.get_context().Process._Popen(process_obj)
/usr/lib/python3.9/multiprocessing/context.py:284: in _Popen
return Popen(process_obj)
/usr/lib/python3.9/multiprocessing/popen_spawn_posix.py:32: in __init__
super().__init__(process_obj)
/usr/lib/python3.9/multiprocessing/popen_fork.py:19: in __init__
self._launch(process_obj)
/usr/lib/python3.9/multiprocessing/popen_spawn_posix.py:47: in _launch
reduction.dump(process_obj, fp)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
obj = <Process name='Process-1' parent=256884 initial>
file = <_io.BytesIO object at 0x7f2c7083f360>, protocol = None
def dump(obj, file, protocol=None):
'''Replacement for pickle.dump() using ForkingPickler.'''
> ForkingPickler(file, protocol).dump(obj)
E TypeError: cannot pickle '_thread.lock' object
/usr/lib/python3.9/multiprocessing/reduction.py:60: TypeError
This is because threading.Lock objects are being used. To be honest, I'm not sure whether that isn't a bug in itself, since these locks aren't going to work correctly when used across parallel processes. I can work around this issue by replacing the threading.Lock uses with multiprocessing.Lock, which is multiprocessing-safe. However, even more issues occur:
____________________ TestMultiprocessing.test_redis_client _____________________
self = <tests.test_multiprocessing.TestMultiprocessing object at 0x7f0a7a87efd0>
r = Redis<ConnectionPool<Connection<host=localhost,port=6379,db=9>>>
def test_redis_client(self, r):
"A redis client created in a parent can also be used in a child"
assert r.ping() is True
proc = multiprocessing.Process(target=target, args=(r,))
> proc.start()
tests/test_multiprocessing.py:165:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib/python3.9/multiprocessing/process.py:121: in start
self._popen = self._Popen(self)
/usr/lib/python3.9/multiprocessing/context.py:224: in _Popen
return _default_context.get_context().Process._Popen(process_obj)
/usr/lib/python3.9/multiprocessing/context.py:284: in _Popen
return Popen(process_obj)
/usr/lib/python3.9/multiprocessing/popen_spawn_posix.py:32: in __init__
super().__init__(process_obj)
/usr/lib/python3.9/multiprocessing/popen_fork.py:19: in __init__
self._launch(process_obj)
/usr/lib/python3.9/multiprocessing/popen_spawn_posix.py:47: in _launch
reduction.dump(process_obj, fp)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
obj = <Process name='Process-1' parent=258826 initial>
file = <_io.BytesIO object at 0x7f0a7741b310>, protocol = None
def dump(obj, file, protocol=None):
'''Replacement for pickle.dump() using ForkingPickler.'''
> ForkingPickler(file, protocol).dump(obj)
E _pickle.PicklingError: Can't pickle <function Redis.<lambda> at 0x7f0a77724a60>: attribute lookup Redis.<lambda> on redis.client failed
/usr/lib/python3.9/multiprocessing/reduction.py:60: PicklingError
At this point, I have given up trying to fix it further.
Possible solutions:
The proper solution would be to make Redis objects 100% serializable, if they are indeed intended to be passed to subprocesses. However, it's hard for me to tell how much work that entails and whether it's worth it.
The "cheap" solution would be to do the opposite of what the reproducer patch does, i.e.:
multiprocessing.set_start_method("fork")
at the top of conftest.py, and expect that any project using redis with multiprocessing does something similar.
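A slightly less invasive variant of the same workaround would be to request a fork context only where the client crosses the process boundary, instead of changing the global start method for the whole program (a sketch; note that fork is unavailable on Windows):

```python
import multiprocessing

def target():
    # In a forked child, the parent's Redis client is simply inherited,
    # so nothing needs to be pickled.
    pass

if __name__ == "__main__":
    # Request a fork context locally instead of calling
    # multiprocessing.set_start_method("fork") globally.
    ctx = multiprocessing.get_context("fork")
    proc = ctx.Process(target=target)
    proc.start()
    proc.join()
```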