omrihar opened this issue 1 day ago (status: Open)
The error you're encountering is due to the `RedisKVStore` instance not being serializable, which is required for multiprocessing. The `RedisKVStore` class creates a Redis client connection that holds non-serializable objects such as `_thread.lock`, which cannot be pickled and therefore cannot be shared across processes in a multiprocessing pool [1].
To work around this issue, you can consider the following approaches:
1. **Disable Caching During Multiprocessing:** If caching is not critical for your multiprocessing tasks, you can disable the cache when running the pipeline with multiple workers. This can be done by setting the `cache` parameter to `None`, or by using a conditional to disable it when `num_workers > 1`.
2. **Use a Different Caching Strategy:** If caching is necessary, consider a caching mechanism that supports serialization, such as a file-based cache or an in-memory cache that can be shared across processes.
3. **Custom Serialization:** Implement custom serialization logic for the `RedisKVStore` if possible, though this can be complex and may not be feasible depending on the underlying library's constraints.

These approaches can help you bypass the serialization issue with `RedisKVStore` when using multiprocessing in your ingestion pipeline.
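The first approach (conditionally disabling the cache) can be wired up with a small helper. This is a hedged sketch: `select_cache` is a hypothetical function, not part of LlamaIndex:

```python
def select_cache(cache, num_workers):
    """Return the given cache only when it is safe to use.

    Redis-backed caches hold a live client connection containing
    thread locks, which cannot be pickled and therefore cannot be
    sent to multiprocessing workers. With more than one worker,
    fall back to no caching instead of crashing.
    """
    if num_workers > 1:
        return None  # skip caching: the cache object is not picklable
    return cache
```

You would then pass the result as the pipeline's `cache` argument, e.g. `IngestionPipeline(..., cache=select_cache(redis_cache, num_workers))` — the exact wiring depends on your pipeline setup and is an assumption here.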
To continue talking to Dosu, mention @dosu.
Bug Description
Having an ingestion pipeline with a Redis cache makes the pipeline incompatible with multiprocessing, so setting `num_workers` to a nonzero value causes the pipeline to fail with `TypeError: cannot pickle '_thread.lock' object`.
Version
0.12.1
Steps to Reproduce
Have redis running somewhere (I use localhost) and execute the following code:
Relevant Logs/Tracebacks