Open 20b13c20-bc9c-43ca-8fb6-9d24ce334500 opened 3 years ago
In Python 3.8+, pickle protocol 5 (PEP 574) was added, which supports out-of-band buffer collection[1]. The idea is that when pickling an object with a large amount of data attached to it (like an array, dataframe, etc.), one can collect that data alongside the normal pickled data without causing a copy. This is particularly important when serializing data for communication between two Python instances. In other words, this is quite valuable when using a multiprocessing.pool.Pool[2] or a concurrent.futures.ProcessPoolExecutor[3]. However, AFAICT neither of these leverages this functionality[4][5]. To ensure zero-copy processing of large data, it would be helpful for both of these pools to use pickle protocol 5 with out-of-band buffers.
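
For illustration, here is a minimal sketch of out-of-band buffer collection with the standard `pickle` API. It assumes a plain `bytearray` as a stand-in for the large payload, and the helper name `roundtrip_out_of_band` is hypothetical (it is not part of the stdlib or of multiprocessing). It only shows the pickle side and is not a proposed patch for either pool.

```python
import pickle


def roundtrip_out_of_band(payload):
    """Pickle `payload` with protocol 5, collecting its buffer out-of-band.

    Hypothetical helper for illustration only.
    """
    buffers = []  # receives PickleBuffer objects instead of copied bytes
    data = pickle.dumps(
        pickle.PickleBuffer(payload),
        protocol=5,
        # list.append returns None (falsy), so each buffer is kept out-of-band.
        buffer_callback=buffers.append,
    )
    # The same buffers must be handed back, in order, when unpickling.
    restored = pickle.loads(data, buffers=buffers)
    return data, buffers, restored


payload = bytearray(b"x" * 1_000_000)  # stand-in for an array or dataframe

# In-band: the full payload is copied into the pickle stream.
in_band = pickle.dumps(pickle.PickleBuffer(payload), protocol=5)

# Out-of-band: the stream holds only a small placeholder for the buffer.
data, buffers, restored = roundtrip_out_of_band(payload)

print(len(in_band) > len(payload))
print(len(data), len(buffers))

# The restored object still views the original memory: a write to `payload`
# is visible through it, so no copy of the data was made.
payload[0] = 0
print(memoryview(restored)[0] == 0)
```

In a multiprocessing setting the collected buffers would still have to reach the other process somehow (for example, sent separately from the pickle stream). That transport step is left open here and is the part this issue asks the pools to handle.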
[1] https://docs.python.org/3/library/pickle.html#pickle-oob
[2] https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool
[3] https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ProcessPoolExecutor
[4] https://github.com/python/cpython/blob/16b5bc68964c6126845f4cdd54b24996e71ae0ba/Lib/multiprocessing/queues.py#L372
[5] https://github.com/python/cpython/blob/16b5bc68964c6126845f4cdd54b24996e71ae0ba/Lib/multiprocessing/queues.py#L245
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['3.8', '3.9', 'expert-IO', 'performance', '3.11', 'library', '3.10']
title = 'Supporting out-of-band buffers (pickle protocol 5) in multiprocessing'
updated_at =
user = 'https://github.com/jakirkham'
```
bugs.python.org fields:
```python
activity =
actor = 'jakirkham'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)', 'IO']
creation =
creator = 'jakirkham'
dependencies = []
files = []
hgrepos = []
issue_num = 45304
keywords = []
message_count = 1.0
messages = ['402736']
nosy_count = 1.0
nosy_names = ['jakirkham']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue45304'
versions = ['Python 3.8', 'Python 3.9', 'Python 3.10', 'Python 3.11']
```