python / cpython

The Python programming language
https://www.python.org
Other

Supporting out-of-band buffers (pickle protocol 5) in multiprocessing #89467

Open 20b13c20-bc9c-43ca-8fb6-9d24ce334500 opened 3 years ago

20b13c20-bc9c-43ca-8fb6-9d24ce334500 commented 3 years ago
BPO 45304
Nosy @jakirkham

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

```python
assignee = None
closed_at = None
created_at =
labels = ['3.8', '3.9', 'expert-IO', 'performance', '3.11', 'library', '3.10']
title = 'Supporting out-of-band buffers (pickle protocol 5) in multiprocessing'
updated_at =
user = 'https://github.com/jakirkham'
```

bugs.python.org fields:

```python
activity =
actor = 'jakirkham'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)', 'IO']
creation =
creator = 'jakirkham'
dependencies = []
files = []
hgrepos = []
issue_num = 45304
keywords = []
message_count = 1.0
messages = ['402736']
nosy_count = 1.0
nosy_names = ['jakirkham']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue45304'
versions = ['Python 3.8', 'Python 3.9', 'Python 3.10', 'Python 3.11']
```

20b13c20-bc9c-43ca-8fb6-9d24ce334500 commented 3 years ago

Python 3.8 added pickle protocol 5 (PEP 574), which supports out-of-band buffer collection[1]. The idea is that when pickling an object with a large amount of data attached to it (an array, a dataframe, etc.), one can collect that data alongside the normal pickled data without copying it. This matters in particular when serializing data for communication between two Python instances, so it is quite valuable when using a multiprocessing.pool.Pool[2] or a concurrent.futures.ProcessPoolExecutor[3]. However, AFAICT neither of these leverages this functionality[4][5]. To ensure zero-copy processing of large data, it would be helpful for both of these pools to use pickle protocol 5.

[1] https://docs.python.org/3/library/pickle.html#pickle-oob
[2] https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool
[3] https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ProcessPoolExecutor
[4] https://github.com/python/cpython/blob/16b5bc68964c6126845f4cdd54b24996e71ae0ba/Lib/multiprocessing/queues.py#L372
[5] https://github.com/python/cpython/blob/16b5bc68964c6126845f4cdd54b24996e71ae0ba/Lib/multiprocessing/queues.py#L245
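
For illustration, here is a minimal sketch of out-of-band buffer collection using only the documented `pickle` API (`PickleBuffer`, `buffer_callback`, and the `buffers` argument to `loads`). The helper name `dumps_oob` and the 1 MB `bytearray` payload are made up for this example. Everything runs in a single process, so it shows only the pickle side of the request, not how multiprocessing would transport the buffers.

```python
import pickle


def dumps_oob(obj):
    """Pickle obj with protocol 5, collecting large buffers out-of-band.

    Hypothetical helper for this example: returns the small pickle
    stream together with the list of PickleBuffer objects that were
    handed to buffer_callback instead of being copied into the stream.
    """
    buffers = []
    data = pickle.dumps(obj, protocol=5, buffer_callback=buffers.append)
    return data, buffers


# Stand-in for the data attached to an array or dataframe.
payload = bytearray(b"x" * 1_000_000)
wrapped = pickle.PickleBuffer(payload)

# In-band: the payload bytes are copied into the pickle stream.
in_band = pickle.dumps(wrapped, protocol=5)

# Out-of-band: the stream holds only a placeholder; the payload is
# exposed through the collected buffers without being copied.
data, buffers = dumps_oob(wrapped)

# The consumer must supply the same buffers, in the same order.
restored = pickle.loads(data, buffers=buffers)

print(len(in_band), len(data), len(buffers))
```

Here the out-of-band stream stays a few bytes long while the in-band stream grows with the payload. A transport that wanted to benefit would need to send `data` and each entry of `buffers` separately, which is what the request above asks multiprocessing to support.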

iritkatriel commented 2 years ago

See also https://github.com/python/cpython/issues/84895.