nalepae / pandarallel

A simple and efficient tool to parallelize Pandas operations on all available CPUs
https://nalepae.github.io/pandarallel
BSD 3-Clause "New" or "Revised" License

use pool as spawn #260

Open AlexProfi opened 7 months ago

AlexProfi commented 7 months ago

Hello, I'm trying to use pandarallel with the `spawn` start method so it works correctly with Django database connections, as described in this post:

> One possibility is to use the multiprocessing `spawn` child process creation method, which will not copy Django's DB connection details to the child processes. The child processes need to bootstrap from scratch, but are free to create/close their own Django DB connections.

In the calling code:

```python
import multiprocessing
from myworker import work_one_item  # <-- Your worker method

...

# Uses connection A
list_of_items = django_db_call_one()

# 'spawn' starts new python processes
with multiprocessing.get_context('spawn').Pool() as pool:
    # work_one_item will create its own DB connection
    parallel_results = pool.map(work_one_item, list_of_items)

# Continues to use connection A
another_db_call(parallel_results)
```

but I get this error:

```
/usr/local/lib/python3.6/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'wrap_work_function_for_pipe.<locals>.closure'
```

To test this, I set `CONTEXT = multiprocessing.get_context("spawn")` in pandarallel's core.py.
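This error is expected with the `spawn` start method: `spawn` must pickle the callable sent to each worker, and pandarallel wraps the user function in a nested function (a "local object"), which the standard pickler cannot serialize. A minimal sketch of the underlying limitation (the names `make_worker` and `top_level` are illustrative, not pandarallel internals):

```python
import pickle


def top_level(x):
    # Module-level functions are pickled by reference (module + qualified
    # name), so 'spawn' can send them to child processes.
    return x + 1


def make_worker():
    # Nested functions cannot be pickled, which is why forcing
    # pandarallel's context to "spawn" fails with
    # "Can't pickle local object '...<locals>.closure'".
    def closure(x):
        return x + 1
    return closure


pickle.dumps(top_level)  # succeeds

try:
    pickle.dumps(make_worker())
except (AttributeError, pickle.PicklingError) as exc:
    print(f"pickling failed: {exc}")
```

This is also why the quoted Django workaround keeps `work_one_item` at module level in `myworker.py`: only top-level functions survive the pickling that `spawn` requires.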

nalepae commented 5 months ago

Pandaral·lel is looking for a maintainer! If you are interested, please open a GitHub issue.

shermansiu commented 2 months ago

@AlexProfi Please clean up the minimal code example.