pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs
Other
29.94k stars 1.93k forks

Parallel apply and map #7243

Open indigoviolet opened 1 year ago

indigoviolet commented 1 year ago

Problem description

I wish I could tell Polars that my apply function or my map function is safe to run in parallel, and it would automatically use multiprocessing to run it over my column. This seems like a common case which could be made very easy to use.

ritchie46 commented 1 year ago

Multiprocessing would likely contend with Polars.

Besides that, it must clone the data and has a terrible startup/teardown cost.

We could allow multithreading and use the Polars thread pool. This will help if your Python function releases the GIL.
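
To illustrate the point about the GIL, here is a minimal sketch using only the standard library, not a Polars API. A plain list stands in for a column, and `digest` and `threaded_map` are hypothetical names chosen for this example. It relies on `hashlib` releasing the GIL while hashing buffers larger than 2047 bytes, so the worker threads can overlap; a pure-Python function that holds the GIL would gain little from the same setup.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(payload: bytes) -> str:
    # hashlib releases the GIL while hashing large buffers,
    # so several threads can run this function at the same time.
    return hashlib.sha256(payload).hexdigest()


def threaded_map(func, values, max_workers=4):
    # Hypothetical helper: apply `func` to each value on a thread pool.
    # Executor.map returns results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, values))


# A plain list standing in for a column of binary values (assumption).
column = [bytes([i]) * 1_000_000 for i in range(8)]

sequential = [digest(v) for v in column]
parallel = threaded_map(digest, column)
```

Threads share memory, so unlike multiprocessing nothing is copied between workers and there is no process startup cost.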

Hoeze commented 1 year ago

This relates a bit to my issue here: https://github.com/pola-rs/polars/issues/6157#issuecomment-1377420903. It is just the inverse case: running Polars inside multiprocessing.