nalepae / pandarallel

A simple and efficient tool to parallelize Pandas operations on all available CPUs
https://nalepae.github.io/pandarallel
BSD 3-Clause "New" or "Revised" License

Does pandarallel not support parallel_apply with multiple columns groupby? #253

Open akaymd opened 8 months ago

akaymd commented 8 months ago

In my environment, df.groupby(args).apply(func) on a single column can be replaced by parallel_apply as follows:

df.groupby("col1").parallel_apply(func)

However, the same call with a groupby on multiple columns did not run in parallel:

df.groupby(["col1", "col2", ...]).parallel_apply(func)

Does pandarallel not support parallel_apply with multiple columns groupby?

perveen-shaheen commented 7 months ago

I have the exact same use case. Weirdly enough, it was working until last week. Is this supported?

AymanElsayeed commented 6 months ago

@akaymd what are the versions of Python, Pandas, Numpy, and Pandarallel?
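For anyone answering this, a minimal sketch of one way to collect those versions in a single step is below. It assumes the packages were installed under the distribution names `pandas`, `numpy`, and `pandarallel`; the helper name `collect_versions` is hypothetical and not part of pandarallel.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def collect_versions(packages=("pandas", "numpy", "pandarallel")):
    """Return a dict mapping each name to its installed version string.

    Hypothetical helper for bug reports; the package names are assumptions.
    """
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            # Reads the installed distribution metadata without importing it.
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for name, ver in collect_versions().items():
        print(f"{name}: {ver}")
```

Pasting the printed lines into the issue should be enough to compare environments.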

nalepae commented 5 months ago

Pandaral·lel is looking for a maintainer! If you are interested, please open a GitHub issue.

shermansiu commented 2 months ago

Is there a minimal working code example you can share, along with the versions of Python and the relevant packages?

shermansiu commented 2 months ago

Using groupby on multiple columns works fine for me.

import pandas as pd
from pandarallel import pandarallel

pandarallel.initialize()

df = pd.DataFrame({"foo": range(20), "bar": range(20, 40)})
df["even"] = df["foo"] % 2 == 0
df["four"] = df["foo"] % 4 == 0

# Serial and parallel results should be identical for a multi-column groupby.
serial = df.groupby(["even", "four"]).apply(lambda x: x + 1)
parallel = df.groupby(["even", "four"]).parallel_apply(lambda x: x + 1)
assert serial.equals(parallel)

Python: 3.10.13
Pandarallel: 1.6.5
Pandas: 2.2.0