Open akaymd opened 8 months ago
I have the exact same use case. Weirdly enough, it was working until last week. Is this supported?
@akaymd what are the versions of Python, Pandas, Numpy, and Pandarallel?
Pandaral·lel is looking for a maintainer! If you are interested, please open a GitHub issue.
Is there a minimal working code example you can share, along with the versions of Python and the relevant packages?
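For anyone answering the version question, here is a small sketch that collects the numbers in one go. The helper name `report_versions` is made up for this post; it only uses the standard library (`platform` and `importlib.metadata`) and reports packages that are not installed as `None` rather than failing.

```python
import platform
from importlib import metadata


def report_versions(packages=("pandas", "numpy", "pandarallel")):
    """Return a dict of installed versions; packages that are missing map to None.

    `report_versions` is a hypothetical helper, not part of pandarallel.
    """
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Print one "name: version" line per entry, ready to paste into the issue.
    for name, version in report_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Pasting the printed lines into the issue should make it easier to tell whether a recent upgrade of one of these packages lines up with the breakage.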
Using groupby on multiple columns works fine for me.
import pandas as pd
import pandarallel
pandarallel.pandarallel.initialize()
df = pd.DataFrame({"foo": range(20), "bar": range(20, 40)})
df["even"] = df["foo"] % 2 == 0
df["four"] = df["foo"] % 4 == 0
assert df.groupby(["even", "four"]).apply(lambda x: x+1).equals(df.groupby(["even", "four"]).parallel_apply(lambda x: x+1))
Python: 3.10.13 Pandarallel: 1.6.5 Pandas: 2.2.0
In my environment, a single-column df.groupby(args).apply(func) call such as the following can be replaced by parallel_apply:
df.groupby("col1").apply(func)
However, groupby with multiple columns did not work in parallel.
df.groupby(["col1", "col2", ...]).apply(func)
Does pandarallel not support parallel_apply with a multi-column groupby?
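To narrow this down, a self-contained repro along these lines might help. It is only a sketch: the data, the column names, and the helpers `build_frame`, `group_sum`, and `compare_with_parallel` are all made up for illustration, and the real `func` from the failing case would need to be swapped in. The pandarallel import is guarded so the serial part still runs where pandarallel is not installed.

```python
import pandas as pd


def build_frame():
    """Small illustrative frame with two boolean key columns (assumed data)."""
    df = pd.DataFrame({"foo": range(20)})
    df["even"] = df["foo"] % 2 == 0
    df["four"] = df["foo"] % 4 == 0
    return df


def group_sum(group):
    """Stand-in for the real `func`: sum one non-key column per group."""
    return group["foo"].sum()


def compare_with_parallel(df, keys, func):
    """Return True/False if pandarallel is available, otherwise None."""
    try:
        from pandarallel import pandarallel
    except ImportError:
        return None
    pandarallel.initialize()
    expected = df.groupby(keys).apply(func)
    actual = df.groupby(keys).parallel_apply(func)
    return expected.equals(actual)


if __name__ == "__main__":
    frame = build_frame()
    keys = ["even", "four"]
    # Serial baseline: one value per (even, four) combination present in the data.
    print(frame.groupby(keys).apply(group_sum))
    print("parallel matches serial:", compare_with_parallel(frame, keys, group_sum))
```

If this prints `True` but the real code still fails, the difference is more likely in `func` or in the data than in the multi-column groupby itself, so posting the full traceback would be the next step.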