nalepae / pandarallel

A simple and efficient tool to parallelize Pandas operations on all available CPUs
https://nalepae.github.io/pandarallel
BSD 3-Clause "New" or "Revised" License

AttributeError: 'DataFrameGroupBy' object has no attribute 'parallel_apply' #255

Open beyondguo opened 7 months ago

beyondguo commented 7 months ago

Bug description

sentiment_df.groupby('scode').parallel_apply(lambda x: x['f'].rolling(window=window_size, min_periods=1).apply(func, raw=True))
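As a point of comparison, the same expression can be run serially by swapping `parallel_apply` for pandas' own `apply`. The sketch below does that on a tiny frame. `window_size`, `func`, and the contents of `sentiment_df` are hypothetical stand-ins, since the report does not show them.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: the report does not define these names.
window_size = 3

def func(values: np.ndarray) -> float:
    # With raw=True, rolling.apply passes each window as a NumPy array.
    return float(values.mean())

sentiment_df = pd.DataFrame({
    "scode": ["a", "a", "a", "b", "b"],
    "f": [1.0, 2.0, 3.0, 10.0, 20.0],
})

# Same shape as the reported call, with pandas' serial apply.
serial_result = sentiment_df.groupby("scode").apply(
    lambda x: x["f"].rolling(window=window_size, min_periods=1).apply(func, raw=True)
)
print(serial_result)
```

If the serial version works and the parallel one raises, the problem lies in how pandarallel is set up rather than in the rolling computation itself.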

Observed behavior

AttributeError: 'DataFrameGroupBy' object has no attribute 'parallel_apply'

Minimal but working code sample to ease bug fix for pandarallel team

[image: code sample posted as a screenshot]

nalepae commented 5 months ago

Pandaral·lel is looking for a maintainer! If you are interested, please open a GitHub issue.

shermansiu commented 2 months ago

I typed up the above code example.

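import math

import numpy as np
import pandas as pd
from pandarallel import pandarallel

# parallel_apply is only attached to pandas objects after this call.
pandarallel.initialize()

df_size = int(3e7)
df = pd.DataFrame(dict(a=np.random.randint(1, 1000, df_size),
                       b=np.random.rand(df_size)))

def func(df):
    # Average a CPU-heavy transform over column b of one group.
    dum = 0
    for item in df.b:
        dum += math.log10(math.sqrt(math.exp(item**2)))
    return dum / len(df.b)

# Apply func to each group of "a" in parallel.
res_parallel = df.groupby("a").parallel_apply(func)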

It works just fine for me.

Python: 3.10.13
Pandarallel: 1.6.5
Pandas: 2.1.0
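
One possible explanation for the difference, not confirmed by the reporter: pandarallel attaches `parallel_apply` to pandas objects only when `pandarallel.initialize()` runs, so the attribute is absent in any session where that call was skipped. A minimal check, using only pandas, might look like this. The helper name `parallel_apply_is_attached` is made up for illustration.

```python
from pandas.core.groupby import DataFrameGroupBy

def parallel_apply_is_attached() -> bool:
    """Return True if DataFrameGroupBy currently has a parallel_apply attribute."""
    return hasattr(DataFrameGroupBy, "parallel_apply")

if not parallel_apply_is_attached():
    # Plain pandas does not define this method, so it is missing
    # until something (such as pandarallel.initialize()) adds it.
    print("parallel_apply is missing: call pandarallel.initialize() first")
```

Running this just before the failing `groupby(...).parallel_apply(...)` line shows whether the method was ever attached in the current process.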