xorbitsai / xorbits

Scalable Python DS & ML, in an API compatible & lightning fast way.
https://xorbits.readthedocs.io
Apache License 2.0

BUG: user-defined function groupby.agg has unexpected keyword argument #733

Closed luweizheng closed 12 months ago

luweizheng commented 12 months ago

Describe the bug

A user-defined function:

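import numpy as np

def realized_volatility(series):
    return np.sqrt(np.sum(series**2))

# Aggregations to apply to each column
features = {
    "log_return1": [np.sum, realized_volatility, np.mean, np.std]
}

gb_cols = ["stock_id", "time_id"]

# `book` is the DataFrame read from the sample parquet file attached below
agg = book.groupby(gb_cols).agg(features).reset_index(drop=False)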

The same code runs successfully on pandas and Modin, but fails on xorbits with the following traceback:

  File "/home/u20200002/hpc/xorbits/python/xorbits/_mars/dataframe/reduction/aggregation.py", line 1008, in is_funcs_aggregate
    compiler.add_function(f, 2, cols=["A", "B"])
  File "/home/u20200002/hpc/xorbits/python/xorbits/_mars/dataframe/reduction/core.py", line 851, in add_function
    compile_result = self._compile_function(func, func_name, ndim=ndim)
  File "/home/u20200002/hpc/xorbits/python/xorbits/_mars/core/mode.py", line 78, in _inner
    return func(*args, **kwargs)
  File "/home/u20200002/hpc/xorbits/python/xorbits/_mars/dataframe/reduction/core.py", line 927, in _compile_function
    func_ret = self._build_mock_return_object(func, func_idx, object, ndim=1)
  File "/home/u20200002/hpc/xorbits/python/xorbits/_mars/dataframe/reduction/core.py", line 904, in _build_mock_return_object
    return func(mock_obj)
  File "/home/u20200002/hpc/test_xorbits/benchmarks/optiver_volatility/xorbits_pipe/preprocess.py", line 21, in realized_volatility
    return np.sqrt(np.sum(series**2))
  File "<__array_function__ internals>", line 200, in sum
  File "/fs/fast/u20200002/envs/ucx/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 2324, in sum
    return _wrapreduction(a, np.add, 'sum', axis, dtype, out, keepdims=keepdims,
  File "/fs/fast/u20200002/envs/ucx/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 84, in _wrapreduction
    return reduction(axis=axis, out=out, **passkwargs)
TypeError: sum_series() got an unexpected keyword argument 'out'

A sample book parquet file:

c439ef22282f412ba39e9137a3fdabac.parquet.zip

codingl2k1 commented 12 months ago

@aresnow1 It seems that the xorbits DataFrame sum method is not fully compatible with NumPy.
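
To illustrate the mechanism in the traceback: when np.sum receives an object that is not a plain ndarray, NumPy's _wrapreduction looks up the object's own sum method and calls it with axis= and out= keywords. The sketch below reproduces that with two made-up stand-in classes (StrictSeries and TolerantSeries are hypothetical and are not xorbits types); it only shows why a sum method without an out parameter raises, and is not the actual xorbits fix.

```python
import numpy as np


class StrictSeries:
    """Hypothetical stand-in for a series whose sum() has no `out` parameter."""

    def __init__(self, values):
        self.values = list(values)

    def __pow__(self, exponent):
        # Element-wise power, returning the same stand-in type.
        return type(self)(v ** exponent for v in self.values)

    def sum(self, axis=None):
        return sum(self.values)


class TolerantSeries(StrictSeries):
    """Same stand-in, but sum() accepts the keywords NumPy forwards."""

    def sum(self, axis=None, out=None, **kwargs):
        # `out` and any extra keywords are accepted and ignored in this sketch.
        return sum(self.values)


def realized_volatility(series):
    return np.sqrt(np.sum(series**2))


# np.sum calls StrictSeries.sum(axis=None, out=None), which raises TypeError.
try:
    realized_volatility(StrictSeries([3.0, 4.0]))
    strict_error = None
except TypeError as exc:
    strict_error = str(exc)

# With a NumPy-compatible signature the same UDF evaluates normally.
tolerant_result = realized_volatility(TolerantSeries([3.0, 4.0]))

print(strict_error)
print(tolerant_result)
```

Under that assumption, making the reduction method accept (and at least tolerate) the out keyword is one way for a UDF written with np.sum to keep working unchanged.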

codingl2k1 commented 12 months ago

take