Open Mengda-Li opened 3 years ago
Could you post a minimal, fully copy-pastable, reproducible example? https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports
import pandas as pd
from pandas import Timestamp
df = pd.DataFrame({'price': {Timestamp('2021-09-01 00:00:00.023000'): 47150.32,
                             Timestamp('2021-09-01 00:00:00.093000'): 47150.33,
                             Timestamp('2021-09-01 00:00:00.994000'): 47153.48,
                             Timestamp('2021-09-01 00:00:02.050000'): 47153.47,
                             Timestamp('2021-09-01 00:00:02.889000'): 47153.47},
                   'qty': {Timestamp('2021-09-01 00:00:00.023000'): 0.002,
                           Timestamp('2021-09-01 00:00:00.093000'): 0.002,
                           Timestamp('2021-09-01 00:00:00.994000'): 0.006,
                           Timestamp('2021-09-01 00:00:02.050000'): 0.006,
                           Timestamp('2021-09-01 00:00:02.889000'): 0.05},
                   'quoteQty': {Timestamp('2021-09-01 00:00:00.023000'): 94.3,
                                Timestamp('2021-09-01 00:00:00.093000'): 94.3,
                                Timestamp('2021-09-01 00:00:00.994000'): 282.92,
                                Timestamp('2021-09-01 00:00:02.050000'): 282.92,
                                Timestamp('2021-09-01 00:00:02.889000'): 2357.67}})
r = df.head(10).resample('1s')
def vwap(x):
    # Volume-weighted average price; expects x to be a DataFrame
    # with 'price' and 'qty' columns.
    print("it's vwap")
    print(x)
    p = x.price
    print("it's p")
    print(p)
    q = x.qty
    print("it's q")
    print(q)
    # print(x.price)
    return (p @ q) / q.sum()

def sum_qty(x):
    print(x)
    return x.qty.sum()

def sum_quoteQty(x):
    return x.quoteQty.sum()
r.apply({"price" : vwap, "qty": sum_qty, "quoteQty": sum_quoteQty})
@Mengda-Li
As I understand it, if func in Resampler.apply(func, *args, **kwargs) is a function, pandas tries passing either each column (a Series) of each grouped DataFrame or each grouped DataFrame as a whole into the function. However, if func is a list or a dict, pandas only passes each column of each grouped DataFrame into the functions. That is, the functions in a list or dict can only accept a Series as input. In your example, r.apply(vwap) executes correctly because the x passed into vwap is a DataFrame, while r.apply({"price": vwap}) raises an error because the x passed into vwap is only the column 'price'.
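The dict case described above can be checked with a small probe. This is only a sketch on a hypothetical three-row miniature of the data in this issue; `probe`, `seen_types`, and `dict_error` are illustrative names, not pandas API, and the exact behavior may differ between pandas versions.

```python
import pandas as pd

# Hypothetical miniature of the data in this issue.
idx = pd.to_datetime([
    "2021-09-01 00:00:00.023",
    "2021-09-01 00:00:00.093",
    "2021-09-01 00:00:02.050",
])
df = pd.DataFrame(
    {"price": [47150.32, 47150.33, 47153.47], "qty": [0.002, 0.002, 0.006]},
    index=idx,
)

seen_types = set()

def probe(x):
    # Record what kind of object pandas hands to the function.
    seen_types.add(type(x).__name__)
    return x.sum()

# With a dict, each function is called on a single column of each bin.
df.resample("1s").apply({"price": probe})

def vwap(x):
    # Needs two columns, so it only works when x is a DataFrame.
    return (x.price @ x.qty) / x.qty.sum()

try:
    df.resample("1s").apply({"price": vwap})
    dict_error = None
except AttributeError as exc:
    # The 'price' column alone is a Series and has no .price attribute.
    dict_error = exc

print(seen_types)
print(dict_error)
```

Here `probe` only ever records Series inputs, which is why a function that needs both price and qty cannot be used as a dict value.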
Maybe wap_func, defined as follows, suits your needs:
def wap_func(x):
    # x is the DataFrame for one resample bin; return all three aggregates at once.
    price_dot_qty = (x.price @ x.qty) / x.qty.sum()
    qty_sum = x.qty.sum()
    quoteQty_sum = x.quoteQty.sum()
    return pd.Series({'price_dot_qty': price_dot_qty, 'qty_sum': qty_sum, 'quoteQty_sum': quoteQty_sum})
r.apply(wap_func)
Could it be a pandas enhancement to accept func = [vwap, sum_qty, sum_quoteQty] in Resampler.apply(func, *args, **kwargs)?
I think so. It would avoid calling Resampler.apply multiple times when we don't know that wap_func can return a pd.Series. (For me, it would save a lot of computing time.)
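If computing time is the main concern, one possible alternative (not proposed in this thread) is to avoid Python-level functions altogether: multiply price by qty row by row first, then use the built-in sum on each resampled bin and divide. The sketch below assumes the same five rows as the example in this issue; `notional`, `sums`, `out`, and the output column name `vwap` are illustrative choices.

```python
import pandas as pd

idx = pd.to_datetime([
    "2021-09-01 00:00:00.023",
    "2021-09-01 00:00:00.093",
    "2021-09-01 00:00:00.994",
    "2021-09-01 00:00:02.050",
    "2021-09-01 00:00:02.889",
])
df = pd.DataFrame(
    {
        "price": [47150.32, 47150.33, 47153.48, 47153.47, 47153.47],
        "qty": [0.002, 0.002, 0.006, 0.006, 0.05],
        "quoteQty": [94.3, 94.3, 282.92, 282.92, 2357.67],
    },
    index=idx,
)

# Per-row price * qty; summing it per bin gives the VWAP numerator.
notional = (df["price"] * df["qty"]).resample("1s").sum()

# Built-in sums for the remaining columns, one row per 1-second bin.
sums = df[["qty", "quoteQty"]].resample("1s").sum()

# VWAP = sum(price * qty) / sum(qty); bins with no trades give 0 / 0 = NaN.
out = sums.assign(vwap=notional / sums["qty"])
print(out)
```

Because every step is a built-in aggregation, this makes a single pass per column and does not depend on how Resampler.apply dispatches to user functions.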
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[ ] I have confirmed this bug exists on the master branch of pandas.
Reproducible Example
Issue Description
Executing the reproducible example produces a debug print from the function vwap and then raises an AttributeError because the attribute price cannot be located. The error message is:
AttributeError: 'Series' object has no attribute 'price'
Expected Behavior
Resampler.apply can locate the attributes when it is given a single function, r.apply(vwap): the call returns without error, and the debug prints in vwap show that it can locate the attributes price and qty.
Installed Versions