Description
First, thank you for this wonderful library. It handles many pandas operations well under memory constraints (except perhaps cumsum(), which I am eagerly awaiting).
I have an Arrow file of ~8 GB that I load into a vaex DataFrame of shape (27_416_244, 32). Available system RAM: ~8 GB. I run a group-by aggregation like this:
```python
# summary_df is a multi-index pandas DataFrame with 76k rows, 20 cols
index_names = list(summary_df.index.names)
strfmt = '%Y-%m-%d'
vdf['_Period'] = vdf['Date'].dt.strftime(strfmt)
gd_column_ops_map = {
    'PnL % Capital': 'sum', 'PnL': 'sum', '% High': 'mean',
    '% Close': 'mean', '% Low': 'mean', 'Charges': 'sum',
    'Sell Val': 'sum', 'Buy Val': 'sum', 'Qty': 'sum', 'Cash Flow': 'sum'
}
grpby_cols = index_names + ['_Period']

# Kernel CRASHES on the next line, after the group-by runs (perhaps during agg)
grp_trades_vdf = vdf.groupby(grpby_cols, progress=True).agg(gd_column_ops_map)
```
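For reference, here is a toy pandas version of the same groupby-agg pattern, with made-up data and a subset of the columns from `gd_column_ops_map`, just to illustrate what the aggregation computes (the column values and the `Symbol` key are hypothetical, not from the real dataset):

```python
import pandas as pd

# Hypothetical miniature of the real data: one index column plus '_Period'
df = pd.DataFrame({
    "Symbol": ["A", "A", "B", "B"],
    "_Period": ["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02"],
    "PnL": [1.0, 2.0, 3.0, 4.0],
    "% High": [0.5, 1.5, 2.0, 4.0],
})

# Same shape of operation as the vaex call: group on key columns,
# apply a per-column aggregation map
out = df.groupby(["Symbol", "_Period"], as_index=False).agg(
    {"PnL": "sum", "% High": "mean"}
)
print(out)
```

On the real 27M-row frame the vaex equivalent of this is what crashes the kernel.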
Software information