vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
https://vaex.io
MIT License

[BUG-REPORT] #2309

Open MPanek6 opened 1 year ago

MPanek6 commented 1 year ago

Description

I'm currently working with a large dataset (1.6 million rows) read in from an HDF5 file via `vx.open()`, to which I then apply column filters as well as a column selection. However, when performing aggregations I get a `MemoryError: bad allocation` error.

Filtering is done via: `df = df[df[filter_col].isin(filter_values)]`
Column selection is done via: `df = df[list_of_columns]`
Aggregation is done via: `df = df.groupby(by=group_by_cols, agg={col: vx.agg.sum(col) for col in cols_to_aggregate})`

All of this works perfectly fine with a smaller dataset (500k rows). Additionally, applying the column selection first, then the aggregations, and finally the filters also works, but this hurts performance considerably, since the aggregations are then performed on the whole DataFrame as opposed to a filtered-down version.

Essentially, the issue occurs when performing aggregations on a large DataFrame that has been filtered beforehand.

Software information