visualfabriq / bquery

A query and aggregation framework for Bcolz (W2013-01)
https://www.visualfabriq.com
BSD 3-Clause "New" or "Revised" License
56 stars, 11 forks

Memory Optimizations #44

Closed: CarstVaartjes closed this issue 8 years ago

CarstVaartjes commented 9 years ago

Several of the groupby algorithms (such as the group_index creation) can use a large amount of memory, and that memory is not released immediately afterwards. Two things to do:

1) Implement on-disk behaviour for temporary columns, most notably the group_index. (NB: on-disk storage is already in place for the result ctable, so that is not the issue, and result tables tend to be smaller anyway; it is the intermediate results that eat the memory.) Update: the group and sum columns are also all numpy arrays (for good reason, since they are updated during processing) and are not converted to carrays until the end.

2) Look at what drives memory usage and see how we can optimize it. Most parts of the Cython code should be reusing numpy arrays in loops for efficiency, so, to be honest, this memory-eating behaviour is somewhat surprising.
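A minimal sketch of both ideas, assuming a simple factorize-then-sum groupby. The per-row group index is written to a disk-backed temporary array, and a single preallocated buffer is reused for every chunk. Here `numpy.memmap` is swapped in as a stand-in for an on-disk bcolz carray; the function names `chunked_group_index` and `chunked_group_sum` are hypothetical and not part of bquery's API, and the tiny `chunk_size` is only there to make the chunking visible. bquery does this work in Cython. The pure-Python inner loop shows the memory pattern, not the performance.

```python
import os
import tempfile

import numpy as np


def chunked_group_index(values, chunk_size=4, tmp_dir=None):
    """Factorize `values` into dense group ids (0, 1, 2, ... in order of first
    appearance) and write them to a disk-backed temporary array.

    Hypothetical helper for illustration only; not bquery's API.
    Returns (group_index, lookup, path); the caller removes `path` when done.
    """
    n = len(values)
    if n == 0:
        raise ValueError("values must be non-empty (an empty file cannot be memory-mapped)")
    fd, path = tempfile.mkstemp(suffix=".dat", dir=tmp_dir)
    os.close(fd)
    # Assumption: numpy.memmap stands in for an on-disk bcolz carray. The ids
    # live in a file and the OS pages them in and out instead of keeping them in RAM.
    group_index = np.memmap(path, dtype=np.int64, mode="w+", shape=(n,))

    lookup = {}  # value -> group id; grows with the number of distinct groups only
    buf = np.empty(chunk_size, dtype=np.int64)  # allocated once, reused for every chunk

    for start in range(0, n, chunk_size):
        chunk = values[start:start + chunk_size]
        m = len(chunk)
        for i, v in enumerate(chunk):
            # setdefault assigns the next free id only the first time v is seen
            buf[i] = lookup.setdefault(v, len(lookup))
        group_index[start:start + m] = buf[:m]

    group_index.flush()
    return group_index, lookup, path


def chunked_group_sum(group_index, measure, n_groups, chunk_size=4):
    """Sum `measure` per group, reading both inputs one chunk at a time."""
    sums = np.zeros(n_groups, dtype=np.float64)  # one slot per group, not per row
    for start in range(0, len(measure), chunk_size):
        stop = start + chunk_size
        # np.add.at accumulates correctly even when a group id repeats within the chunk
        np.add.at(sums, np.asarray(group_index[start:stop]), measure[start:stop])
    return sums


# Usage on toy data
cities = ["ams", "rtm", "ams", "utr", "rtm", "ams"]
sales = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

gi, lookup, path = chunked_group_index(cities)
ids = gi.tolist()                                   # [0, 1, 0, 2, 1, 0]
totals = chunked_group_sum(gi, sales, len(lookup))  # array([10., 7., 4.])
print(lookup, totals)  # {'ams': 0, 'rtm': 1, 'utr': 2} [10.  7.  4.]

# Release the mapping before deleting the temporary file
del gi
os.remove(path)
```

Only the per-row group index goes to disk in this sketch. The per-group sums stay in memory because they scale with the number of groups rather than the number of rows, which fits the observation above that result tables tend to be smaller.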

CarstVaartjes commented 8 years ago

Done by #72 and #73.