Open ngandrewhh opened 6 months ago
Can you please provide a minimal working example? The given example doesn't run. Preferably one without pandas.
Hi @orlp, can you uncomment the lines below? The ones marked "leaks" introduce the problem.
# leaks
# out[col] = pl_data.select(col)
# out[col] = pl_data.get_column(col).to_frame()
# no leak
# out[col] = pl_data.select(col).to_pandas()
# out[col] = pl_data.get_column(col).to_pandas()
# leaks
# return pl.concat(out.values(), how='horizontal').to_pandas().set_index(['TS', 'ID'])
# no leak
# return pd.concat(out.values(), axis=1).set_index(['TS', 'ID'])
Converting to pandas is my attempt to tell the system that we are done with the polars object and that it should be deallocated. If there are better ways of doing so, please let me know.
Checks
Reproducible example
Log output
No response
Issue description
Memory saturates on Windows but possibly leaks on Linux.
Interestingly, when we convert back to a pandas DataFrame, no memory increase is observed. The increase appears when performing pl.concat(..., how='horizontal') on the results of pl_data.select(...) or pl_data.get_column(...).to_frame(). I have read some discussion of memory allocators in related threads, but I am not sure it is entirely relevant here.
Windows: hovers around 140 MB for conversion to and from pandas, and around 180 MB for polars.
Linux: hovers around 140 MB for conversion to and from pandas, but goes up to 500-600 MB for polars and keeps gradually increasing.
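For anyone trying to reproduce these numbers without pandas, a rough measurement harness might look like the sketch below. It is not the original reproducer: `read_rss_mb`, `measure_growth`, `growth`, and `polars_step` are hypothetical names, the tiny frame in `polars_step` is made-up data modelled on the "leaks" variant above, and the RSS probe reads `/proc/self/statm`, so it only reports values on Linux (it returns None elsewhere).

```python
import os
from typing import Callable, List, Optional


def read_rss_mb() -> Optional[float]:
    """Return the current resident set size in MB, or None if unavailable.

    Assumption: reads /proc/self/statm, which exists only on Linux.
    """
    try:
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])
    except (OSError, IndexError, ValueError):
        return None
    return resident_pages * os.sysconf("SC_PAGE_SIZE") / (1024 * 1024)


def measure_growth(
    step: Callable[[], object],
    iterations: int,
    probe: Callable[[], Optional[float]] = read_rss_mb,
    sample_every: int = 1,
) -> List[float]:
    """Call step() repeatedly and sample memory every `sample_every` calls."""
    readings: List[float] = []
    for i in range(iterations):
        step()
        if (i + 1) % sample_every == 0:
            value = probe()
            if value is not None:  # skip samples on unsupported platforms
                readings.append(value)
    return readings


def growth(readings: List[float]) -> float:
    """Last reading minus first reading (0.0 with fewer than two samples)."""
    if len(readings) < 2:
        return 0.0
    return readings[-1] - readings[0]


def polars_step():
    """Hypothetical workload: per-column select, then a horizontal concat."""
    import polars as pl  # lazy import: the harness itself needs only the stdlib

    pl_data = pl.DataFrame(
        {"TS": [1, 2, 3], "ID": [10, 20, 30], "X": [0.1, 0.2, 0.3]}
    )
    out = {col: pl_data.select(col) for col in pl_data.columns}
    return pl.concat(list(out.values()), how="horizontal")
```

Running something like `growth(measure_growth(polars_step, 10_000, sample_every=1_000))` and comparing it against a variant of the step gives a crude signal of whether resident memory keeps climbing. A flat series after warm-up would suggest allocator retention rather than a leak, though this sketch cannot distinguish the two on its own.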
Expected behavior
Polars should take less memory than pandas at idle, and its memory footprint should not keep increasing.
Installed versions
Windows
Linux