h2oai / db-benchmark

reproducible benchmark of database-like ops
https://h2oai.github.io/db-benchmark
Mozilla Public License 2.0

reduce memory usage for polars #224

Closed ritchie46 closed 3 years ago

ritchie46 commented 3 years ago

This PR sets `low_memory` to `True` while parsing the CSV.

Furthermore, we shrink the arrays after casting to Categorical, and we make sure the global string cache is emptied once it is no longer needed.

Hopefully, this solves the problem when loading the 50GB dataset.

jangorecki commented 3 years ago

unfortunately it didn't help for G1_1e9_1e2_0_0, rest is still running

ritchie46 commented 3 years ago

> unfortunately it didn't help for G1_1e9_1e2_0_0, rest is still running

Hmm... 🙁 Was it killed again before any question was executed?

jangorecki commented 3 years ago

Yes, the full output is:

# groupby-polars.py
loading dataset G1_1e9_1e2_0_0
Killed
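
A bare `Killed` with no traceback usually means the process received SIGKILL. On Linux that is often the kernel's out-of-memory killer, although the output above does not confirm the cause. A small sketch of how a wrapper script could tell this case apart from an ordinary failure is shown here; the helper names are hypothetical and it assumes a POSIX system.

```python
import signal
import subprocess
import sys
from typing import Sequence


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a short diagnosis."""
    if returncode == -signal.SIGKILL:
        # On POSIX a negative return code means the child died from that signal.
        return "killed by SIGKILL (possibly the kernel OOM killer)"
    if returncode < 0:
        return f"terminated by signal {-returncode}"
    return f"exited with status {returncode}"


def run_and_describe(command: Sequence[str]) -> str:
    """Run a command to completion and describe how it ended."""
    completed = subprocess.run(list(command))
    return describe_exit(completed.returncode)
```

For example, `run_and_describe([sys.executable, "groupby-polars.py"])` would report a SIGKILL if the script were killed during loading. Confirming that the OOM killer was responsible still requires checking the kernel log.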

ritchie46 commented 3 years ago

Thanks. Back to the drawing board.