h2oai / db-benchmark

reproducible benchmark of database-like ops
https://h2oai.github.io/db-benchmark
Mozilla Public License 2.0

[WIP] Don't use gc in pandas join benchmark. #84

Closed: trivialfis closed this 5 years ago

trivialfis commented 5 years ago
jangorecki commented 5 years ago

The pandas join benchmark is out of date. Initially, in 2016, this benchmark mostly tested distributed tools plus pandas, data.table and dplyr; if you are interested in those, you may want to check this talk: https://www.youtube.com/watch?v=5X7h1rZGVs0. The join scripts are left over from those times, except for data.table, for which I am now drafting a new join benchmark; the old one was very limited. Up-to-date information on joins is in https://github.com/h2oai/db-benchmark/issues/18.

I am aware of the gc issue (it is the same in R), but gc is called in the various tools for consistency. Using wrapper functions in benchmark scripts was discussed in https://github.com/h2oai/db-benchmark/pull/41. Closing this; please discuss the features/changes you want to push before implementing them, to avoid unnecessary work. Thanks for contributing!
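For context on the gc point: a benchmark script may force a garbage collection before each timed step, so that collector work left over from earlier steps is not charged to the step being measured. Below is a minimal, stdlib-only Python sketch of that pattern. `timed_run` and `toy_inner_join` are hypothetical names used for illustration; they are not functions from the db-benchmark scripts, and the toy join merely stands in for a real pandas join.

```python
import gc
import time


def timed_run(fn, *args):
    """Hypothetical helper: force a full collection, then time a single call of fn."""
    gc.collect()  # collect garbage left by earlier runs so it is not charged to this one
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


def toy_inner_join(left, right):
    """Hypothetical stand-in for a join: inner join on the first field of each tuple."""
    index = {}
    for row in right:
        index.setdefault(row[0], []).append(row)  # build a hash index on the right side
    # probe the index with each left row; append the right row's non-key fields
    return [l + r[1:] for l in left for r in index.get(l[0], [])]


left = [(1, "a"), (2, "b"), (3, "c")]
right = [(1, "x"), (3, "y"), (3, "z")]

for run in (1, 2):
    rows, secs = timed_run(toy_inner_join, left, right)
    print(run, len(rows), f"{secs:.6f}s")
```

Because every timed call starts right after an explicit collection, repeated runs start from a comparable heap state. The trade-off is that this measures the operation with collection cost largely excluded, which is the kind of methodological choice the thread suggests agreeing on across tools.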

trivialfis commented 5 years ago

@jangorecki Thanks for the explanation.