h2oai / db-benchmark

reproducible benchmark of database-like ops
https://h2oai.github.io/db-benchmark
Mozilla Public License 2.0

de-serialization cost? #228

Closed alanpaulkwan closed 2 years ago

alanpaulkwan commented 3 years ago

Hi, I love the h2oai benchmarks. I think they're informative, but these are in-memory tests. I wonder if they're fair, since a lot of the solutions are intended for larger-than-memory use cases and leverage the storage model. Is there a way to factor that into the benchmarks? I imagine this would require some redesign of the tests: R / Python would probably need to use feather or parquet as opposed to .csv, for example.
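
To illustrate what "factoring in" de-serialization might look like, here is a rough sketch that times the load step separately from the query step for the same data stored in two formats. It is not how db-benchmark measures anything. To stay dependency-free it uses the standard library only, with `pickle` standing in for a binary format such as feather or parquet (an assumption made purely for illustration), and the helper names (`make_rows`, `load_csv`, `load_pickle`, `groupby_sum`, `run`) are hypothetical.

```python
import csv
import pickle
import random
import tempfile
import time
from collections import defaultdict
from pathlib import Path


def make_rows(n, seed=0):
    """Generate n synthetic (id, value) rows; the seed keeps runs repeatable."""
    rng = random.Random(seed)
    return [(f"id{rng.randrange(100):03d}", rng.randrange(1000)) for _ in range(n)]


def load_csv(path):
    """Text format: every value has to be parsed back from a string."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        return [(key, int(value)) for key, value in reader]


def load_pickle(path):
    """Binary format (stand-in for feather/parquet in this sketch)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def groupby_sum(rows):
    """The 'query': sum of value per id."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


def timed(func, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def run(n=50_000):
    """Write the same rows in both formats, then time load and query apart."""
    rows = make_rows(n)
    report = {}
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = Path(tmp) / "data.csv"
        pkl_path = Path(tmp) / "data.pkl"
        with open(csv_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["id", "v"])
            writer.writerows(rows)
        with open(pkl_path, "wb") as f:
            pickle.dump(rows, f)
        for name, loader, path in [("csv", load_csv, csv_path),
                                   ("pickle", load_pickle, pkl_path)]:
            loaded, load_s = timed(loader, path)
            totals, query_s = timed(groupby_sum, loaded)
            report[name] = {"load_s": load_s, "query_s": query_s, "totals": totals}
    return report


if __name__ == "__main__":
    for name, r in run().items():
        print(f"{name}: load {r['load_s']:.4f}s, groupby {r['query_s']:.4f}s")
```

Reporting the two timings side by side makes it visible how much of the end-to-end time is spent reading the file rather than running the groupby. Actual numbers depend on the machine, the data size, and the format, so none are quoted here.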

jangorecki commented 3 years ago

It is already partially included. If you navigate to join 1e9, you will see it. We also want to add a 500GB groupby without extending machine memory; then it will also be visible on the groupby task.