Hi, I love the h2oai benchmarks. I think they're informative, but these are in-memory tests. I wonder if they're fair, since a lot of the solutions are intended for larger-than-memory use cases and leverage the storage model. Is there a way to factor that into the benchmarks? I imagine this would require some re-design of the tests: R / Python would probably need to use feather or parquet as opposed to .csv, for example.
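To illustrate what I mean, here's a minimal sketch of how the loading step in a Python solution might change (the file name and column names here are just placeholders, not the benchmark's actual data files):

```python
import pandas as pd

# Current style: the whole CSV is parsed into memory up front.
df = pd.read_csv("G1_1e9_1e2_0_0.csv")

# Parquet alternative: columnar format, so a solution can read only the
# columns a query needs, which matters once data is larger than memory.
df = pd.read_parquet("G1_1e9_1e2_0_0.parquet", columns=["id1", "v1"])
```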
It is already partially included. If you navigate to the join 1e9 results you will see it. We want to add a 500GB groupby as well, without extending machine memory, so it will also be visible on the groupby task.