h2oai / db-benchmark

reproducible benchmark of database-like ops
https://h2oai.github.io/db-benchmark
Mozilla Public License 2.0

Change table order in Julia #158

Closed bkamins closed 3 years ago

bkamins commented 3 years ago

@jangorecki, when looking at the join benchmarks I noticed that the order in which the tables are passed sometimes differs between packages. In DataFrames.jl this order currently matters: when joining, the "small" table should go first.
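To illustrate why argument order can matter at all, here is a minimal sketch of a generic hash join in Python. It is not the DataFrames.jl implementation, and the thread does not say how DataFrames.jl joins internally; the sketch simply assumes a design in which the first table is hashed and the second one is scanned. The function `hash_join` and the toy tables `small` and `large` are hypothetical names used only for this example.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, key):
    """Inner join of two lists of dicts.

    Assumption for illustration: the FIRST table is hashed (build side)
    and the SECOND table is scanned (probe side).
    Returns the joined rows and the number of rows placed in the hash index.
    """
    index = defaultdict(list)
    indexed = 0
    for row in build_rows:
        index[row[key]].append(row)
        indexed += 1

    joined = []
    for row in probe_rows:
        # .get avoids creating empty entries for keys with no match
        for match in index.get(row[key], ()):
            joined.append({**match, **row})
    return joined, indexed


# Hypothetical toy tables: 3 rows versus 20 rows, joined on "id"
small = [{"id": i, "s": f"s{i}"} for i in range(3)]
large = [{"id": i % 5, "l": i} for i in range(20)]

rows_a, indexed_a = hash_join(small, large, "id")  # small table first
rows_b, indexed_b = hash_join(large, small, "id")  # large table first
```

Both calls return the same set of joined rows, but the first call hashes only the 3 rows of `small`, while the second hashes all 20 rows of `large`. In an implementation built along these lines, passing the small table first keeps the hash index small, which is one plausible reason for table order to affect timings.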

So my questions are:

  1. Would it be possible to run the benchmark against this PR and share the results?
  2. Would you consider changing this order?

Thank you!

jangorecki commented 3 years ago

Yes to both, but unfortunately it may take some time. I won't have a workstation for a couple of weeks, so I cannot refresh the dbb environment.

bkamins commented 3 years ago

Sure - this can wait. I am also curious what the results will be.

Additionally - are there instructions somewhere on how to reproduce the test datasets? When I simply tried to run https://github.com/h2oai/db-benchmark/blob/master/_data/join-datagen.R I got:

Error in sample.int(length(x), size, replace, prob) :
  invalid first argument
Calls: data.table -> sample_all -> sample -> sample -> sample.int
Execution halted

I am on data.table v1.13.0.

bkamins commented 3 years ago

I am closing this for now, as I managed to run the tests locally. I will try to make the code run fast on the original cases.