voltrondata-labs / arrowbench

R package for benchmarking

TPC-H benchmarks 11-22 #50

Closed: jonkeane closed this issue 2 years ago

jonkeane commented 3 years ago

TPC-H spec: the queries start on page 29. They are also available in a slightly more readable form at https://github.com/duckdb/duckdb/tree/master/extension/tpch/dbgen/queries

Our current implementations of queries 1-10
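For anyone picking up one of the remaining queries, here is a rough sketch of how Query 14 (the "promotion effect" query) might be translated from the spec's SQL into dplyr verbs. It assumes `lineitem` and `part` use the standard TPC-H column names. The function name `tpch_q14` is hypothetical, and this is not necessarily how arrowbench structures its query functions. It is written and checked against plain data frames; whether it runs unchanged on Arrow Tables or Datasets depends on the installed arrow version's dplyr bindings (joins and `startsWith()` in particular).

```r
library(dplyr)

# Hypothetical sketch of TPC-H Query 14 (promotion effect) in dplyr.
# `lineitem` and `part` are assumed to be data frames with the standard
# TPC-H column names; this is an illustration, not arrowbench's own code.
tpch_q14 <- function(lineitem, part) {
  lineitem %>%
    # One-month window: l_shipdate >= 1995-09-01 and < 1995-09-01 + 1 month
    filter(
      l_shipdate >= as.Date("1995-09-01"),
      l_shipdate < as.Date("1995-10-01")
    ) %>%
    # WHERE l_partkey = p_partkey
    inner_join(part, by = c("l_partkey" = "p_partkey")) %>%
    mutate(
      rev = l_extendedprice * (1 - l_discount),
      # SQL `p_type LIKE 'PROMO%'` expressed as a prefix check
      promo_rev = if_else(startsWith(p_type, "PROMO"), rev, 0)
    ) %>%
    # 100 * promotional revenue / total revenue
    summarise(promo_revenue = 100 * sum(promo_rev) / sum(rev)) %>%
    collect()
}
```

A tiny hand-made input makes the arithmetic easy to check: two in-window line items with revenue 100 each, one of them on a `PROMO` part, should give a `promo_revenue` of 50.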

paleolimbot commented 3 years ago

Commenting so it shows up on my GitHub issue list!

jonkeane commented 3 years ago

Here's an example command to run one of these (most of the parameters are documented in the README, though a few still need more documentation):

library(arrowbench)

withr::with_envvar(
  # ARROWBENCH_LOCAL_DIR is where to store the input data files (after generating
  # them) and the results, and any packages-to-be-benchmarked (see `lib_path` below)
  list(ARROWBENCH_LOCAL_DIR = "~/repos/ab_store/het"),
  {
    results <- run_benchmark(
      tpc_h,
      scale_factor = 1,
      query_id = 1,
      # cpu_count defaults to both 1 and max available, but feel free to set otherwise
      cpu_count = 8,
      format = c("native", "parquet"),
      # This is under-documented, but this string will actually install a whole
      # separate arrow package so it doesn't collide with your development
      # version. Alternatively, you can specify lib_path = "latest", which will
      # use whatever arrow version is installed in the .libPaths() your session
      # is pointing to.
      lib_path = "remote-apache/arrow@HEAD",
      engine = "arrow",
      n_iter = 1
    )

    all_results <- as.data.frame(results)
  }
)