Closed jonkeane closed 2 years ago
Commenting so it shows up on my GitHub issue list!
Here's an example command to run one of these (most of the parameters have documentation in the readme, though we have a few that need more):
library(arrowbench)

withr::with_envvar(
  # ARROWBENCH_LOCAL_DIR is where to store the input data files (after generating
  # them), the results, and any packages-to-be-benchmarked (see `lib_path` below)
  list(ARROWBENCH_LOCAL_DIR = "~/repos/ab_store/het"),
  {
    results <- run_benchmark(
      tpc_h,
      scale_factor = 1,
      query_id = 1,
      # cpu_count defaults to both 1 and max available, but feel free to set otherwise
      cpu_count = 8,
      format = c("native", "parquet"),
      # This is under-documented, but this string will actually install a whole
      # separate arrow package so it doesn't collide with your development
      # version. Alternatively you can specify lib_path = "latest", which will
      # use whatever arrow version is installed in the .libPaths() your session
      # is pointing to.
      lib_path = "remote-apache/arrow@HEAD",
      engine = "arrow",
      n_iter = 1
    )
    all_results <- as.data.frame(results)
  }
)
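In case the `withr::with_envvar()` wrapper is unfamiliar: it sets the environment variable only while the code block runs and restores the previous state afterwards, so the benchmark storage location does not leak into the rest of the session. Here is a minimal sketch of that scoping on its own, assuming only that the withr package is installed (no arrowbench calls are involved, and the variable names `before`, `inside` and `after` are just for illustration):

```r
library(withr)

# Remember the current state of the variable (NA if it is not set at all).
before <- Sys.getenv("ARROWBENCH_LOCAL_DIR", unset = NA)

# Inside the call, the variable holds the temporary value.
inside <- with_envvar(
  c(ARROWBENCH_LOCAL_DIR = "~/repos/ab_store/het"),
  Sys.getenv("ARROWBENCH_LOCAL_DIR")
)

# Once the call returns, the variable is back to whatever it was before.
after <- Sys.getenv("ARROWBENCH_LOCAL_DIR", unset = NA)

print(inside)
print(identical(before, after))
```

Note that `Sys.getenv()` returns the path as the literal string, so any `~` expansion is left to whatever code consumes the variable.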
TPC-H spec: the queries start on page 29. They are also included in a slightly more readable form at https://github.com/duckdb/duckdb/tree/master/extension/tpch/dbgen/queries
Our current implementations of 1-10