alistaire47 opened 2 years ago
Failing test is unrelated; something about duckdb + tpch:
```
══ Failed tests ════════════════════════════════════════════════════════════════
── Error ('test-custom-duckdb.R:26'): custom DuckDB can be installed to and used from a custom lib ──
Error in `ensure_custom_duckdb(tf, install = FALSE)`: An unexpected error occured whilst querying TPC-H enabled duckdb
Caused by error:
! error in callr subprocess
Caused by error:
! rapi_prepare: Failed to prepare query select scale_factor, query_nr from tpch_answers() LIMIT 1;
Error: Error: Catalog Error: Function with name tpch_answers is not on the catalog, but it exists in the tpch extension. To Install and Load the extension, run: INSTALL tpch; LOAD tpch;
Backtrace:
     ▆
  1. ├─testthat::expect_error(...) at test-custom-duckdb.R:26:2
  2. │ └─testthat:::expect_condition_matching(...)
  3. │   └─testthat:::quasi_capture(...)
  4. │     ├─testthat (local) .capture(...)
  5. │     │ └─base::withCallingHandlers(...)
  6. │     └─rlang::eval_bare(quo_get_expr(.quo), quo_get_env(.quo))
  7. └─arrowbench:::ensure_custom_duckdb(tf, install = FALSE)
  8.   └─base::tryCatch(...)
  9.     └─base (local) tryCatchList(expr, classes, parentenv, handlers)
 10.       └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
 11.         └─value[[3L]](cond)
 12.           └─rlang::abort(...)
```
Probably this will go away with the switch to datalogistik?
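For reference, the catalog error means `tpch_answers()` only exists after the `tpch` extension is loaded in the subprocess. DuckDB's error message states the fix itself; a sketch of the SQL (assuming the environment can install the extension):

```sql
-- per the hint in the error message
INSTALL tpch;
LOAD tpch;
-- the query the test harness runs
select scale_factor, query_nr from tpch_answers() LIMIT 1;
```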
Mocks up an iterative benchadapt adapter using `GeneratorAdapter` from https://github.com/conbench/conbench/pull/406 to call arrowbench benchmarks one at a time. Changes:

1. `BenchmarkResult`, to allow it to output JSON with fields arranged appropriately for `benchadapt.BenchmarkResult` (and therefore the conbench API). A little fast-and-dirty; there's work to be done to make sure we're populating everything we want to, since all of this was getting populated by voltrondata_labs/benchmarks and the conbench runner before.
2. `read_file` and `write_file`, to match the defaults in labs/benchmarks. `names(known_sources)` was working fine except for `tpch`, for which the benchmarks don't have additional code to iterate over tables.
3. A CLI in `inst/arrowbench` that can:
    a. produce a JSON list of dicts of args suitable for running a case via `run_one()`, e.g. `{"bm": "read_file", "source": "fanniemae_2016Q4", "format": "parquet", "compression": "uncompressed", "output": "arrow_table", "cpu_count": 1}`
    b. run a case with `run_one()` when passed a dict of args like the list command produces, and cat the cleaned JSON results
4. A `GeneratorAdapter`. It is initialized with a path to the CLI created in 3., and when run, hits the CLI to list benchmark cases, then iterates through them (subset to the first 10 for demo purposes), runs each, parses the result JSON and inserts it into a `BenchmarkResult` object, and yields that. When called, the adapter will post each result before moving on to the next iteration.

This needs careful polish before replacing labs/benchmarks for R benchmark running, to ensure that our metadata is consistent and won't break histories, but it does show how R benchmarks could work with adapters.
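The list-then-run loop in 4. can be sketched in plain Python. This is a hedged, self-contained sketch, not the PR's actual code: the `list`/`run` subcommands and the `limit` parameter are illustrative assumptions about the CLI's interface, and a real adapter would subclass benchadapt's `GeneratorAdapter` and wrap each parsed dict in a `BenchmarkResult` before posting.

```python
import json
import subprocess
from typing import Iterator, List, Optional


def iter_benchmark_results(cli: List[str], limit: Optional[int] = None) -> Iterator[dict]:
    """List benchmark cases via the CLI, run each one, and yield parsed result dicts.

    Hypothetical CLI contract (an assumption, not arrowbench's real interface):
      `<cli> list`         -> prints a JSON array of case dicts
      `<cli> run '<json>'` -> prints one cleaned JSON result for that case
    """
    listing = subprocess.run([*cli, "list"], check=True, capture_output=True, text=True)
    cases = json.loads(listing.stdout)
    for case in cases[:limit]:  # e.g. limit=10 to subset for demo purposes
        run = subprocess.run(
            [*cli, "run", json.dumps(case)], check=True, capture_output=True, text=True
        )
        # A real adapter would wrap this dict in a BenchmarkResult here
        yield json.loads(run.stdout)
```

Because this is a generator, an adapter built on it posts each yielded result before pulling the next one, so a long benchmark run streams results as they finish instead of batching them at the end.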