voltrondata-labs / arrowbench

R package for benchmarking

Add a top-level runner for sets of benchmarks #122

Closed alistaire47 closed 1 year ago

alistaire47 commented 1 year ago

This PR adds a top-level runner for managing and running sets of benchmarks at once, an important step in moving towards arrow-benchmarks-ci calling arrowbench directly instead of through voltrondata-labs/benchmarks.

The key addition of this PR is an S3-classed dataframe called BenchmarkDataFrame that contains at least name, benchmark, and parameters columns: benchmark is a list column of Benchmark objects, and parameters is a list column of parameter dataframes (or NULL until parameters are specified).

After they're run, a results column will be appended, which is a list column where each element is an instance of BenchmarkResults containing an instance of BenchmarkResult for each case (row) in the parameter dataframe.

Because it's just a dataframe (and a tibble, mostly for nice list-column printing), users can filter it how they like, modify the parameters column however they see fit before running, and expand and parse results at their leisure afterwards; everything stays properly associated because related elements share a row.

Thus far (as of writing; this may change by the time this gets merged), I've resisted the urge to add a bunch of helper functions to, say, unnest parameters and results in parallel or some other version of get_params_summary() or whatever. I'm open to thoughts on what's useful here; there's a lot that could be added, but quite a lot is already possible in a line or two of code.
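
For instance, a rough sketch of the kind of one- or two-liners I mean (this assumes the usual dplyr/tidyr verbs pass through a BenchmarkDataFrame like a plain tibble, which may not preserve the subclass):

library(arrowbench)
library(dplyr)
library(tidyr)

# keep only a couple of benchmarks, then run them
bm_df_res <- get_package_benchmarks() %>%
    filter(name %in% c("file-read", "file-write")) %>%
    run()

# expand each per-case parameter dataframe into one row per case; the
# results column can be unnested the same way whenever that's useful
params_long <- bm_df_res %>%
    select(name, parameters) %>%
    unnest(parameters)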

This PR also adds an exported vector named URSA_I9_9960X_R_BENCHMARK_NAMES, which corresponds exactly to this list in arrow-benchmarks-ci of R benchmarks run regularly. (Several machines also run tpch alone.) I'm not sure this is the best way to manage these, but I'm very open to suggestions; I know we do want to manage this list from arrowbench, but we haven't thus far said exactly how. We could stick it in a list as a config or make a little function that will return a set of benchmark names for a machine name or something? There are options.
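
As a sketch of the config-function option (the function and machine names here are hypothetical, not part of this PR):

# hypothetical helper, not in this PR: map a machine name to the set of
# benchmark names scheduled for it
benchmark_names_for_machine <- function(machine) {
    switch(
        machine,
        "ursa-i9-9960x" = URSA_I9_9960X_R_BENCHMARK_NAMES,
        "some-tpch-only-machine" = "tpch",
        stop("no benchmark list defined for machine: ", machine)
    )
}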

This PR does not yet add code for posting results to Conbench; that will be in the next PR. You can see where it will fit, though; we could post results as they're completed with run() or a finished set of results from a BenchmarkDataFrame.

A bit of what this looks like:

> get_package_benchmarks()
# <BenchmarkDataFrame>
# A tibble: 14 × 3
   name                         benchmark    parameters
 * <chr>                        <named list> <list>    
 1 dataset_taxi_2013            <Benchmrk>   <NULL>    
 2 row_group_size               <Benchmrk>   <NULL>    
 3 read_csv                     <Benchmrk>   <NULL>    
 4 write_csv                    <Benchmrk>   <NULL>    
 5 read_json                    <Benchmrk>   <NULL>    
 6 remote_dataset               <Benchmrk>   <NULL>    
 7 file-write                   <Benchmrk>   <NULL>    
 8 dataframe-to-table           <Benchmrk>   <NULL>    
 9 table_to_df                  <Benchmrk>   <NULL>    
10 array_to_vector              <Benchmrk>   <NULL>    
11 partitioned-dataset-filter   <Benchmrk>   <NULL>    
12 file-read                    <Benchmrk>   <NULL>    
13 tpch                         <Benchmrk>   <NULL>    
14 array_altrep_materialization <Benchmrk>   <NULL>  

> get_package_benchmarks() %>% filter(name %in% URSA_I9_9960X_R_BENCHMARK_NAMES)
# <BenchmarkDataFrame>
# A tibble: 5 × 3
  name                       benchmark    parameters
  <chr>                      <named list> <list>    
1 file-write                 <Benchmrk>   <NULL>    
2 dataframe-to-table         <Benchmrk>   <NULL>    
3 partitioned-dataset-filter <Benchmrk>   <NULL>    
4 file-read                  <Benchmrk>   <NULL>    
5 tpch                       <Benchmrk>   <NULL>  

> bm_list <- list(arrowbench:::placebo, arrowbench:::placebo)
> param_list <- list(
+     default_params(
+         arrowbench:::placebo,
+         error = list(NULL, "rlang::abort", "base::stop"),
+         cpu_count = arrow::cpu_count()
+     ),
+     NULL
+ )
> bm_df <- BenchmarkDataFrame(benchmarks = bm_list, parameters = param_list)
> bm_df
# <BenchmarkDataFrame>
# A tibble: 2 × 3
  name    benchmark  parameters  
* <chr>   <list>     <list>      
1 placebo <Benchmrk> <df [3 × 4]>
2 placebo <Benchmrk> <NULL>      

> bm_df_res <- run(bm_df)

> bm_df_res
# <BenchmarkDataFrame>
# A tibble: 2 × 4
  name    benchmark  parameters   results   
  <chr>   <list>     <list>       <list>    
1 placebo <Benchmrk> <df [3 × 4]> <BnchmrkR>
2 placebo <Benchmrk> <df [2 × 3]> <BnchmrkR>
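
Pulling an individual set of results back out is just list-column access on the tibble (what's inside each BenchmarkResults element is as described above; anything beyond that here is an assumption):

# each element of the results column is a BenchmarkResults instance,
# one per row of the BenchmarkDataFrame
res1 <- bm_df_res$results[[1]]
class(res1)  # expected to include "BenchmarkResults"
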
alistaire47 commented 1 year ago

Am I interpreting correctly that effectively what we would need to add to arrow-benchmarks-ci (after we add the publishing, of course) is something like:

# prior to executing this R script, ensure that arrowbench and the arrow R package are both installed and up to date
library(arrowbench)

get_package_benchmarks() %>% filter(name %in% URSA_I9_9960X_R_BENCHMARK_NAMES) %>% run()

Is that right? If so, I wonder if it might make sense to have the list URSA_I9_9960X_R_BENCHMARK_NAMES be explicit and in the arrow-benchmarks-ci repo itself (e.g. get_package_benchmarks() %>% filter(name %in% c("dataframe-to-table", "df_to_table", "file-read", "file-write", "partitioned-dataset-filter", "tpch")) %>% run())? Is there any other reason we might need the list of benchmarks scheduled on the ursa i9 to live inside of arrowbench?

Yep, that's roughly the intended pattern. I thought about adding a CLI or some other sort of simplified entrypoint, but this seems simple enough that it's not really worth the effort.

As to where that vector lives, yes, that's the part of this PR I'm least certain of. I thought we wanted to move that out of arrow-benchmarks-ci into arrowbench, but the more I look at it, the more I think it should stay where it is, and we just cut out labs/benchmarks in the middle. It may be useful to keep this around for testing, but I may at least un-export it before merging if I don't delete it entirely.

This PR does not yet add code for posting results to Conbench; that will be in the next PR. You can see where it will fit, though; we could post results as they're completed with run() or a finished set of results from a BenchmarkDataFrame.

I assume you'll also add the cache dropping in the follow-on PR so that we have similar results across the two runners? For now I would "just" copy that code over and implement it in arrowbench. Ideally it would be part of some dependency-lite utility set we could depend on, but I don't think we've yet found a truly easy way to do that, and since it's 1(ish) bash command, copying it around is the least-painful way to get where we need to go.

Ah right. I'm of multiple minds here; we could insert it into benchconnect, but (a) it doesn't quite fit with the rest of what's there, and (b) I think we want to call it as close to running as possible, and wrapping it in a CLI pushes it further away. And wrapping it in a CLI still wouldn't standardize whether it's called per iteration, before a separate warmup iteration, or once per run anyway; that would still depend on the runner. So yeah, I guess the least-bad option is to copy this all over. Created #126 to do it; it should probably be its own little tiny PR, not part of the benchconnect mess, as it's unrelated.
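
For reference, a rough sketch of what the copied-over cache dropping might look like wrapped in R (assuming a Linux host with passwordless sudo; the helper name is hypothetical, and the real command to copy is the one in labs/benchmarks / #126):

# hypothetical helper to be added in #126: flush the OS page cache so
# file-reading benchmarks start cold (Linux-only, requires sudo)
drop_os_caches <- function() {
    system("sync; echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null")
}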

I do wonder if it might be worthwhile to have an example (beyond what is in the tests) in the README or something.

Yeah, if this is going to be the way to run a bunch of benchmarks, it should be in the README, if nothing else so it's clear how to run benchmarks in CI without looking at arrow-benchmarks-ci. I'll add a bit.

jonkeane commented 1 year ago

All of that sounds good, and thanks for the follow-on issue for cache dropping.

but the more I look at it, the more I think it should stay where it is [in arrow-benchmarks-ci], and we just cut out labs/benchmarks in the middle

I very much agree with this! If things like this are better off in arrow-benchmarks-ci there's nothing wrong with putting them there. But getting rid of the awkward dependency on labs/benchmarks in the middle is 💯