The previous code was generating passable results, but there were several shortcomings that led to frustration, especially considering our test runs can take a significant amount of time and resources to start and finish.
This is the start of major work to address these shortcomings so that the majority of work on this project going forward can focus on collecting data.
Added a centralized run script, `run_benchmarks.py`. This script is responsible for kicking off the runs, then scraping the results from the resultant ASV JSON files into CSV format. There was no straightforward way to make ASV display results in the way we wanted, so this seems like the best option.
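For context, a minimal sketch of what that scraping step could look like, assuming the ASV result JSON keeps per-benchmark entries under a `results` key (the directory layout, key names, and the `scrape_asv_results` helper are illustrative assumptions, not the actual implementation):

```python
import csv
import glob
import json


def scrape_asv_results(results_dir=".asv/results", out_csv="results.csv"):
    """Collect ASV result JSON files and flatten them into a single CSV.

    NOTE: the 'results' key layout assumed here is an approximation of the
    ASV result schema and may need adjusting for the ASV version in use.
    """
    rows = []
    for path in glob.glob(f"{results_dir}/**/*.json", recursive=True):
        with open(path) as f:
            data = json.load(f)
        for bench_name, result in data.get("results", {}).items():
            rows.append({"file": path, "benchmark": bench_name, "result": result})

    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["file", "benchmark", "result"])
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    scrape_asv_results()
```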
Benchmarks have been split into separate files with standardized names. Some of the benchmarks can take many minutes to hours to run, and there is little advantage to stringing several of these long runs together; doing so had significant drawbacks, including the fact that a failure at any point could nuke the entire run. Benchmarks are now separated into files based on platform, storage backend, and operation. For example, `gcp_kubernetes_zarr_read` contains just the read tests for Zarr datasets on GCP using a Dask KubeCluster, whereas `workstation_dask_read_zarr` runs on a local workstation with Dask and Zarr. This naming also allows benchmarks to be selected by regex, which the central run script `run_benchmarks.py` adopts. Ergo, `python run_benchmarks.py -b gcp_` will run all GCP-related tests, and `python run_benchmarks.py -b read` will run all read tests across all platforms.
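As a rough sketch of how the `-b` pattern can be forwarded to ASV, the run script can hand it to `asv run`'s `--bench` regex filter (the argument parsing below is an illustrative assumption about how `run_benchmarks.py` is wired up, not its actual contents):

```python
import argparse
import subprocess


def main():
    parser = argparse.ArgumentParser(description="Kick off ASV benchmark runs.")
    parser.add_argument(
        "-b", "--bench", default="",
        help="Regex selecting benchmark files/names, e.g. 'gcp_' or 'read'.",
    )
    args = parser.parse_args()

    # asv's --bench option filters benchmarks by regex, so the naming
    # convention (platform_backend_format_operation) doubles as a selector.
    cmd = ["asv", "run"]
    if args.bench:
        cmd += ["--bench", args.bench]
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main()
```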
Natively, ASV does not provide individual run results; instead, it averages a configurable number of runs and reports a single value. An extra parameter (`run_num`) has been added that allows us to record individual runs without averaging. This is a bit hackish, but it seems to work relatively well. `run_num` is configurable via `test.conf.yaml` or through the centralized run script added in this PR. Hence `python run_benchmarks.py -b gcp_ -n 10` runs all GCP-related benchmarks a total of 10 times.
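A hedged sketch of how the `run_num` trick could look as an ASV parameterized benchmark: since ASV records one result per parameter combination, exposing the run index as a parameter yields one recorded timing per run rather than an average. The class name, timed body, and hard-coded run count below are illustrative; only `params`, `param_names`, `number`, `repeat`, and `warmup_time` are standard ASV attributes.

```python
class ReadZarrBenchmark:
    """Example benchmark parameterized by run index so ASV records each run."""

    # One entry per desired run; in practice this list would be generated from
    # the run_num value in test.conf.yaml or the -n flag of run_benchmarks.py.
    params = [list(range(10))]
    param_names = ["run_num"]

    # Keep ASV's own repetition machinery out of the way so each parameter
    # value corresponds to exactly one measured execution.
    number = 1
    repeat = 1
    warmup_time = 0.0

    def setup(self, run_num):
        # Open the dataset, spin up the cluster, etc. (placeholder).
        pass

    def time_read(self, run_num):
        # The actual read being timed; run_num only distinguishes runs.
        pass
```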
Start of various other simplifications. Removal of spaghetti code and other atrocities.