benchmark: Benchmark speed of initial display and interaction
Key Benchmarking Metrics:
Latency to initial display (of anything useful)
Latency for interaction updates (pan, zoom, or scroll/scrub). Which interaction to prioritize depends on the workflow: for a stacked timeseries viewer, a zoom out or pan is ideal; for an imagestack viewer, scrolling/scrubbing through the frames is ideal.
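As a rough illustration of how such latency samples could be collected, here is a minimal timing sketch using `time.perf_counter`. The `record_latency` helper and the `samples` store are hypothetical names, not part of any existing tooling; the timed block is a stand-in for real display/interaction work.

```python
import time
from contextlib import contextmanager

# Collected latency samples: {metric_name: [seconds, ...]} (hypothetical store).
samples: dict[str, list[float]] = {}

@contextmanager
def record_latency(name: str):
    """Time a block and append the elapsed seconds under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        samples.setdefault(name, []).append(time.perf_counter() - start)

# Example: time a placeholder standing in for "latency to initial display".
with record_latency("initial_display"):
    time.sleep(0.01)  # placeholder for rendering work
```

The same context manager could wrap an interaction update (e.g. a pan or scrub step) to collect the second metric under a different name.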
Benchmarking Dimensions/Parameters:
Dataset size (e.g. for stacked timeseries, number of channels or samples; for an imagestack, number of frames or frame size)
Underlying software changes, including a specific commit/version of any relevant package (e.g. Param 1.X vs 2.X, or before and after a bug fix to HoloViews). This is closest to what ASV typically tests over time, but here for a whole set of relevant packages and for specific cherry-picked comparisons. It requires logging the environment (commit/version of each relevant package) per benchmarking run.
Workflow approach employed (e.g. stacked timeseries using HoloViews downsample options: LTTB vs MinMaxLTTB vs viewport; a NumPy array vs pandas DataFrame vs xarray DataArray as the data source for a HoloViews element; hvPlot vs HoloViews to produce a similar output; or Bokeh Canvas vs WebGL rendering). This requires a manual description of each benchmarking run.
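The per-run environment log mentioned above could be captured with the standard library alone. A sketch, where `capture_environment` is a hypothetical helper name; the version lookups use `importlib.metadata`, which is part of the standard library (Python 3.8+):

```python
import json
import platform
import sys
from importlib import metadata

def capture_environment(packages):
    """Record interpreter, OS, and per-package versions for a benchmark run."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            versions[pkg] = None  # package not installed in this environment
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }

env = capture_environment(["holoviews", "bokeh", "param"])
print(json.dumps(env, indent=2))
```

For packages installed from a git checkout rather than a release, the recorded version string would need to be supplemented with the commit hash (e.g. from `git rev-parse HEAD` in the checkout).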
Results handling
The benchmarking results need to be reproducible: store a copy of the benchmarks, info about the environment used, manual notes about the approach, and info about the machine the run was performed on.
The timing results need to be in a format amenable to comparison (e.g. show the latency to display as a function of the number of samples for the stacked timeseries workflow when employing no downsampling vs MinMaxLTTB downsampling).
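One way to satisfy both points above is to store each run as a self-describing JSON record (timings plus machine info and notes) in a long-form layout that pivots easily for comparison. The `save_run` and `pivot` helpers below are hypothetical names, and the latency numbers are illustrative placeholders, not real measurements:

```python
import json
import os
import platform
import tempfile
from datetime import datetime, timezone

# Long-form results: one row per (approach, n_samples) measurement.
# Latency values here are illustrative placeholders, not real timings.
results = [
    {"approach": "no_downsample", "n_samples": 100_000, "latency_s": 6.1},
    {"approach": "minmax_lttb", "n_samples": 100_000, "latency_s": 0.5},
]

def save_run(results, notes, path):
    """Bundle timing results with machine info and notes into one JSON record."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "machine": {
            "hostname": platform.node(),
            "cpu_count": os.cpu_count(),
            "platform": platform.platform(),
        },
        "notes": notes,
        "results": results,
    }
    with open(path, "w") as f:
        json.dump(record, f, indent=2)
    return record

def pivot(results):
    """Return {approach: {n_samples: latency_s}} for side-by-side comparison."""
    table = {}
    for r in results:
        table.setdefault(r["approach"], {})[r["n_samples"]] = r["latency_s"]
    return table

record = save_run(results, "stacked timeseries, latency to display",
                  os.path.join(tempfile.gettempdir(), "bench_run.json"))
table = pivot(results)
```

The pivoted `table` maps directly onto the comparison described above: one curve of latency vs sample count per workflow approach.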
Out of scope, future stuff:
Incorporate into the CI
Put benchmarking stuff in a separate repo (it's totally fine to do that now if you want, but not expected)
Tools for Benchmarking:
Bokeh 'FigureView actual paint' messages that capture figure_id, state (start, end), and render count
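Assuming those paint messages can be captured from the browser console (e.g. via a browser automation tool), pairing each figure's start/end messages gives a per-figure render duration. The exact message format below is an assumption for illustration, as are the sample log lines:

```python
import re

# Hypothetical captured console lines; the exact format is an assumption
# based on the "FigureView actual paint" messages described above.
log_lines = [
    "[bokeh] FigureView actual paint figure_id=p1001 state=start t=1000.0",
    "[bokeh] FigureView actual paint figure_id=p1001 state=end t=1250.5",
]

PAINT_RE = re.compile(
    r"figure_id=(?P<fid>\S+) state=(?P<state>start|end) t=(?P<t>[\d.]+)"
)

def paint_durations(lines):
    """Pair start/end paint messages per figure and return durations."""
    starts, durations = {}, {}
    for line in lines:
        m = PAINT_RE.search(line)
        if not m:
            continue
        fid, state, t = m["fid"], m["state"], float(m["t"])
        if state == "start":
            starts[fid] = t
        elif fid in starts:
            durations[fid] = t - starts.pop(fid)
    return durations

durations = paint_durations(log_lines)
```

The render count in the real messages could additionally distinguish the initial paint from subsequent interaction-driven repaints.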