RFC: Make benchmarks run for longer

As discussed, I've made the benchmarks run for longer: each test now gets a 5 second time budget (--benchmark-max-time=5.0) instead of the default of 1 second.
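pytest-benchmark treats --benchmark-max-time as a per-test budget: the test is repeated until that time is used up, and --benchmark-min-rounds (default 5) takes precedence for slow tests. The sketch below is a simplified model, assuming rounds are simply repeated until the budget is filled. pytest-benchmark's real calibration and overheads differ, so it will not reproduce the exact round counts in the tables below. `rounds_for_budget` is a hypothetical helper.

```python
import math


def rounds_for_budget(max_time: float, round_duration: float, min_rounds: int = 5) -> int:
    """Simplified model of a per-test time budget (hypothetical helper).

    Repeat rounds until max_time seconds would be used up, but never run
    fewer than min_rounds (mirroring --benchmark-min-rounds taking precedence).
    """
    if round_duration <= 0:
        raise ValueError("round_duration must be positive")
    rounds = math.ceil(max_time / round_duration)
    return max(rounds, min_rounds)


# A fast test: each round takes 1/1024 s (hypothetical figure).
fast = 1 / 1024
print(rounds_for_budget(1.0, fast))  # 1024 rounds with the default 1 s budget
print(rounds_for_budget(5.0, fast))  # 5120 rounds with a 5 s budget

# A slow test: each round takes 2 s, so min_rounds wins either way.
print(rounds_for_budget(5.0, 2.0))  # 5
```

In this model a fast test gets about 5x the rounds from a 5x budget. The tables below show a similar pattern: round counts grow by roughly 5.4x to 5.7x (e.g. 2288 to 12857 for PyTorch CUDA).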

Before the change (with the default of 1 second):

Measure-Command { pytest src/bench/ --benchmark-name=short --benchmark-sort=name --benchmark-group-by=group,func --benchmark-autosave --benchmark-columns=median,iqr,outliers,mean,stddev,min,max,iterations,rounds --modulepath=examples/dl-activations/gelu --benchmarkname=vgelu | Out-Default }

TotalSeconds      : 37.430946

------------------------------------------------------------------------- benchmark 'torch.Size([4]) test_inference': 3 tests --------------------------------------------------------------------------
Name (time in us)                                          Median               IQR            Outliers     Mean             StdDev                Min                 Max            Iterations  Rounds
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
inference[vgelu_pytorch-Knossos-torch.Size([4])]          16.2000 (1.0)      5.9000 (3.93)      542;644  19.5269 (1.0)      10.3934 (1.36)     13.1000 (1.0)      176.2000 (1.0)               1    4667
inference[vgelu_pytorch-PyTorch CUDA-torch.Size([4])]     92.1000 (5.69)     2.2000 (1.47)      208;333  97.2411 (4.98)     16.1253 (2.11)     90.1000 (6.88)     250.4000 (1.42)              1    2288
inference[vgelu_pytorch-PyTorch-torch.Size([4])]          19.2000 (1.19)     1.5000 (1.0)      876;1345  20.9317 (1.07)      7.6320 (1.0)      17.3000 (1.32)     184.1000 (1.04)              1   14685
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

With the change (5 seconds):

Measure-Command { pytest src/bench/ --benchmark-name=short --benchmark-max-time=5.0 --benchmark-sort=name --benchmark-group-by=group,func --benchmark-autosave --benchmark-columns=median,iqr,outliers,mean,stddev,min,max,iterations,rounds --modulepath=examples/dl-activations/gelu --benchmarkname=vgelu | Out-Default }

TotalSeconds      : 83.5360679

---------------------------------------------------------------------------- benchmark 'torch.Size([4]) test_inference': 3 tests -----------------------------------------------------------------------------
Name (time in us)                                          Median                IQR              Outliers     Mean             StdDev                Min                    Max            Iterations  Rounds
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
inference[vgelu_pytorch-Knossos-torch.Size([4])]          14.0000 (1.0)       1.1000 (1.0)       1285;2608  15.3760 (1.0)       6.0123 (1.0)      12.7000 (1.0)         182.4000 (1.0)               1   25446
inference[vgelu_pytorch-PyTorch CUDA-torch.Size([4])]     90.6000 (6.47)     12.7000 (11.55)      228;1319  99.8368 (6.49)     93.2841 (15.52)    79.2000 (6.24)     10,221.7000 (56.04)             1   12857
inference[vgelu_pytorch-PyTorch-torch.Size([4])]          18.5000 (1.32)      2.0000 (1.82)     5692;13969  20.5039 (1.33)      7.0913 (1.18)     16.6000 (1.31)        201.6000 (1.11)              1   83057
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

N.B. Wrapping the run in Measure-Command breaks the colours, so the output above is monochrome.
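If the colours matter, one option is to time the run from Python instead of Measure-Command. This is a sketch, assuming Python 3.9+ and that pytest can be launched as `python -m pytest`. The child process writes straight to the console rather than through a pipe, and pytest's `--color=yes` option should force coloured output in case a terminal is not detected. `BENCH_CMD` and `timed_run` are hypothetical names; `--modulepath` and `--benchmarkname` are copied from the commands above.

```python
import subprocess
import sys
import time

# Same arguments as the "with change" command above, plus --color=yes.
BENCH_CMD = [
    sys.executable, "-m", "pytest", "src/bench/",
    "--color=yes",  # force ANSI colours even when stdout is not a terminal
    "--benchmark-name=short",
    "--benchmark-max-time=5.0",
    "--benchmark-sort=name",
    "--benchmark-group-by=group,func",
    "--benchmark-autosave",
    "--benchmark-columns=median,iqr,outliers,mean,stddev,min,max,iterations,rounds",
    "--modulepath=examples/dl-activations/gelu",
    "--benchmarkname=vgelu",
]


def timed_run(cmd: list[str]) -> tuple[int, float]:
    """Run cmd with output going straight to this console (nothing is captured)
    and return (exit code, wall-clock seconds), like Measure-Command's TotalSeconds."""
    start = time.perf_counter()
    completed = subprocess.run(cmd)  # inherits stdout/stderr, so there is no pipe in between
    elapsed = time.perf_counter() - start
    return completed.returncode, elapsed
```

From the repository root, `code, seconds = timed_run(BENCH_CMD)` would run the benchmarks and report the elapsed time.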

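The longer run time seems more likely to hit the really long outliers. See the PyTorch CUDA Max, which rose from 250.4 us to 10,221.7 us and pulled its StdDev up from 16.1 us to 93.3 us, while its median barely moved (92.1 us to 90.6 us).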
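One way to see why more rounds expose rarer stalls: if each round independently stalls with some small probability p, the chance of seeing at least one stall in n rounds is 1 - (1 - p)^n, which grows with n. The sketch below assumes independent rounds and uses a purely hypothetical p = 1e-4. The round counts are the PyTorch CUDA figures from the tables above. It also shows, with made-up timings, why a single stall moves Max and the mean but not the median.

```python
import statistics


def p_at_least_one_stall(p_stall: float, rounds: int) -> float:
    """Chance that at least one of `rounds` independent rounds hits a stall,
    assuming each round stalls independently with probability p_stall."""
    if not 0.0 <= p_stall <= 1.0:
        raise ValueError("p_stall must be a probability")
    return 1.0 - (1.0 - p_stall) ** rounds


# Round counts for PyTorch CUDA, taken from the two tables above.
ROUNDS_1S = 2288
ROUNDS_5S = 12857

# p_stall = 1e-4 is hypothetical, chosen only for illustration.
print(p_at_least_one_stall(1e-4, ROUNDS_1S))  # ≈ 0.20
print(p_at_least_one_stall(1e-4, ROUNDS_5S))  # ≈ 0.72

# Hypothetical timings in us: 999 steady rounds plus one 10,221.7 us stall.
timings = [90.0] * 999 + [10_221.7]
print(statistics.median(timings))  # the median stays at the steady value
print(max(timings))                # Max is the stall itself
print(statistics.mean(timings))    # the mean is pulled up by the single stall
```

This is why the median and IQR columns are the more stable ones to compare across runs of different lengths.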

Happy to add this; it will matter more for the longer runs others are doing.

microsoft / knossos-ksc #894