Note that this does not denoise the individual runtimes. The point of the test is to measure a distribution of runtimes for each of a variety of scenarios. This PR takes more samples per distribution and interleaves the test scenarios, so each scenario's samples are spread out over time and transient OS noise is less likely to land on a single scenario.
I also tweaked some of the scenarios to be more useful, and the test now reports actual deciles, so the median is included in the output.
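
For readers who want the idea in concrete form, here is a minimal Python sketch of interleaved sampling plus decile reporting. It is an illustration only, not the code in this PR: `sample_interleaved`, `deciles`, and the shape of the `scenarios` mapping are hypothetical names chosen for the example.

```python
import statistics
import time


def sample_interleaved(scenarios, rounds):
    """Time every scenario once per round, cycling through all of them.

    `scenarios` is a hypothetical mapping of name -> zero-argument callable.
    Cycling spreads each scenario's samples across the whole run, so a
    transient slowdown affects a few samples of every scenario instead of
    most samples of one.
    """
    samples = {name: [] for name in scenarios}
    for _ in range(rounds):
        for name, fn in scenarios.items():
            start = time.perf_counter()
            fn()
            samples[name].append(time.perf_counter() - start)
    return samples


def deciles(values):
    """Return the nine cut points that split `values` into ten groups.

    The element at index 4 is the 5th decile, which is the median.
    """
    return statistics.quantiles(values, n=10, method="inclusive")
```

Calling `deciles(samples[name])` for each scenario then gives nine numbers per distribution, with the median in the middle position.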