The following factors are considered while running the benchmark in an Autotune experiment:
- Repeatability
- Convergence
- Reproducibility
These factors are measured in an experiment using the following process:
1. Each Autotune experiment typically consists of 100 trials.
2. Each trial tests a specific configuration generated by HPO (hyperparameter optimization).
3. Each trial runs the benchmark for multiple iterations, and the benchmark container is re-deployed at the start of each iteration.
4. Each iteration in a trial includes warmup and measurement cycles. The duration of the warmup cycles is based on pre-run data from the benchmark.
5. For each trial, convergence of the benchmark data is measured by calculating a confidence interval for each metric using the t-distribution (see the sketch after this list).
6. The min, max, mean, and percentile values are calculated for each metric.
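To make steps 5 and 6 concrete, here is a minimal sketch (not the actual Autotune code) of computing a t-distribution confidence interval plus the summary statistics for one metric's per-iteration samples. The function name, the 95% confidence level, and the sample values are assumptions for illustration only.

```python
# Sketch: per-trial convergence check for one metric (illustrative, not Autotune's code).
import numpy as np
from scipy import stats

def summarize_metric(samples, confidence=0.95):
    """Return a t-distribution CI and summary stats for one metric's iteration samples."""
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    mean = samples.mean()
    # Standard error of the mean; ddof=1 gives the sample standard deviation.
    sem = samples.std(ddof=1) / np.sqrt(n)
    # Two-sided critical value from the t-distribution with n-1 degrees of freedom.
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)
    half_width = t_crit * sem
    return {
        "mean": mean,
        "ci_lower": mean - half_width,
        "ci_upper": mean + half_width,
        "min": samples.min(),
        "max": samples.max(),
        "p50": np.percentile(samples, 50),
        "p99": np.percentile(samples, 99),
    }

# Example: throughput samples (requests/sec) from five iterations of one trial.
print(summarize_metric([10520.0, 10655.5, 10430.2, 10598.7, 10505.1]))
```

A trial could then be flagged as converged when the CI half-width is small relative to the mean; the exact criterion used is described in the measurement process PR linked below.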
Measurement process: https://github.com/kruize/autotune-results/pull/31
Initial discussion to generate the config: https://github.com/kruize/autotune-results/issues/28
Tunables of the experiment: https://github.com/kusumachalasani/autotune-results-1/blob/TFB_R21/techempower/experiment-16/benchmark.yaml
Initial results for TFB Round 21 throughput improvements: https://github.com/kruize/autotune-results/pull/30/
More experiments with Quarkus v2.9.1.F: in progress
Configurations of baseline and Autotune:
Baseline1 Configuration:
Baseline2 Configuration:
Autotune Configuration:
Throughput chart comparing baseline and Autotune config with TFB Quarkus v2.9.1.F:
@Sanne @franz1981 @johnaohara @stalep @ddoliver @dinogun