First pass at adding benchmarks in CI.
Covers matrix of key archs and SIMD extensions.
Outputs delta against main with critcmp (with 5% noise threshold).
Future improvements
Run a classifier on the results to flag each as regression/improvement/neutral (in aggregate or for a specific arch)
Post a summary (or details, if there is a regression) as a PR comment (didn't add this yet because, without a summary, it would be noisy)
Could save main baselines as assets (and just reference/download them) instead of recomputing them for each PR
(this comes at the cost of statefulness and some ad hoc complexity, but would reduce CI time)
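A rough sketch of what the classifier idea above could look like, assuming per-benchmark timings (in nanoseconds, lower is better) have already been extracted for main and for the PR. The names `classify`, `aggregate`, and `Verdict` are hypothetical and not part of critcmp or this PR; the 5% default simply mirrors the noise threshold mentioned above.

```python
from enum import Enum

# Assumption: reuse the same 5% noise threshold that is passed to critcmp.
NOISE_THRESHOLD = 0.05


class Verdict(Enum):
    REGRESSION = "regression"
    IMPROVEMENT = "improvement"
    NEUTRAL = "neutral"


def classify(baseline_ns: float, candidate_ns: float,
             threshold: float = NOISE_THRESHOLD) -> Verdict:
    """Classify one benchmark by its relative change in time (lower is better)."""
    if baseline_ns <= 0:
        raise ValueError("baseline time must be positive")
    change = (candidate_ns - baseline_ns) / baseline_ns
    if change > threshold:
        return Verdict.REGRESSION
    if change < -threshold:
        return Verdict.IMPROVEMENT
    return Verdict.NEUTRAL


def aggregate(verdicts) -> Verdict:
    """Collapse many verdicts into one: any regression wins, then any improvement."""
    verdicts = list(verdicts)
    if Verdict.REGRESSION in verdicts:
        return Verdict.REGRESSION
    if Verdict.IMPROVEMENT in verdicts:
        return Verdict.IMPROVEMENT
    return Verdict.NEUTRAL
```

Letting any single regression dominate the aggregate is a deliberately conservative choice for this sketch; a per-arch verdict would call `aggregate` once per matrix entry instead of once overall.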
Notes
This adds CI (compute) time, but I don't think it adds wall-clock time, since other subtasks already take longer
CI is still noisier than a controlled local environment, but it is probably good enough for catching substantial perf changes and for sanity-checking