yorickpeterse opened 2 years ago
Worth mentioning: often these micro benchmarks end up measuring completely unrelated code, such as the time it takes to write to STDOUT (something the Wren benchmarks suffer from). When adopting existing benchmarks we should make sure we're actually measuring what matters, instead of blindly copying the benchmarks.
To measure the impact of changes on Inko's performance, we need a benchmark suite. This suite would live in a separate repository.
The benchmark suite should consist of two types of benchmarks: micro benchmarks and macro benchmarks. A micro benchmark would be something like DeltaBlue (https://github.com/wren-lang/wren/blob/main/test/benchmark/delta_blue.py), while a macro benchmark would be something like a simple HTTP server. Ideally we'd choose a set of micro benchmarks that are also used by other languages, so we can see how Inko compares to them.
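As an illustration of the "measure what matters" point, here's a Python sketch of a micro benchmark harness (the `fib` workload and the harness itself are made up for this example, not part of any existing suite). It times only the workload and defers all printing until after the measurement, so I/O cost doesn't pollute the numbers:

```python
import time

def fib(n):
    # CPU-bound workload: the thing we actually want to measure.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def bench(name, func, iterations=5):
    # Time only the workload itself. Results are printed once,
    # afterwards, so writing to STDOUT stays out of the measurement.
    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    print(f"{name}: best of {iterations} runs = {min(timings):.4f}s")

bench("fib(25)", lambda: fib(25))
```

Reporting the best of several runs (rather than a single run) reduces noise from warm-up and scheduling jitter.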
Automatic benchmarking
A CI job would run the suite periodically (e.g. once a week). The results should be presented somewhere that's easily accessible. Running this on GitLab's shared runners is likely to give inconsistent results, so we probably need a dedicated runner for this. As I don't have any spare computer I can run at home 24/7, this would likely involve renting a server. A quick look at Hetzner suggests this would cost around €50/month. To recoup the costs we should probably reuse the runner for other jobs (e.g. FreeBSD tests using QEMU), otherwise it's a bit of a waste of money.
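For concreteness, a scheduled GitLab CI job pinned to a dedicated runner could look something like this (a sketch only; the runner tag, script name, and artifact path are assumptions, not an existing setup):

```yaml
# .gitlab-ci.yml (sketch)
benchmarks:
  tags:
    - benchmark-runner            # hypothetical tag for the dedicated runner
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
  script:
    - ./run-benchmarks.sh         # hypothetical entry point for the suite
  artifacts:
    paths:
      - results.json              # results to publish somewhere accessible
```

The `rules` clause restricts the job to scheduled pipelines (e.g. a weekly schedule configured in the project settings), while the `tags` entry ensures it only runs on the dedicated machine, keeping results consistent.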