Replace homegrown benchmark CI runner with codspeed

rukai commented 3 months ago

Shotover has always had a homegrown system for reporting benchmark regressions in CI. For every PR: It runs the benchmarks for the main branch, then runs the benchmarks for the PRs branch, then compares the results, reporting any large difference in results. It then posts the analysis in a comment like this https://github.com/shotover/shotover-proxy/pull/1495#issuecomment-1958780353

However, the UX has been lacking:

Have to sift through noisy criterion logs to find results
Only comments when it detects a regression/improvement, this means the system not running and the system saying :+1: are indistinguishable.

And maintaining it has been a pain. In fact, right now the system is silently broken after an update to one of our github actions dependencies.

Recently various services that will manage all of this for us has appeared. The most promising looking is https://codspeed.io

Here I've enabled it in a test repo https://github.com/rukai/shotover-proxy/pull/1 It posts a comment that links to https://codspeed.io/rukai/shotover-proxy/branches/codspeed_pr which presents all the results really nicely. It claims to significantly reduce benchmark noise, but its not clear how exactly they are achieving it. From the results on the page I linked, it does look like the results are largely noise free! The page also allows inspecting a flamegraph of each benchmark which is impressive and will be quite handy.

One downside is that we need to use a fork of criterion to enable codspeed. But if we decide codspeed is not right for us then we can really easy revert back if needed, since codspeed does not change the API at all in its criterion fork. So I propose we replace our homegrown system with codspeed, which this PR implements.

benbromhead commented 3 months ago

I'm generally a fan of adopting an external service / capability for something that is not core to the project that brings in a net win in development velocity. Sounds like we also have an easy out if it's not living up to its promises.

Do you have an example (artificial or not) of what a real regression looks like (e.g. throwing a sleep into that example PR)?

+1

rukai commented 3 months ago

well that was enlightening, I added a 1s sleep and they claim it did not regress performance, there is a -8% decrease to that bench but I expected something in the order of -1000% https://github.com/rukai/shotover-proxy/pull/2

It would seem they are doing some kind of instruction counting and thats why the sleep is largely ignored. This is not ideal but its also acceptable and realistic, if you are going to measure microbenchmark performance on a noisy CI cloud machine you will need to pull some tricks like that.

codspeed-hq[bot] commented 3 months ago

CodSpeed Performance Report

Congrats! CodSpeed is installed 🎉

🆕 37 new benchmarks were detected.

You will start to see performance impacts in the reports once the benchmarks are run from your default branch.

Detected benchmarks

- `cassandra_protect_protected` (225 µs) - `cassandra_protect_unprotected` (114.4 µs) - `cassandra_request_throttling_unparsed` (22.1 µs) - `cassandra_rewrite_peers_passthrough` (244.3 µs) - `loopback` (9 µs) - `nullsink` (21.7 µs) - `redis_cluster_ports_rewrite` (28.5 µs) - `redis_filter` (41.2 µs) - `decode_system.local_query_v4_lz4_compression` (139.3 µs) - `decode_system.local_query_v4_no_compression` (137.8 µs) - `decode_system.local_query_v5_lz4_compression` (140.3 µs) - `decode_system.local_query_v5_no_compression` (139.9 µs) - `decode_system.local_result_v4_lz4_compression` (173.5 µs) - `decode_system.local_result_v4_no_compression` (25.2 µs) - `decode_system.local_result_v5_lz4_compression` (173.6 µs) - `decode_system.local_result_v5_no_compression` (25.3 µs) - `encode_system.local_query_v4_lz4_compression` (24 µs) - `encode_system.local_query_v4_no_compression` (24.9 µs) - `encode_system.local_query_v5_lz4_compression` (64.8 µs) - `encode_system.local_query_v5_no_compression` (28.7 µs) - `encode_system.local_result_v4_lz4_compression` (991.2 µs) - `encode_system.local_result_v4_no_compression` (443.1 µs) - `encode_system.local_result_v5_lz4_compression` (996.4 µs) - `encode_system.local_result_v5_no_compression` (491.1 µs) - `decode_request_fetch_create` (18.4 µs) - `decode_request_fetch_drop` (6.1 µs) - `decode_request_list_offsets_create` (16.5 µs) - `decode_request_list_offsets_drop` (5.8 µs) - `decode_request_metadata_create` (14.5 µs) - `decode_request_metadata_drop` (4.5 µs) - `decode_request_produce_create` (21.1 µs) - `decode_request_produce_drop` (6.3 µs) - `encode_all` (49.3 µs) - `encode_request_fetch` (25.6 µs) - `encode_request_list_offsets` (23.4 µs) - `encode_request_metadata` (21.8 µs) - `encode_request_produce` (26.1 µs)

rukai commented 3 months ago

Adding a more realistic regression with a .clone() is caught: https://github.com/rukai/shotover-proxy/pull/3

This flamegraph diff of the bench shows exactly were the regression was introduced: So cool!

shotover / shotover-proxy