Open zakkak opened 7 months ago
/cc @Karm (mandrel), @ebullient (metrics), @galderz (mandrel), @jmartisk (metrics)
Go for it! Let's spot those frogs boiling earlier :)
I love this idea!
So IIUC, the same checks, but with alerts that won't block merging, and a nice infra with lots of history to help investigation?
Sounds great, +1 :)
Status update:
:heavy_check_mark: Data are being collected from CI runs on main (see https://github.com/quarkusio/quarkus/pull/39784)
:heavy_check_mark: Integration with Horreum https://github.com/Hyperfoil/Horreum/pull/1703
:calendar: Automated reporting of anomalies is waiting for a new Horreum release to be deployed in production
Description
https://github.com/quarkusio/quarkus/pull/36108 was an attempt to detect regressions in native builds when certain metrics fall outside a given range. Unfortunately, this doesn't seem to work well in practice. The main reason seems to be that multiple PRs gradually increase the metrics without hitting the threshold. Then a new PR that happens to increase the metrics a bit more triggers a failure. Although that PR might not be responsible for the total increase that led to hitting the threshold, it is the one that gets blocked.
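To make the failure mode concrete, here is a minimal sketch in Python. All numbers, names, and the threshold are made up for illustration; none of them come from the Quarkus CI or from the PR above. It shows a fixed threshold blocking the last PR in a series even though that PR adds the least.

```python
# Hypothetical data: native image size in MB measured after each merged PR.
# These values and the threshold are illustrative assumptions only.
THRESHOLD_MB = 60.0
BASELINE_MB = 55.0
SIZES_MB = [56.5, 58.0, 59.5, 60.2]


def first_blocked(sizes, threshold):
    """Return the index of the first measurement above the threshold, or None."""
    for index, size in enumerate(sizes):
        if size > threshold:
            return index
    return None


def per_pr_increase(baseline, sizes):
    """Return how much each PR added compared to the previous measurement."""
    increases = []
    previous = baseline
    for size in sizes:
        increases.append(round(size - previous, 2))
        previous = size
    return increases


blocked = first_blocked(SIZES_MB, THRESHOLD_MB)
deltas = per_pr_increase(BASELINE_MB, SIZES_MB)

for index, delta in enumerate(deltas):
    status = "BLOCKED" if index == blocked else "passed"
    print(f"PR #{index}: +{delta} MB ({status})")
```

In this made-up series the first three PRs each add 1.5 MB and pass, while the fourth adds 0.7 MB and is the one that fails the check.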
Implementation ideas
An idea we had within the Mandrel team (cc @Karm @jerboaa), and which we are working towards, is the following.
We would like to start collecting data from Quarkus CI runs (initially from runs on `main` and later probably from PRs as well). This will allow us to observe the change over time (as shown in https://github.com/quarkusio/quarkus/issues/39674#issuecomment-2026079349) instead of just when we hit a threshold.
Next, we would ideally like to feed these data to a tool with anomaly detection (possibly https://horreum.hyperfoil.io/) in order to get automated alerts when something seems wrong. That could be:
Related PRs: