quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0

Rework the way we currently detect regressions in build time metrics #40076

Open zakkak opened 7 months ago

zakkak commented 7 months ago

Description

https://github.com/quarkusio/quarkus/pull/36108 was an attempt to detect regressions in native builds by failing when certain metrics fall outside a given range. Unfortunately, this doesn't work well in practice. The main reason seems to be that multiple PRs gradually increase the metrics without hitting the threshold, and then a new PR that happens to increase them a bit more triggers a failure. That PR is the one being blocked, even though it might not be responsible for most of the total increase that led to crossing the threshold.
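The failure mode described above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name, the baseline of 1000, the 3% tolerance and the per-PR increments are all made-up values, not the ones used by the actual check in Quarkus CI.

```python
def first_blocked_pr(baseline, tolerance, increments):
    """Return (pr_index, metric_value) for the first PR that crosses the
    fixed threshold, or None if no PR crosses it.

    All inputs are hypothetical: `baseline` is the recorded metric value,
    `tolerance` the allowed relative increase, and `increments` the amount
    each merged PR adds to the metric, in merge order.
    """
    limit = baseline * (1 + tolerance)
    value = baseline
    for index, delta in enumerate(increments, start=1):
        value += delta
        if value > limit:
            # Only this PR fails, although earlier PRs contributed too.
            return index, value
    return None


# Four PRs, each adding less than 1% on its own.
blocked = first_blocked_pr(baseline=1000, tolerance=0.03, increments=[9, 8, 9, 6])
print(blocked)
```

With these numbers the fourth PR is the one that fails, although it adds only 6 of the 32 units accumulated since the baseline was recorded.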

Implementation ideas

Within the Mandrel team (cc @Karm @jerboaa) we came up with the following idea and are working towards it.

We would like to start collecting data from Quarkus CI runs (initially from runs on main and later probably from PRs as well). This will allow us to observe the change over time (as shown in https://github.com/quarkusio/quarkus/issues/39674#issuecomment-2026079349) instead of just when we hit a threshold.

Next we would ideally like to feed these data to a tool with anomaly detection (possibly https://horreum.hyperfoil.io/) in order to get automated alerts when something seems wrong. That could be:

  1. Create a generic GH issue when we have crossed a threshold from the last known "good state"
  2. Create a PR specific issue or comment in an open PR if it appears to be causing a sudden increase in the metrics we are interested in.
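As a rough sketch of how the two kinds of alert could be told apart once a history of measurements exists: the code below is plain Python and does not use or reproduce Horreum's change detection. The `Alert` type, the `detect` function, both limits and the sample data are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    kind: str      # "drift": far from the last known good state; "jump": sudden increase
    commit: str    # commit (or PR) the measurement belongs to
    change: float  # relative change that triggered the alert


def detect(history, good_state, drift_limit=0.03, jump_limit=0.02):
    """Scan (commit, value) pairs, oldest first, and return non-blocking alerts.

    `good_state` is the metric value of the last known good state.
    Both limits are hypothetical relative thresholds.
    """
    alerts = []
    previous = good_state
    drift_reported = False
    for commit, value in history:
        # Sudden increase relative to the previous measurement:
        # candidate for a PR-specific issue or comment.
        jump = (value - previous) / previous
        if jump > jump_limit:
            alerts.append(Alert("jump", commit, jump))
        # Accumulated increase relative to the last known good state:
        # candidate for a generic issue, reported once.
        drift = (value - good_state) / good_state
        if drift > drift_limit and not drift_reported:
            alerts.append(Alert("drift", commit, drift))
            drift_reported = True
        previous = value
    return alerts


# Hypothetical measurements from runs on main.
history = [("a", 1009), ("b", 1017), ("c", 1026), ("d", 1032), ("e", 1070)]
alerts = detect(history, good_state=1000)
```

In this made-up history, commit `d` raises a single generic "drift" alert without being singled out as the culprit, while commit `e` raises a "jump" alert because it increases the metric by more than 2% on its own.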

Related PRs:

  1. https://github.com/Karm/collector/pull/23
  2. https://github.com/quarkusio/quarkus/pull/39784

quarkus-bot[bot] commented 7 months ago

/cc @Karm (mandrel), @ebullient (metrics), @galderz (mandrel), @jmartisk (metrics)

maxandersen commented 6 months ago

Go for it! Let's spot those frogs boiling earlier :)

dmlloyd commented 6 months ago

I love this idea!

yrodiere commented 6 months ago

So IIUC, the same checks, but with alerts that won't block merging, and a nice infra with lots of history to help investigation?

Sounds great, +1 :)

zakkak commented 2 months ago

Status update:

:heavy_check_mark: Data are being collected from CI runs on main (see https://github.com/quarkusio/quarkus/pull/39784)
:heavy_check_mark: Integration with Horreum: https://github.com/Hyperfoil/Horreum/pull/1703
:calendar: Automated reporting of anomalies: waiting for a new Horreum release to be deployed in production