google / syzkaller

syzkaller is an unsupervised coverage-guided kernel fuzzer
Apache License 2.0

dashboard/app: better health monitoring #5428

Open dvyukov opened 2 days ago

dvyukov commented 2 days ago

We have lots of health indicators that can be evaluated only over a time period, for example:

None of these can be diagnosed from a single point in time (a single dashboard error or a single repro failure may be ignored), and currently we don't monitor any of them (besides randomly wandering around).

We should collect data for these and at least visualize it (e.g. the rate of successful/failed bug reproductions, dashboard errors per day), and ideally alert on sudden changes. Some alerts may be based on a threshold (easier, e.g. >100 dashboard errors/day).
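To make the two alert styles concrete, here is a minimal sketch in Go of a fixed-threshold check and a simple sudden-increase check over per-day counts. The `DailyCount` type, both function names, and the "compare against the mean of the previous days" rule are assumptions made for illustration; nothing like this exists in the dashboard today, and a real detector would probably need something more robust.

```go
package main

// DailyCount is a hypothetical record: how many events of one kind
// (for example dashboard errors) were observed on one day.
type DailyCount struct {
	Day   string // e.g. "2024-01-04"
	Count int
}

// OverThreshold returns the days whose count is strictly above limit.
// This is the "easier" style of alert, e.g. limit = 100 errors/day.
func OverThreshold(counts []DailyCount, limit int) []string {
	var days []string
	for _, c := range counts {
		if c.Count > limit {
			days = append(days, c.Day)
		}
	}
	return days
}

// SuddenIncrease reports whether the most recent count is more than
// factor times the mean of the window days that precede it.
// It returns false when there is not enough history to compare against.
func SuddenIncrease(counts []DailyCount, window int, factor float64) bool {
	if window <= 0 || len(counts) < window+1 {
		return false
	}
	last := counts[len(counts)-1].Count
	prev := counts[len(counts)-1-window : len(counts)-1]
	sum := 0
	for _, c := range prev {
		sum += c.Count
	}
	mean := float64(sum) / float64(window)
	// Treat a near-empty baseline as one event per day, so that a single
	// stray event after a quiet period does not count as a sudden change.
	if mean < 1 {
		mean = 1
	}
	return float64(last) > factor*mean
}
```

With counts of 10, 12, 8 and then 150 errors on four consecutive days, `OverThreshold(counts, 100)` would flag only the last day, and `SuddenIncrease(counts, 3, 3)` would also fire because 150 is more than three times the mean of the previous three days.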

a-nogikh commented 2 days ago

GCP offers alerting functionality: https://cloud.google.com/monitoring/alerts (it can also be based on logs: https://cloud.google.com/logging/docs/alerting/log-based-alerts)

It would be nice to figure out how to keep these settings in the git repository and be able to (re-)deploy them without having to go through the Cloud web UI.

dvyukov commented 2 days ago

If we want to rely on grepping logs, I think for reliability of parsing we will need a new interface along the lines of:

package log
func Metricf(typ Metric, description string, args ...any)
type Metric string

Otherwise matching logs is unreliable.

And, yes, it would be good to persist rules somewhere.