google / syzkaller

syzkaller is an unsupervised coverage-guided kernel fuzzer
Apache License 2.0

dashboard/app: better health monitoring #5428

Open dvyukov opened 1 month ago

dvyukov commented 1 month ago

We have lots of health indicators that can be evaluated only over a time period, for example:

None of these can be diagnosed from a single instant (a single dashboard error or a single repro failure may be ignored), and currently we don't monitor any of them (besides randomly wandering around).

We should collect data for these and at least visualize it (e.g. the rate of successful/failed bug reproductions, dashboard errors per day), and ideally alert on sudden changes. Some alerts may be threshold-based (easier, e.g. >100 dashboard errors/day).

a-nogikh commented 1 month ago

GCP offers alerting functionality: https://cloud.google.com/monitoring/alerts (alerts can also be based on logs: https://cloud.google.com/logging/docs/alerting/log-based-alerts)

It would be nice to figure out how to keep these settings in the git repository and be able to (re-)deploy them without having to go through the Cloud web UI.

dvyukov commented 1 month ago

If we want to rely on grepping logs, I think for reliable parsing we will need a new interface along the lines of:

package log
func Metricf(typ Metric, description string, args ...any)
type Metric string

Otherwise matching logs is unreliable.

And, yes, it would be good to persist rules somewhere.