metal3-io / project-infra

Metal3 testing infrastructure configuration
https://prow.apps.test.metal3.io
Apache License 2.0

Prow: Monitoring and alerting #896

Open lentzi90 opened 2 hours ago

lentzi90 commented 2 hours ago

It would be nice to have a monitoring solution for Prow. This would help us set more precise resource requests, send out alerts when there are issues, and track the metrics that Prow itself exposes. A good starting point is probably the monitoring setup they have in test-infra: https://github.com/kubernetes/test-infra/blob/master/config/prow/cluster/monitoring/README.md

At a minimum, this issue should provide a way to track Prow job resource usage as well as the usual container and node metrics. These metrics could then be used for alerting.
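
To illustrate how tracked usage could feed into more precise resource requests, here is a minimal sketch. It is not taken from test-infra or from Prow; the function name `suggest_request`, the sample values, the percentile, and the headroom factor are all assumptions chosen for illustration. It takes observed usage samples for a job (for example CPU in millicores) and suggests a request based on a nearest-rank percentile plus some headroom.

```python
import math


def suggest_request(samples, percentile=95, headroom=1.2):
    """Suggest a resource request from observed usage samples.

    Hypothetical helper: `samples` would come from whatever monitoring
    stack ends up being deployed; nothing here talks to Prow or Prometheus.
    """
    if not samples:
        raise ValueError("need at least one usage sample")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest-rank percentile: the smallest sample such that at least
    # `percentile` percent of all samples are at or below it.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1] * headroom


# Made-up CPU usage samples for one job, in millicores.
cpu_millicores = [200, 400, 300, 1000, 500]
suggested = suggest_request(cpu_millicores, percentile=80)
print(f"suggested CPU request: {suggested:.0f}m")
```

Using a percentile instead of the maximum keeps a single outlier run from inflating the request, while the headroom factor leaves some margin above typical usage. Both values would need tuning against real data.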

lentzi90 commented 2 hours ago

/triage accepted
/assign