Open lentzi90 opened 2 hours ago
It would be nice to have a monitoring solution for Prow. This would help us set more precise resource requests, send out alerts when there are issues, and track the metrics exposed by Prow itself. A good starting point is probably the monitoring setup used in test-infra: https://github.com/kubernetes/test-infra/blob/master/config/prow/cluster/monitoring/README.md
At minimum, this issue should provide a way to track Prow job resource usage, along with the usual container and node metrics. These metrics could then be used for alerting.
/triage accepted
/assign