afrittoli commented 4 years ago

Expected Behavior

We should monitor the status of the various CI/CD services and be able to display metrics about them in Grafana, following the example of https://monitoring.prow.k8s.io/d/8P7-1J8Wz/boskos-server-dashboard?orgId=1 and https://github.com/kubernetes/test-infra/tree/201c7788b244ab2fc3efae7249fb939223ef6e1e/prow/cluster/monitoring

Things that we need to monitor are:

prow and boskos from the prow cluster
tekton services from the dogfooding cluster

We should display metrics from services where available:

boskos resource status (e.g. https://monitoring.prow.k8s.io/d/wSrfvNxWz/boskos-resource-usage?orgId=1)
tekton pipelinerun and taskrun metrics
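
As a rough illustration of how such metrics could be read once they land in Prometheus, the sketch below builds an instant query against the Prometheus HTTP API (`/api/v1/query`) and sums the samples in the response. The server address and any metric name passed in (for example `tekton_pipelinerun_count`) are assumptions for illustration only; the actual names depend on the Tekton and boskos versions deployed.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical Prometheus address; not part of the original issue.
PROMETHEUS_URL = "http://prometheus.example.internal:9090"


def build_query_url(base_url: str, promql: str) -> str:
    """Return the URL for an instant query on the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def sum_vector(response: dict) -> float:
    """Sum the sample values of an instant-vector query response."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    data = response["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each result looks like {"metric": {...}, "value": [<timestamp>, "<value>"]}.
    return sum(float(sample["value"][1]) for sample in data["result"])


def query_total(promql: str, base_url: str = PROMETHEUS_URL) -> float:
    """Run the query against a live server and return the summed value."""
    with urlopen(build_query_url(base_url, promql), timeout=10) as resp:
        return sum_vector(json.load(resp))
```

Keeping the URL building and response parsing separate from the network call means the parsing can be checked against a canned response without a running Prometheus.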

We'll need Prometheus and Grafana deployed somewhere. We may be able to use one instance across clusters, at least for Grafana. We might want Alertmanager too, so that we can alert the build cop on Slack when something is broken.
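
To make the alerting idea concrete, here is a minimal sketch of turning an Alertmanager webhook payload into a Slack message. Alertmanager ships its own Slack receiver, so in practice this would more likely be configuration than code; the sketch only illustrates the flow, assuming a generic webhook receiver and a Slack incoming webhook. The webhook URL and any alert names used with it are hypothetical.

```python
import json
from urllib.request import Request, urlopen


def format_slack_message(payload: dict) -> dict:
    """Turn an Alertmanager webhook payload into a Slack message body."""
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        name = labels.get("alertname", "unknown alert")
        status = alert.get("status", "firing").upper()
        summary = annotations.get("summary", "no summary provided")
        lines.append(f"[{status}] {name}: {summary}")
    return {"text": "\n".join(lines) or "Alert notification with no alerts"}


def post_to_slack(webhook_url: str, message: dict) -> int:
    """POST the message to a Slack incoming webhook and return the HTTP status."""
    request = Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request, timeout=10) as resp:
        return resp.status
```

A receiver would call `format_slack_message` on each incoming payload and pass the result to `post_to_slack` with the build-cop channel's webhook URL.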

Actual Behavior

We don't have any monitoring in place.

bobcatfish commented 4 years ago

Great idea!

vdemeester commented 4 years ago

/area test-infra
/kind enhancement

tekton-robot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close.

/lifecycle stale

Send feedback to tektoncd/plumbing.

tekton-robot commented 4 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

tekton-robot commented 4 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

/close

Send feedback to tektoncd/plumbing.

tekton-robot commented 4 years ago

@tekton-robot: Closing this issue.

In response to [this](https://github.com/tektoncd/plumbing/issues/235#issuecomment-673448177):

>Rotten issues close after 30d of inactivity.
>Reopen the issue with `/reopen`.
>Mark the issue as fresh with `/remove-lifecycle rotten`.
>
>/close
>
>Send feedback to [tektoncd/plumbing](https://github.com/tektoncd/plumbing).

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

afrittoli commented 4 years ago

/remove-lifecycle rotten

vdemeester commented 4 years ago

/remove-lifecycle rotten
/remove-lifecycle stale
/reopen

tekton-robot commented 4 years ago

@vdemeester: Reopened this issue.

In response to [this](https://github.com/tektoncd/plumbing/issues/235#issuecomment-673505853):

>/remove-lifecycle rotten
>/remove-lifecycle stale
>/reopen

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

tekton-robot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close.

/lifecycle stale

Send feedback to tektoncd/plumbing.

vdemeester commented 3 years ago

/remove-lifecycle stale

bobcatfish commented 3 years ago

I'm gonna tentatively assign this to myself, since I'm looking into https://github.com/tektoncd/pipeline/issues/540. Theoretically I'll at least look into setting up some monitoring for performance testing, maybe!

/assign

tekton-robot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale with a justification. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

tekton-robot commented 3 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten with a justification. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

tekton-robot commented 3 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen with a justification. Mark the issue as fresh with /remove-lifecycle rotten with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

tekton-robot commented 3 years ago

@tekton-robot: Closing this issue.

In response to [this](https://github.com/tektoncd/plumbing/issues/235#issuecomment-854116522):

>Rotten issues close after 30d of inactivity.
>Reopen the issue with `/reopen` with a justification.
>Mark the issue as fresh with `/remove-lifecycle rotten` with a justification.
>If this issue should be exempted, mark the issue as frozen with `/lifecycle frozen` with a justification.
>
>/close
>
>Send feedback to [tektoncd/plumbing](https://github.com/tektoncd/plumbing).

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

vdemeester commented 1 year ago

/area roadmap

tektoncd / plumbing

Setup monitoring components for infra clusters #235
