tektoncd / plumbing

This repo holds configuration for infrastructure used across the tektoncd org 🏗️
Apache License 2.0
60 stars 110 forks source link

Setup monitoring components for infra clusters #235

Open afrittoli opened 4 years ago

afrittoli commented 4 years ago

Expected Behavior

We should monitor the status of the various CI/CD services and we should be able to display metrics about the status of the services using grafana, following the example of https://monitoring.prow.k8s.io/d/8P7-1J8Wz/boskos-server-dashboard?orgId=1 and https://github.com/kubernetes/test-infra/tree/201c7788b244ab2fc3efae7249fb939223ef6e1e/prow/cluster/monitoring

Things that we need to monitor are:

We should display metrics from services where available:

We'll need prometheus and grafana deployed somewhere. We may be able to use one instance across clusters, at least for grafana. We might want alertmanager too, so we could alert build-cop on slack when something is broken.

Actual Behavior

We don't have any monitoring in place

bobcatfish commented 4 years ago

Great idea!

vdemeester commented 4 years ago

/area test-infra /kind enhancement

tekton-robot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close.

/lifecycle stale

Send feedback to tektoncd/plumbing.

tekton-robot commented 4 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

tekton-robot commented 4 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

/close

Send feedback to tektoncd/plumbing.

tekton-robot commented 4 years ago

@tekton-robot: Closing this issue.

In response to [this](https://github.com/tektoncd/plumbing/issues/235#issuecomment-673448177): >Rotten issues close after 30d of inactivity. >Reopen the issue with `/reopen`. >Mark the issue as fresh with `/remove-lifecycle rotten`. > >/close > >Send feedback to [tektoncd/plumbing](https://github.com/tektoncd/plumbing). Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
afrittoli commented 4 years ago

/remove-lifecycle rotten

vdemeester commented 4 years ago

/remove-lifecycle rotten /remove-lifecycle stale /reopen

tekton-robot commented 4 years ago

@vdemeester: Reopened this issue.

In response to [this](https://github.com/tektoncd/plumbing/issues/235#issuecomment-673505853): >/remove-lifecycle rotten >/remove-lifecycle stale >/reopen Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
tekton-robot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close.

/lifecycle stale

Send feedback to tektoncd/plumbing.

vdemeester commented 3 years ago

/remove-lifecycle stale

bobcatfish commented 3 years ago

I'm gonna tentatively assign this to myself, since i'm looking into https://github.com/tektoncd/pipeline/issues/540 theoretically ill at least look into setting up some monitoring for performance testing, maybe!

/assign

tekton-robot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale with a justification. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

tekton-robot commented 3 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten with a justification. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

tekton-robot commented 3 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen with a justification. Mark the issue as fresh with /remove-lifecycle rotten with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

tekton-robot commented 3 years ago

@tekton-robot: Closing this issue.

In response to [this](https://github.com/tektoncd/plumbing/issues/235#issuecomment-854116522): >Rotten issues close after 30d of inactivity. >Reopen the issue with `/reopen` with a justification. >Mark the issue as fresh with `/remove-lifecycle rotten` with a justification. >If this issue should be exempted, mark the issue as frozen with `/lifecycle frozen` with a justification. > >/close > >Send feedback to [tektoncd/plumbing](https://github.com/tektoncd/plumbing). Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
vdemeester commented 1 year ago

/area roadmap