kubernetes / k8s.io

Code and configuration to manage Kubernetes project infrastructure, including various *.k8s.io sites
https://git.k8s.io/community/sig-k8s-infra
Apache License 2.0

Set up a gitops-driven workflow for Google Cloud Monitoring dashboards #1376

Closed spiffxp closed 8 months ago

spiffxp commented 4 years ago

The use case here is ensuring we don't lose dashboards. By backing them up into git, we can keep track of when and why they changed, and restore them if they're deleted from Google Cloud Monitoring.

I really wish the UI provided import/export options to make this easier. That said, it is possible via the API or gcloud: https://cloud.google.com/blog/products/management-tools/cloud-monitoring-dashboards-using-an-api
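As a rough sketch of the backup half of this idea, the Python below writes exported dashboard JSON to one file per dashboard, with sorted keys so that git diffs only show real changes. It assumes the dashboards have already been fetched by some export step that is not shown (for example, JSON output from gcloud or the API). The names `slugify` and `write_dashboards`, and the one-file-per-display-name layout, are illustrative assumptions and not something this repository provides.

```python
from __future__ import annotations

import json
import re
from pathlib import Path


def slugify(display_name: str) -> str:
    """Turn a dashboard display name into a filesystem-friendly slug."""
    slug = re.sub(r"[^a-z0-9]+", "-", display_name.lower()).strip("-")
    return slug or "dashboard"


def write_dashboards(dashboards: list[dict], out_dir: Path) -> list[Path]:
    """Write each dashboard to <out_dir>/<slug>.json in a stable format.

    `dashboards` is assumed to be the parsed JSON list produced by an
    export step that is not shown here.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for dashboard in dashboards:
        path = out_dir / f"{slugify(dashboard.get('displayName', ''))}.json"
        # sort_keys plus a fixed indent keeps diffs limited to real changes.
        path.write_text(json.dumps(dashboard, indent=2, sort_keys=True) + "\n")
        written.append(path)
    return written
```

Two dashboards whose display names produce the same slug would overwrite each other in this sketch, so a real version would need a tie-breaker such as the dashboard ID.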

One workflow could be:

This workflow could also allow:

An alternative workflow is:

And finally, it may be possible to automate away the etag toil iff we are smart and consistent about when to avoid stomping on unexpected changes.
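One way to picture "avoid stomping on unexpected changes" is as a small decision function. The sketch below assumes that each dashboard returned by the API carries an `etag` field, and that the sync tooling records the etag of the last version it pushed. The `last_synced_etag` bookkeeping, the `SyncDecision` type, and the function names are hypothetical, not an existing tool.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class SyncDecision:
    action: str  # one of "create", "update", "noop", "conflict"
    reason: str


def strip_etag(dashboard: dict) -> dict:
    """Return a shallow copy without the etag, for content comparison."""
    return {k: v for k, v in dashboard.items() if k != "etag"}


def decide_sync(
    desired: dict, live: Optional[dict], last_synced_etag: Optional[str]
) -> SyncDecision:
    """Decide what to do with one dashboard without overwriting manual edits."""
    if live is None:
        return SyncDecision("create", "dashboard does not exist in the project")
    if strip_etag(desired) == strip_etag(live):
        return SyncDecision("noop", "live dashboard already matches git")
    # A differing etag means someone changed the dashboard outside of git
    # since our last push (or we have no record of ever pushing it).
    if live.get("etag") != last_synced_etag:
        return SyncDecision("conflict", "live dashboard changed since the last sync")
    return SyncDecision("update", "live dashboard is unchanged since the last sync")
```

A "conflict" result would be surfaced for a human to resolve (for example by exporting the live dashboard into a PR) rather than being applied automatically.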

/priority important-longterm
/wg k8s-infra
/sig testing
since the dashboards I have in mind are for k8s-infra-prow-build

Specifically, I have these two dashboards in mind (you must be a member of k8s-infra-prow-viewers@kubernetes.io to view them; feel free to PR yourself in if you would like)

spiffxp commented 4 years ago

FYI @kubernetes/ci-signal if anyone is interested in this

/help
I am willing to give someone appropriate credentials to develop this workflow, answer questions, and review PRs

/area infra/monitoring

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

spiffxp commented 3 years ago

/remove-lifecycle stale
/lifecycle frozen
Keeping this around as a good help-wanted issue, unless/until we decide on some other easily "git-ops-able" dashboard solution

spiffxp commented 3 years ago

The UI makes import/export much easier these days, and we now export dashboards as part of audit.

So we still can't recreate dashboards immediately by running a script for DR purposes, but we're better off than we used to be.
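For the DR direction, one piece such a script would likely need is turning an exported dashboard back into something that can be submitted as a new dashboard. The sketch below assumes that `name` and `etag` are the server-assigned fields that have to be dropped before re-creating. The function name and the field list are illustrative assumptions, and the actual create call (via the API, gcloud, or terraform) is deliberately left out.

```python
import copy

# Fields assumed to be assigned by the server on an existing dashboard.
SERVER_ASSIGNED_FIELDS = ("name", "etag")


def prepare_for_create(exported: dict) -> dict:
    """Return a copy of an exported dashboard with server-assigned fields removed.

    The input is left untouched so the exported file on disk stays the
    source of truth.
    """
    payload = copy.deepcopy(exported)
    for field in SERVER_ASSIGNED_FIELDS:
        payload.pop(field, None)
    return payload
```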

spiffxp commented 3 years ago

Could also try using terraform, as is being done here: https://github.com/GoogleCloudPlatform/oss-test-infra/tree/master/prow/oss/terraform

spiffxp commented 3 years ago

There is an intent to eventually move away from Grafana for monitoring.prow.k8s.io if it continues to run on google.com infra for too much longer, due to a license change. This would be a good opportunity to prototype

spiffxp commented 3 years ago

/milestone v1.23

ameukam commented 3 years ago

> There is an intent to eventually move away from Grafana for monitoring.prow.k8s.io if it continues to run on google.com infra for too much longer, due to a license change. This would be a good opportunity to prototype

Is the license change about Grafana's switch to AGPLv3?

rajibmitra commented 3 years ago

I am interested in working on this @spiffxp
/assign

spiffxp commented 3 years ago

/remove-sig testing

spiffxp commented 3 years ago

/remove-help
/assign
At this point the only dashboards I'm aware of are in k8s-infra-prow-build, and I've got a PR out to update those via terraform now: https://github.com/kubernetes/k8s.io/pull/2938

ameukam commented 2 years ago

/milestone clear

ameukam commented 8 months ago

We are not doing this anymore. We have public monitoring dashboards for our build environments:

ameukam commented 8 months ago

/close

k8s-ci-robot commented 8 months ago

@ameukam: Closing this issue.

In response to [this](https://github.com/kubernetes/k8s.io/issues/1376#issuecomment-1975175802):

>/close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.