Open spiffxp opened 3 years ago
/help
@spiffxp: This request has been marked as needing help from a contributor.
Please ensure the request meets the requirements listed here.
If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.
I'm low on bandwidth right now, but if there's time (this isn't urgent) I can take a look at how to manage the alerts and dashboards with gitops :)
/assign
So far:
My thoughts on this specific part: I really like the idea of using Crossplane (k8s objects) to manage our cloud env, but I guess a lot of folks are already familiar with Terraform (although I agree with Justin, migrating between versions is sometimes... annoying...)
I'll create some simple .tf files tomorrow with the same approach, trying to create notification channels and alert policies, and see how this is reflected in Stackdriver.
@ameukam will work on this, using @thockin's tests to monitor certificate renewal and expiration as an example.
https://github.com/kubernetes/k8s.io/pull/1877 <- Created a PR with a really simple Terraform that adds an uptime check and the current alert policy.
We can improve this, like adding latency/uptime alerting (like for cs.k8s.io and others), etc.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
A good first step would be understanding how to export whatever existing alerts we have as part of audit/audit-gcp.sh
https://github.com/GoogleCloudPlatform/oss-test-infra/tree/master/prow/oss/terraform/modules/alerts good prior art to start from
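To make that first step a bit more concrete, here is a rough sketch of how exported alert policies could be cleaned up before being written out by the audit script. It is not what audit/audit-gcp.sh does today. It assumes the policies were already exported as a JSON list (for example from `gcloud alpha monitoring policies list --format=json`), and it assumes `creationRecord` and `mutationRecord` are the only server-managed fields worth dropping. The function names are made up for illustration.

```python
import json

# Fields the Cloud Monitoring API fills in itself. Dropping them keeps audit
# diffs quiet. Assumption: these are the only server-managed fields we care about.
VOLATILE_FIELDS = ("creationRecord", "mutationRecord")


def normalize_policy(policy):
    """Return a copy of one exported alert policy without server-managed fields."""
    return {k: v for k, v in policy.items() if k not in VOLATILE_FIELDS}


def render_export(policies):
    """Render policies as stable, sorted JSON suitable for checking into git."""
    cleaned = sorted(
        (normalize_policy(p) for p in policies),
        key=lambda p: p.get("displayName", ""),
    )
    # sort_keys makes the output independent of the key order the API returned.
    return json.dumps(cleaned, indent=2, sort_keys=True) + "\n"
```

Sorting by display name and by key means two exports of the same policies produce identical files, so a change in the audit output should correspond to a real change in the project.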
/milestone v1.23
I think it would be really handy to use this at a bare minimum for uptime checks on the apps we run on aaa
/milestone clear
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
After a period of inactivity, lifecycle/stale is applied
After further inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After further inactivity since lifecycle/rotten was applied, the issue is closed
You can:
/remove-lifecycle stale
/lifecycle rotten
/close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
/lifecycle frozen
/milestone clear
Discussed in k8s-infra meeting 2020-02-03
We have some Slack alerting set up today, but it was configured by humans clicking around on the Google Cloud website (aka "click-ops"). It would be ideal if we could drive that configuration automatically via files checked into git (aka "git-ops").
This is likely similar to or overlaps with making a gitops-driven workflow for Google Cloud Monitoring dashboards (https://github.com/kubernetes/k8s.io/issues/1376)
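To illustrate what "driving configuration from git" means in practice, here is a minimal sketch of the comparison a git-ops tool performs: what is checked in versus what exists in the project. A tool like Terraform would do this for real. The function name is hypothetical, and keying policies by display name is an assumption made for simplicity (display names are not guaranteed to be unique).

```python
def plan_changes(desired, actual):
    """Compare policies from git (desired) with policies in the project (actual).

    Both arguments map a policy's display name to its configuration dict.
    Returns the names to create, update, and delete, each list sorted.
    """
    create = sorted(n for n in desired if n not in actual)
    delete = sorted(n for n in actual if n not in desired)
    update = sorted(
        n for n in desired if n in actual and desired[n] != actual[n]
    )
    return {"create": create, "update": update, "delete": delete}
```

Anything that ends up under "delete" is configuration that exists only because someone clicked it into place, which is exactly the drift this issue wants to eliminate.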
/wg k8s-infra
/sig release
/area release-eng
FYI @kubernetes/release-engineering since #k8s-infra-alerts contains container image promoter alerts
/priority important-longterm