kubernetes / k8s.io

Code and configuration to manage Kubernetes project infrastructure, including various *.k8s.io sites
https://git.k8s.io/community/sig-k8s-infra
Apache License 2.0

[Umbrella issue] How do we monitor k8s-infra? #2588

Open ameukam opened 3 years ago

ameukam commented 3 years ago

We initially had this conversation in https://github.com/kubernetes/k8s.io/issues/401.

Also https://github.com/kubernetes/test-infra/pull/23317#issuecomment-902639443:

FYI @ameukam we don't have this feature enabled in kubernetes.io at the moment but will want to take a look at it soon

Some questions from thockin:

Cluster monitoring
a) What should we use?
   GKE Workload metrics: https://cloud.google.com/stackdriver/docs/solutions/gke/managing-metrics#workload-metrics
   Managed Service for Prometheus: https://cloud.google.com/stackdriver/docs/managed-prometheus
b) How do we set it up with git-ops?

App monitoring
a) Same tool as cluster monitoring?
b) What is the minimum expectation for an app to be deployed into the community space?
c) How do we manage groups of alerts for each app (ggroups?)
d) How do we manage on-call for each app?

GCP quotas monitoring
How do we monitor them?

More questions can be added.

/milestone v1.23 /are infra

ameukam commented 3 years ago

For this milestone, I would like to focus on fleshing out the methods and practices we should use to monitor k8s-infra.

/area infra

spiffxp commented 3 years ago

/priority important-longterm

jimdaga commented 3 years ago

If Prometheus is the tool picked, I'm happy to jump in and help; I have a decent amount of experience with it.

ameukam commented 2 years ago

/milestone v1.24

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

ameukam commented 2 years ago

/remove-lifecycle stale

ameukam commented 2 years ago

/milestone clear

k8s-triage-robot commented 2 years ago

/lifecycle stale

ameukam commented 2 years ago

/remove-lifecycle stale

k8s-triage-robot commented 1 year ago

/lifecycle stale

riaankleinhans commented 1 year ago

/remove-lifecycle stale

k8s-triage-robot commented 1 year ago

/lifecycle stale

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

riaankleinhans commented 1 year ago

/remove-lifecycle rotten

k8s-triage-robot commented 1 year ago

/lifecycle stale

riaankleinhans commented 1 year ago

/remove-lifecycle stale

ameukam commented 1 year ago

/lifecycle frozen