kubernetes / kube-state-metrics

Add-on agent to generate and expose cluster-level metrics.
https://kubernetes.io/docs/concepts/cluster-administration/kube-state-metrics/
Apache License 2.0

deleted pods still reporting metrics #1569

Closed: jpdstan closed this issue 2 years ago

jpdstan commented 3 years ago

What happened:

It seems that sometimes metrics don't get deleted alongside the pod; the stale series persist until we churn all the kube-state-metrics pods, which fixes it.

What's even stranger is that not all of the deleted pod's metrics linger. For example, for one deleted pod, we noticed it was still reporting kube_pod_container_status_waiting_reason, but not kube_pod_container_resource_requests.

What you expected to happen:

When a pod gets deleted, all metrics associated with that pod should also be deleted.

How to reproduce it (as minimally and precisely as possible):

It's unclear how this happens. Whenever we try to reproduce it by manually deleting a pod and querying for all of its metrics ({pod="my_pod"}), everything works fine, i.e. the metrics all disappear.
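
For reference, the manual check looks roughly like this (pod name, namespace, and the service address are placeholders, not from our cluster):

# delete the pod and wait until the apiserver no longer knows about it
kubectl delete pod my-pod -n my-namespace
kubectl wait --for=delete pod/my-pod -n my-namespace --timeout=60s

# scrape kube-state-metrics directly and look for leftover series
kubectl port-forward -n kube-system svc/kube-state-metrics 8080:8080 &
curl -s localhost:8080/metrics | grep 'pod="my-pod"'   # expect no output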

Anything else we need to know?:

Environment:

fpetkovski commented 3 years ago

This could be related to https://github.com/kubernetes/kube-state-metrics/issues/694

fredr commented 3 years ago

Have you checked via kubectl that the pods in this state are actually deleted, and not in some non running state, such as Completed or Evicted?
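
For example, a quick check (pod name and namespace are placeholders, not from this issue):

# returns NotFound if the pod is truly gone
kubectl get pod my-pod -n my-namespace

# lists pods in any non-Running phase; Completed pods show phase Succeeded,
# and Evicted pods show phase Failed
kubectl get pods -A --field-selector=status.phase!=Running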

jpdstan commented 3 years ago

@fredr Yes, they are definitely deleted.

irl-segfault commented 2 years ago

Same thing happening to me on EKS

jpdstan commented 2 years ago

Seeing another instance of this. These two metrics existed at the same time for the pod named taskmanager-0; the IP addresses differ because one is the old IP and the other is the current one.

kube_pod_labels{
 host="1.1.147.202",
 instance="1.1.147.202:9102",
 job="kubernetes-pods-k8s-production",
 kubernetes_namespace="kube-system",
 kubernetes_pod_name="kube-state-metrics-4",
 pod="taskmanager-0",
 ...
}

kube_pod_labels{
 host="1.1.188.37",
 instance="1.1.188.37:9102",
 job="kubernetes-pods-k8s-production",
 kubernetes_namespace="kube-system",
 kubernetes_pod_name="kube-state-metrics-8",
 pod="taskmanager-0",
 ...
}
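
For what it's worth, a query that surfaces pods reported by more than one kube-state-metrics replica at the same time (a sketch; the exact label names depend on your relabeling setup):

count by (namespace, pod) (kube_pod_labels) > 1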

boniek83 commented 2 years ago

Happens to me with kube_pod_container_resource_requests and "Terminated" pods (not yet removed by the terminated-pod garbage collector). KSM version: kube-state-metrics/kube-state-metrics:v2.4.1. I would expect kube_pod_container_resource_requests not to return terminated pods, or at least to label them correctly so I can filter them out.
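
One query-time workaround is to join against kube_pod_status_phase so that only currently running pods survive (a sketch; standard KSM label names assumed):

kube_pod_container_resource_requests
  * on (namespace, pod) group_left ()
    (max by (namespace, pod) (kube_pod_status_phase{phase="Running"}) == 1)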

fpetkovski commented 2 years ago

This case is expected, since KSM exposes everything from the apiserver. If you are not interested in terminated pods, you can drop those series using relabeling.
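
For example, a minimal Prometheus scrape-config sketch (job name and target are placeholders); metric_relabel_configs runs after each scrape, so it can drop individual series before they are ingested:

scrape_configs:
  - job_name: kube-state-metrics                               # placeholder
    static_configs:
      - targets: ["kube-state-metrics.kube-system.svc:8080"]   # placeholder
    metric_relabel_configs:
      # drop the resource-requests series; narrow the regex, or match on
      # another source label such as pod, to target specific series instead
      - source_labels: [__name__]
        regex: kube_pod_container_resource_requests
        action: drop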

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Reopen this issue or PR with /reopen
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-ci-robot commented 2 years ago

@k8s-triage-robot: Closing this issue.

In response to [this](https://github.com/kubernetes/kube-state-metrics/issues/1569#issuecomment-1229403814):

>The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
>This bot triages issues and PRs according to the following rules:
>- After 90d of inactivity, `lifecycle/stale` is applied
>- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
>- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
>You can:
>- Reopen this issue or PR with `/reopen`
>- Mark this issue or PR as fresh with `/remove-lifecycle rotten`
>- Offer to help out with [Issue Triage][1]
>
>Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
>/close
>
>[1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.