kubernetes / kube-state-metrics

Add-on agent to generate and expose cluster-level metrics.
https://kubernetes.io/docs/concepts/cluster-administration/kube-state-metrics/
Apache License 2.0

Metric kube_pod_status_reason returns hundreds of results, but all with a zero value #2116

Closed: jgagnon44 closed this issue 2 months ago

jgagnon44 commented 1 year ago

What happened: Running a PromQL query on the kube_pod_status_reason metric returns hundreds of result records, but every one has a value of zero. This occurs even with pods in failed or pending status.

What you expected to happen: I would expect any pod in failed or pending status to produce a result from this metric with a non-zero value.

How to reproduce it (as minimally and precisely as possible): Run a query in Prometheus on kube_pod_status_reason. No filtering is needed. I get hundreds of result entries, every one with a value of zero.

If you run the query slightly differently, as kube_pod_status_reason > 0, you will get an empty result, even with pods in failed or pending status.

Anything else we need to know?:

Environment:

Not sure how to find this.

I don't know if this is equivalent. I inspected the pod and found the following:

  containers:
    - name: kube-state-metrics
      image: k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.0.0
CatherineF-dev commented 1 year ago

Could you curl the kube-state-metrics endpoint to get the full metrics payload?

For example,

# change kube-system to correct namespace where kube-state-metrics exists
kubectl -n kube-system port-forward pod/kube-state-metrics-xxx 9443:9443

curl  http://localhost:9443/metrics | grep kube_pod_status_reason
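
If port 9443 is refused, the metrics port may simply be different; kube-state-metrics commonly serves metrics on 8080. A quick way to check (a sketch only, assuming that default port and reusing the placeholder pod name from above):

# list the container ports declared by the pod
kubectl -n kube-system get pod kube-state-metrics-xxx -o jsonpath='{.spec.containers[*].ports}'

# then forward whichever port is listed, e.g. the common default 8080
kubectl -n kube-system port-forward pod/kube-state-metrics-xxx 8080:8080
curl http://localhost:8080/metrics | grep kube_pod_status_reason
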
jgagnon44 commented 1 year ago

Just discovered that the kube-state-metrics pod is running in the lens-metrics namespace. Not sure what that means. Was expecting to see it in the namespace you mentioned.

In any case, I tried to do as you suggested, but ran into a problem. I ran the port-forward command and then entered the curl command in a separate terminal. The curl command returned an empty reply, and when I switched back to the other terminal, the port-forward had exited with an error.

$ k -n lens-metrics port-forward kube-state-metrics-775bd8fd9f-r5qgv 9443:9443
Forwarding from 127.0.0.1:9443 -> 9443
Forwarding from [::1]:9443 -> 9443
Handling connection for 9443
E0712 06:53:14.074707   38060 portforward.go:406] an error occurred forwarding 9443 -> 9443: error forwarding port 9443 to pod 45c511ca7caaa67f29593a81f9f974e8f5bcbc078d66216fd14ff9df01de5bf2, uid : port forward into network namespace "/var/run/netns/3ba1b4c7-1e76-4115-8d89-e98f8c3e3ec3": failed to connect to localhost:9443 inside namespace 45c511ca7caaa67f29593a81f9f974e8f5bcbc078d66216fd14ff9df01de5bf2: dial tcp 127.0.0.1:9443: connect: connection refused
E0712 06:53:14.076274   38060 portforward.go:234] lost connection to pod

I tried something else. In Lens, I accessed the pod and forwarded the port from there. That let me open it in a browser, where I was able to follow a metrics link. Clicking it presented a page with tons of information. I searched for the metric and found over 250 entries, each with a value of zero.

Here's a sample:

kube_pod_status_reason{namespace="prometheus",pod="prometheus-prometheus-kube-prometheus-prometheus-0",reason="NodeLost"} 0
kube_pod_status_reason{namespace="prometheus",pod="prometheus-prometheus-kube-prometheus-prometheus-0",reason="Evicted"} 0
kube_pod_status_reason{namespace="prometheus",pod="prometheus-prometheus-kube-prometheus-prometheus-0",reason="UnexpectedAdmissionError"} 0
kube_pod_status_reason{namespace="deploy",pod="clean-deploy-cronjob-28146870-w2nbx",reason="NodeLost"} 0
kube_pod_status_reason{namespace="deploy",pod="clean-deploy-cronjob-28146870-w2nbx",reason="Evicted"} 0
kube_pod_status_reason{namespace="deploy",pod="clean-deploy-cronjob-28146870-w2nbx",reason="UnexpectedAdmissionError"} 0
kube_pod_status_reason{namespace="kubeapps",pod="kubeapps-7db5f76cb5-qt2jq",reason="NodeLost"} 0
kube_pod_status_reason{namespace="kubeapps",pod="kubeapps-7db5f76cb5-qt2jq",reason="Evicted"} 0
kube_pod_status_reason{namespace="kubeapps",pod="kubeapps-7db5f76cb5-qt2jq",reason="UnexpectedAdmissionError"} 0
kube_pod_status_reason{namespace="deploy",pod="clean-deploy-cronjob-28148310-sk282",reason="NodeLost"} 0
kube_pod_status_reason{namespace="deploy",pod="clean-deploy-cronjob-28148310-sk282",reason="Evicted"} 0
kube_pod_status_reason{namespace="deploy",pod="clean-deploy-cronjob-28148310-sk282",reason="UnexpectedAdmissionError"} 0
CatherineF-dev commented 1 year ago

Could you list pods which are in failed or pending status? Also list the reasons.

It might be because the current implementation only includes these three reasons: {NodeLost, Evicted, UnexpectedAdmissionError}.

jgagnon44 commented 1 year ago

This is interesting. If I query this metric in Prometheus, I get 5 result entries per pod. If I port-forward and access the kube-state-metrics /metrics endpoint, I only get 3 result entries per pod. Here's an example:

From kube-state-metrics:

kube_pod_status_reason{namespace="ingress-nginx",pod="ingress-nginx-controller-f5rmk",reason="NodeLost"} 0
kube_pod_status_reason{namespace="ingress-nginx",pod="ingress-nginx-controller-f5rmk",reason="Evicted"} 0
kube_pod_status_reason{namespace="ingress-nginx",pod="ingress-nginx-controller-f5rmk",reason="UnexpectedAdmissionError"} 0

From Prometheus:

kube_pod_status_reason{container="kube-state-metrics", endpoint="http", instance="10.244.211.138:8080", job="kube-state-metrics", namespace="ingress-nginx", pod="ingress-nginx-controller-f5rmk", reason="Evicted", service="prometheus-kube-state-metrics", uid="29cd7cc2-86d4-4a9d-854e-1e122af3403b"} | 0
kube_pod_status_reason{container="kube-state-metrics", endpoint="http", instance="10.244.211.138:8080", job="kube-state-metrics", namespace="ingress-nginx", pod="ingress-nginx-controller-f5rmk", reason="NodeAffinity", service="prometheus-kube-state-metrics", uid="29cd7cc2-86d4-4a9d-854e-1e122af3403b"} | 0
kube_pod_status_reason{container="kube-state-metrics", endpoint="http", instance="10.244.211.138:8080", job="kube-state-metrics", namespace="ingress-nginx", pod="ingress-nginx-controller-f5rmk", reason="NodeLost", service="prometheus-kube-state-metrics", uid="29cd7cc2-86d4-4a9d-854e-1e122af3403b"} | 0
kube_pod_status_reason{container="kube-state-metrics", endpoint="http", instance="10.244.211.138:8080", job="kube-state-metrics", namespace="ingress-nginx", pod="ingress-nginx-controller-f5rmk", reason="Shutdown", service="prometheus-kube-state-metrics", uid="29cd7cc2-86d4-4a9d-854e-1e122af3403b"} | 0
kube_pod_status_reason{container="kube-state-metrics", endpoint="http", instance="10.244.211.138:8080", job="kube-state-metrics", namespace="ingress-nginx", pod="ingress-nginx-controller-f5rmk", reason="UnexpectedAdmissionError", service="prometheus-kube-state-metrics", uid="29cd7cc2-86d4-4a9d-854e-1e122af3403b"} | 0

Why would I get a different set of results?
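
One way to narrow this down (a sketch; the label names are taken from the samples above) is to ask Prometheus which kube-state-metrics targets it actually scrapes and how many distinct reasons each one exposes, since a separate, newer kube-state-metrics instance being scraped would explain the extra reasons:

# series count per scraped kube-state-metrics target
count by (job, service, instance) (kube_pod_status_reason)

# number of distinct reasons exposed per target
count by (instance) (count by (instance, reason) (kube_pod_status_reason))
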

CatherineF-dev commented 1 year ago

fyi: Pod metrics are implemented in this file https://github.com/kubernetes/kube-state-metrics/blob/main/internal/store/pod.go

You can have a look at this file to figure out what happened.

I am curious whether kube_pod_status_reason fails to report failed pods.

jgagnon44 commented 1 year ago

Here are the current failed or pending pods on the cluster:

Pod | Namespace | Status | Reason
runner-goew1uzh-project-8-concurrent-0zk79h | runner-workspace-writer | Failed | ContainerStatusUnknown - exit code: 137
runner-goew1uzh-project-8-concurrent-1fhtb4 | runner-workspace-writer | Failed | ContainerStatusUnknown - exit code: 137
clean-deploy-cronjob-28151190-fqqgs | deploy | Pending | Back-off pulling image "harbor.hulk.beast-code.com/phactory-images/cleanup-deploy:master"
clean-deploy-cronjob-28152630-jzrsj | deploy | Pending | Back-off pulling image "harbor.hulk.beast-code.com/phactory-images/cleanup-deploy:master"
CatherineF-dev commented 1 year ago

I found the reason. It's because a bounded list of reasons is used:

podStatusReasons           = []string{"Evicted", "NodeAffinity", "NodeLost", "Shutdown", "UnexpectedAdmissionError"}

https://github.com/kubernetes/kube-state-metrics/blob/main/internal/store/pod.go#L40
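
For reference, the exporter walks that bounded list for every pod, which is why each pod shows one zero-valued series per known reason. A simplified, standalone sketch of that behaviour (not the actual kube-state-metrics source; the pod data is illustrative, based on the pods discussed in this thread):

package main

import "fmt"

// Bounded list of reasons, mirroring podStatusReasons in internal/store/pod.go.
var podStatusReasons = []string{"Evicted", "NodeAffinity", "NodeLost", "Shutdown", "UnexpectedAdmissionError"}

type pod struct {
    namespace, name, statusReason string
}

func main() {
    pods := []pod{
        // ContainerStatusUnknown is not in the bounded list, so every series stays 0.
        {"runner-workspace-writer", "runner-goew1uzh-project-8-concurrent-0zk79h", "ContainerStatusUnknown"},
        // A pending ImagePullBackOff pod has no pod-level status reason at all.
        {"deploy", "clean-deploy-cronjob-28151190-fqqgs", ""},
    }
    // Emit one series per pod per known reason: 1 if it matches, 0 otherwise.
    for _, p := range pods {
        for _, r := range podStatusReasons {
            v := 0
            if p.statusReason == r {
                v = 1
            }
            fmt.Printf("kube_pod_status_reason{namespace=%q,pod=%q,reason=%q} %d\n", p.namespace, p.name, r, v)
        }
    }
}
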

jgagnon44 commented 1 year ago

Is it possible to reconfigure this to be unconstrained?

CatherineF-dev commented 1 year ago

Could you try updating https://github.com/kubernetes/kube-state-metrics/blob/main/internal/store/pod.go and rebuilding a new kube-state-metrics?

I'm wondering what the reason is for Back-off pulling image "harbor.hulk.beast-code.com/phactory-images/cleanup-deploy:master"

CatherineF-dev commented 1 year ago

Or could you run kubectl -n deploy get pod clean-deploy-cronjob-28152630-jzrsj -o yaml

jgagnon44 commented 1 year ago

I'm guessing this is unrelated. In any case:

  containerStatuses:
  - image: harbor.hulk.beast-code.com/phactory-images/cleanup-deploy:master
    imageID: ""
    lastState: {}
    name: clean-deploy
    ready: false
    restartCount: 0
    started: false
    state:
      waiting:
        message: Back-off pulling image "harbor.hulk.beast-code.com/phactory-images/cleanup-deploy:master"
        reason: ImagePullBackOff
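
As an aside, the image-pull failure itself is usually easiest to diagnose from the pod's events (a sketch; unrelated to the metric question):

kubectl -n deploy describe pod clean-deploy-cronjob-28152630-jzrsj
kubectl -n deploy get events --field-selector involvedObject.name=clean-deploy-cronjob-28152630-jzrsj
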
jgagnon44 commented 1 year ago

I'm not sure how to attempt what you suggest regarding updating the .go file and rebuilding. Also, I don't know Go and am not sure how to make it loop over arbitrary reasons where that variable is used:

https://github.com/kubernetes/kube-state-metrics/blob/bb6e9f42f8bac32ed6e50b6932cb2ab7fc9307ef/internal/store/pod.go#L1475
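
For reference, the modify-and-rebuild suggestion would look roughly like this (a sketch only; the Makefile targets are assumptions and may differ between releases):

git clone https://github.com/kubernetes/kube-state-metrics.git
cd kube-state-metrics
# edit internal/store/pod.go and append extra reasons to podStatusReasons
make build       # build the kube-state-metrics binary (target name assumed)
make container   # build a local container image, if the target exists
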

I guess a more important question is: why is it being constrained in the first place?

CatherineF-dev commented 1 year ago

I guess a more important question is: why is it being constrained in the first place?

I'm not sure of all the reasons. One of them is to constrain metric cardinality.

I think we can add ImagePullBackOff to the list on the master branch. Could you list the reason for runner-goew1uzh-project-8-concurrent-0zk79h as well?

kubectl -n runner-workspace-writer get pod runner-goew1uzh-project-8-concurrent-0zk79h -o yaml
jgagnon44 commented 1 year ago

Both containers in that pod show:

    state:
      terminated:
        exitCode: 137
        finishedAt: null
        message: The container could not be located when the pod was terminated
        reason: ContainerStatusUnknown
        startedAt: null

The same for the other failed pod as well.

jgagnon44 commented 1 year ago

I'm curious about the fact that the kube-state-metrics pod is deployed in the lens-metrics namespace instead of kube-system. This is the only instance of a pod with this name that I can see. Is kube-state-metrics normally deployed when a K8s cluster is set up? I wonder how this cluster got into this configuration.

CatherineF-dev commented 1 year ago

I'm curious about the fact that the kube-state-metrics pod is deployed in the lens-metrics namespace instead of kube-system.

It's fine that the kube-state-metrics pod is deployed in the lens-metrics namespace.

Is kube-state-metrics normally deployed when a K8s cluster is set up?

A k8s cluster can run without kube-state-metrics; kube-state-metrics is a monitoring add-on for the cluster.

For example:

kube-state-metrics isn't bundled with the k8s cluster itself (https://github.com/kubernetes/kubernetes); it needs to be installed after the cluster is created.

CatherineF-dev commented 1 year ago

Could you see these pods in metric kube_pod_status_phase?

CatherineF-dev commented 1 year ago

kube_pod_status_reason only covers limited reasons.

kube_pod_status_phase should cover all pods.

jgagnon44 commented 1 year ago

Could you see these pods in metric kube_pod_status_phase?

Yes. runner-goew1uzh-project-8-concurrent-0zk79h and runner-goew1uzh-project-8-concurrent-1fhtb4 show phase=Failed; clean-deploy-cronjob-28151190-fqqgs and clean-deploy-cronjob-28152630-jzrsj show phase=Pending.

jgagnon44 commented 1 year ago

kube_pod_status_phase only shows the five phases: Failed, Pending, Running, Succeeded, and Unknown, with a value of 1 for the phase each pod is currently in.

E.g.:

Query: kube_pod_status_phase{pod="runner-goew1uzh-project-8-concurrent-1fhtb4"}

Result:

Metric Value
kube_pod_status_phase{container="kube-state-metrics", endpoint="http", instance="10.244.211.138:8080", job="kube-state-metrics", namespace="runner-workspace-writer", phase="Failed", pod="runner-goew1uzh-project-8-concurrent-1fhtb4", service="prometheus-kube-state-metrics", uid="0dd41342-cdb8-4d33-b1ff-014cb1bab97b"} 1
kube_pod_status_phase{container="kube-state-metrics", endpoint="http", instance="10.244.211.138:8080", job="kube-state-metrics", namespace="runner-workspace-writer", phase="Pending", pod="runner-goew1uzh-project-8-concurrent-1fhtb4", service="prometheus-kube-state-metrics", uid="0dd41342-cdb8-4d33-b1ff-014cb1bab97b"} 0
kube_pod_status_phase{container="kube-state-metrics", endpoint="http", instance="10.244.211.138:8080", job="kube-state-metrics", namespace="runner-workspace-writer", phase="Running", pod="runner-goew1uzh-project-8-concurrent-1fhtb4", service="prometheus-kube-state-metrics", uid="0dd41342-cdb8-4d33-b1ff-014cb1bab97b"} 0
kube_pod_status_phase{container="kube-state-metrics", endpoint="http", instance="10.244.211.138:8080", job="kube-state-metrics", namespace="runner-workspace-writer", phase="Succeeded", pod="runner-goew1uzh-project-8-concurrent-1fhtb4", service="prometheus-kube-state-metrics", uid="0dd41342-cdb8-4d33-b1ff-014cb1bab97b"} 0
kube_pod_status_phase{container="kube-state-metrics", endpoint="http", instance="10.244.211.138:8080", job="kube-state-metrics", namespace="runner-workspace-writer", phase="Unknown", pod="runner-goew1uzh-project-8-concurrent-1fhtb4", service="prometheus-kube-state-metrics", uid="0dd41342-cdb8-4d33-b1ff-014cb1bab97b"} 0
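
For completeness, a minimal query that lists only the pods currently in a given phase, filtering out the zero-valued series shown above (an illustrative sketch):

kube_pod_status_phase{phase=~"Failed|Pending"} == 1
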
jgagnon44 commented 1 year ago

kube_pod_status_reason only covers limited reasons.

Seems to be very limited.

jgagnon44 commented 1 year ago

I have found that there is an instance of Prometheus running in the lens-metrics namespace (along with kube-state-metrics and several node-exporter pods). Separately, I have the kube-prometheus-stack (which includes Prometheus, Grafana, and other apps) deployed in the prometheus namespace; this is the instance I am using for my work. I wonder what implications (if any) there might be with two Prometheus instances running. My Prometheus deployment has its own set of kube-state-metrics and node-exporter pods. Perhaps the pods in the lens-metrics namespace are only being used by Lens, while everything in the prometheus namespace is what I'm seeing?

CatherineF-dev commented 1 year ago

Hi @dgrisonnet, do you remember the reasons why we expose metrics with value = 0 in https://github.com/kubernetes/kube-state-metrics/issues/2116#issuecomment-1632678128

logicalhan commented 1 year ago

/triage accepted
/assign @CatherineF-dev

logicalhan commented 1 year ago

I would make this metric opt-in only, and use kube_pod_aggregated_status_reason for lower cardinality.

jgagnon44 commented 1 year ago

I would make this metric opt-in only, and use kube_pod_aggregated_status_reason for lower cardinality.

I get an empty result when I attempt to query this. There appears to be no metric by that name.

dgrisonnet commented 1 year ago

Hi @dgrisonnet, do you remember the reasons why we expose metrics with value = 0 in https://github.com/kubernetes/kube-state-metrics/issues/2116#issuecomment-1632678128

From the code, a status with value 0 means that the pod isn't in this state whilst 1 means it is: https://github.com/kubernetes/kube-state-metrics/blob/main/internal/store/pod.go#L1475-L1485.

This allows two types of queries:

  1. Give me all the pods with that particular status
  2. Give me all the pods that don't have this status
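
For instance, the two query shapes could look like this (the reason value is chosen arbitrarily for illustration):

# 1. all pods that currently have this status reason
kube_pod_status_reason{reason="Evicted"} == 1

# 2. all pods that do not have this status reason
kube_pod_status_reason{reason="Evicted"} == 0
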
galrose commented 1 year ago

kube_pod_status_reason only covers limited reasons.

kube_pod_status_phase should cover all pods.

Hey, the reasons do indeed seem very limited. Is there a reason the list is so small?

dgrisonnet commented 1 year ago

It should include the most common reasons; if not, we should add some more.

The reason why we keep a finite list of reasons is to avoid having an unbounded label in the metrics which could cause cardinality explosion issues in your monitoring backend.
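
A rough way to see that cost in practice (every pod already contributes one series per bounded reason, so an unbounded reason label would multiply the series count further):

count(kube_pod_status_reason)   # total series for this metric
count(kube_pod_status_phase)    # compare with the five-phase metric
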

galrose commented 1 year ago

Is it possible to add more reasons such as ContainerStatusUnknown, CreateContainerConfigError, and Error?

dgrisonnet commented 1 year ago

I believe these are container statuses, so they are not a good fit for kube_pod_status_reason.

That said I would be fine with creating a new metric called kube_pod_container_status and adding the reasons there.

jhoos commented 1 year ago

Could there maybe be an "Other" reason for anything that doesn't match one of the known reasons, so that there is always at least one kube_pod_status_reason series per pod that returns 1 at any given moment (at least if the pod's status is Failed/Pending)? This would prevent PromQL queries that do group_right against kube_pod_status_reason from randomly dropping out when alerting on them.

dgrisonnet commented 1 year ago

Assuming you have a PromQL query along these lines:

kube_pod_status_phase{phase=~"Failed|Pending"} group_left on(reason) (kube_pod_status_reason > 0)

You can keep the results for which kube_pod_status_reason > 0 is not true by performing an outer join:

kube_pod_status_phase{phase=~"Failed|Pending"} group_left on(reason) (kube_pod_status_reason > 0) or kube_pod_status_phase{phase=~"Failed|Pending"} 

Let me know if this solves your issue

Note that it should work the same way with group_right, but I am too used to doing it from the left.
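
A fully spelled-out variant of that outer join might look like the following; the on(namespace, pod) matching and the group_left(reason) modifier are assumptions about the intended join, not something stated above:

(
  (kube_pod_status_phase{phase=~"Failed|Pending"} == 1)
  * on(namespace, pod) group_left(reason)
  (kube_pod_status_reason > 0)
)
or on(namespace, pod)
(kube_pod_status_phase{phase=~"Failed|Pending"} == 1)
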

k8s-triage-robot commented 2 months ago

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

- Confirm that this issue is still relevant with /triage accepted (org members only)
- Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

dashpole commented 2 months ago

Closing, as we figured out why it was happening.