kubernetes / kube-state-metrics

Add-on agent to generate and expose cluster-level metrics.
https://kubernetes.io/docs/concepts/cluster-administration/kube-state-metrics/
Apache License 2.0

If a container doesn't have cpu limits, kube_pod_resource_limit reports the init-container limit #2295

Closed andresm53 closed 7 months ago

andresm53 commented 9 months ago

What happened: Given the following pod:

apiVersion: v1
kind: Pod
metadata:
  name: example
  labels:
    app: nginx
spec:
  containers:
    - name: nginx
      image: nginx:latest
      ports:
        - containerPort: 8080
  initContainers:
  - name: init-myservice
    image: busybox:1.28
    resources:
      limits:
        cpu: 500m
        memory: 64Mi
      requests:
        cpu: 50m
        memory: 64Mi

kube_pod_resource_limit reports 500m as the pod's CPU limit.

What you expected to happen: As per Init Containers documentation:

The Pod's effective request/limit for a resource is the higher of:

  • the sum of all app containers request/limit for a resource
  • the effective init request/limit for a resource

Since the app container has no CPU limit set, which means "no limit", I would have expected kube_pod_resource_limit to report no CPU limit for the pod.
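To make the expected behavior concrete, here is a minimal sketch of the rule quoted above, under the interpretation argued in this report: a container without a limit is unbounded, so the pod as a whole has no limit. The function name `effective_cpu_limit` and the use of `None` for "no limit" are illustrative assumptions; this is not the kube-state-metrics or kube-scheduler implementation.

```python
from typing import Optional, Sequence


def effective_cpu_limit(
    app_limits: Sequence[Optional[int]],
    init_limits: Sequence[Optional[int]],
) -> Optional[int]:
    """Return the pod's effective CPU limit in millicores, or None for "no limit".

    Hypothetical helper: None in either list means that container has no
    CPU limit, which is treated here as unbounded.
    """
    # An unbounded container makes the whole pod unbounded.
    if any(limit is None for limit in app_limits):
        return None
    if any(limit is None for limit in init_limits):
        return None

    # Sum of all app container limits.
    app_total = sum(app_limits)
    # Effective init limit: the highest limit of any single init container.
    init_effective = max(init_limits, default=0)

    # The pod's effective limit is the higher of the two.
    return max(app_total, init_effective)


# The example pod: one app container without a CPU limit,
# one init container limited to 500m.
example = effective_cpu_limit(app_limits=[None], init_limits=[500])
```

Under this interpretation the example pod yields `None` rather than 500m, which is the result I expected the metric to reflect.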

How to reproduce it (as minimally and precisely as possible):

  1. Create a pod using the example pasted above.
  2. Query kube_pod_resource_limit cpu resource: sum(kube_pod_resource_limit{resource='cpu',pod='example',namespace='test'})
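For anyone who prefers to run step 2 outside the console, the sketch below builds the request URL for the Prometheus HTTP API instant-query endpoint (`/api/v1/query`). The helper name `build_query_url` and the base URL are placeholders I made up for illustration; substitute the address of your own Prometheus (or Thanos querier) instance and add whatever authentication your cluster requires.

```python
from urllib.parse import urlencode


def build_query_url(base_url: str, pod: str, namespace: str, resource: str = "cpu") -> str:
    """Build an instant-query URL for the PromQL expression used in step 2.

    Hypothetical helper: base_url is whatever address serves the
    Prometheus HTTP API in your environment.
    """
    promql = (
        "sum(kube_pod_resource_limit{"
        f"resource='{resource}',pod='{pod}',namespace='{namespace}'"
        "})"
    )
    # urlencode percent-encodes the braces, quotes and commas in the query.
    params = urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


# Placeholder host, not a real endpoint.
url = build_query_url("http://prometheus.example:9090", pod="example", namespace="test")
```

The resulting URL can be fetched with any HTTP client; the JSON response carries the value that the console chart plots.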

Anything else we need to know?:

Environment: Openshift 4.12.

dashpole commented 8 months ago

/assign @rexagod
/triage accepted

rexagod commented 8 months ago

FYI: while this seems like a bug, it is recommended to use kube-scheduler's exposed metrics for kube_pod_resource_{limit,request}.

andresm53 commented 8 months ago

Thanks @rexagod. The problem with that (kube-scheduler's exposed metrics) is that in my particular case I am using Openshift (4.12), which by default uses kube_pod_resource_limit to display the CPU metrics chart. This is what it looks like for the example pod I provided before. As you can see, the chart is confusing: the pod doesn't effectively have any CPU limit, but the chart implies that it does.

[Screenshot: Openshift console CPU metrics chart for the example pod]

rexagod commented 7 months ago

@andresm53 Oh wow! Thank you for bringing this up, I'll ping the console folks internally to take a look. That being said, I believe https://github.com/openshift/console would be a better place to raise this.

Closing, feel free to reopen in openshift/console.