jmcgrath207 / k8s-ephemeral-storage-metrics

Prometheus ephemeral storage metrics exporter
https://jmcgrath207.github.io/k8s-ephemeral-storage-metrics/
MIT License

Multiple Container bad calculation? #103

Closed Franck31 closed 1 month ago

Franck31 commented 1 month ago

Hi,

I believe I've found an issue on our large cluster, which uses a service mesh. In our setup, each pod contains four or more containers with the following ephemeral storage configurations:

Envoy:
      limits:
        ephemeral-storage: 120Mi
      requests:
        ephemeral-storage: 100Mi
OpenTelemetry:
      limits:
        ephemeral-storage: 50Mi
      requests:
        ephemeral-storage: 50Mi
InjectorContainer:
      limits:
        ephemeral-storage: 300Mi
      requests:
        ephemeral-storage: 300Mi
ApplicationContainer:
      limits:
        ephemeral-storage: 5Mi
      requests:
        ephemeral-storage: 5Mi

Example:

If the ApplicationContainer is using 1MB of its 5MB limit, the metric should ideally show:

ephemeral_storage_container_limit_percentage{instance="10.1.1.1:9100", job="k8s-ephemeral-storage-metrics", node_name="ip-10-1-1-1.ec2.internal", pod_name="panchito-b5c46aed61caf53d3987832b-66b5b5b46d-29ntl", pod_namespace="sandbox-panchito",container="ApplicationContainer"} = 20

However, because the exporter is using the 300MB limit from InjectorContainer and InjectorContainer is using 150MB, the metric inaccurately shows:

ephemeral_storage_container_limit_percentage{instance="10.1.1.1:9100", job="k8s-ephemeral-storage-metrics", node_name="ip-10-1-1-1.ec2.internal", pod_name="panchito-b5c46aed61caf53d3987832b-66b5b5b46d-29ntl", pod_namespace="sandbox-panchito", container="ApplicationContainer"} = 3000
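To make the arithmetic explicit, here is a rough sketch (illustrative Go only, not the exporter's actual code; reading the reported 3000 as pod-level usage divided by ApplicationContainer's limit is my interpretation of what's happening):

    // Illustrative only: how a per-container percentage should be computed,
    // versus a division that reproduces the observed value.
    package main

    import "fmt"

    // limitPercentage is a hypothetical helper, not part of the exporter.
    func limitPercentage(usedBytes, limitBytes float64) float64 {
        return usedBytes / limitBytes * 100
    }

    func main() {
        const mi = 1024 * 1024

        // Expected: ApplicationContainer's own usage against its own 5Mi limit.
        fmt.Println(limitPercentage(1*mi, 5*mi)) // 20

        // Observed: roughly 150Mi of usage (mostly InjectorContainer's) divided
        // by ApplicationContainer's 5Mi limit gives the reported 3000.
        fmt.Println(limitPercentage(150*mi, 5*mi)) // 3000
    }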

This makes it impossible for us to use this metric as an alert to determine if a container is at risk of eviction.

Can this issue be addressed to correctly reflect the storage usage per container?

jmcgrath207 commented 1 month ago

Thanks for raising this issue @Franck31 .

Could you tell me which versions of the Helm chart and Kubernetes you are on?

Also, have you seen this issue with all containers or only in this pod?

Thanks

Franck31 commented 1 month ago

All containers; I think it's easy to reproduce.

I'm using Helm chart version 1.10.1 and Kubernetes v1.26.15-eks-ae9a62a.

Cheers!

jmcgrath207 commented 1 month ago

@Franck31 Can you give helm chart version 1.11.2-rc01 a try?

I believe I fixed the problem, but I would like your feedback since you have a bigger environment to test on.

Thanks!

jmcgrath207 commented 1 month ago

Hey @Franck31, did you have a chance to test out 1.11.2-rc01?

Franck31 commented 1 month ago

Hi @jmcgrath207,

I just tried your fix, but the problem persists. I think it has to do with the fact that the k8s summary API only returns the total pod usage, not usage broken down by container (/api/v1/nodes/<node-name>/proxy/stats/summary).

So, I believe it’s not possible to determine the amount of space used by each container.
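
If it helps to double-check what that endpoint actually exposes, here is a minimal client-go sketch that dumps the summary through the API server proxy (the in-cluster config and the hard-coded node name are placeholders for illustration, not how the exporter is wired up):

    // Sketch: dump the kubelet summary for one node via the API server proxy.
    package main

    import (
        "context"
        "fmt"

        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/rest"
    )

    func main() {
        cfg, err := rest.InClusterConfig()
        if err != nil {
            panic(err)
        }
        client, err := kubernetes.NewForConfig(cfg)
        if err != nil {
            panic(err)
        }

        // Placeholder node name taken from the example above.
        nodeName := "ip-10-1-1-1.ec2.internal"

        // Same endpoint as /api/v1/nodes/<node-name>/proxy/stats/summary.
        raw, err := client.CoreV1().RESTClient().Get().
            AbsPath("/api/v1/nodes/" + nodeName + "/proxy/stats/summary").
            DoRaw(context.TODO())
        if err != nil {
            panic(err)
        }

        // Inspect the JSON to see which usage fields are reported at the pod
        // level versus per container.
        fmt.Println(string(raw))
    }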

By the way, I'm also facing another issue. My k8s cluster has around 400 nodes, and the k8s API is being throttled. I have the polling interval set to 30 seconds. When throttling occurs, what happens to the value of each container? Does the exporter show the old value or does it disappear?

Cheers!

jmcgrath207 commented 1 month ago

Hey @Franck31

I finally got to do a deeper dive on this. Based on these results I feel the volume mount metric is working, but let me know if you see any issues in my testing.

https://github.com/jmcgrath207/k8s-ephemeral-storage-metrics/issues/106#issuecomment-2264830432

> and the k8s API is being throttled.

Unfortunately I do not have access to a cluster this big, but you may find kubelet.scrape=True useful.

Values: https://github.com/jmcgrath207/k8s-ephemeral-storage-metrics/blob/master/chart/values.yaml#L22

Code: https://github.com/jmcgrath207/k8s-ephemeral-storage-metrics/pull/90

Just a heads up: I haven't tested this yet, since I was only using minikube Docker for e2e; minikube VirtualBox was merged in this release. I do plan to add coverage for this in the near future.

Or just set your polling interval to something higher, like 60 seconds. cAdvisor does have a delay in its reporting.