jmcgrath207 / k8s-ephemeral-storage-metrics

Prometheus ephemeral storage metrics exporter
https://jmcgrath207.github.io/k8s-ephemeral-storage-metrics/
MIT License
85 stars 35 forks source link

added source label to ephemeral_storage_container_limit_percentage #97

Closed miminar closed 2 months ago

miminar commented 2 months ago

in order to differentiate between "hard" and "soft" limits

"hard" comes pod.spec.containers.resources.limits["ephemeral-storage"] (if set) and terminates the pod once reached

"soft" comes from the node and may not necessarily cause an eviction of the pod in question

Motivation

To implement an alert rule à la KubePersistentVolumeFillingUp for a container like this:

        - alert: ContainerEphemeralStorageUsageAtLimit
          annotations:
            description: >-
              {{ `Ephemeral storage usage of pod/container {{ $labels.pod_name
              }}/{{ $labels.exported_container }} in Namespace {{
              $labels.pod_namespace }} on Node {{ $labels.node_name }} {{ with
              $labels.cluster -}} on Cluster {{ . }} {{- end }} is at {{ $value
              }}% of the limit.` }}
            summary: Container ephemeral storage usage is at the limit.
          expr: |-2
            ( ( max by (node_name, pod_namespace, pod_name, exported_container)
                       (avg_over_time(ephemeral_storage_container_limit_percentage[5m]))
              > 85.0)
            # ignore soft-limits (containers without an ephemeral-storage limit set
            # will report ephemeral_storage_node_percentage as their limit)
            and on (pod_namespace, pod_name, exported_container)
                   ( max_over_time(
                     ( abs(ephemeral_storage_container_limit_percentage
                     - on (node_name) group_left () ephemeral_storage_node_percentage))[5m:30s])
                   # there can be a slight deviation between the metric
                   # reported for a container and the node
                   > 0.06)
            )
            # ignore pods that haven't been running for some time (e.g. completed jobs)
            unless on (pod_namespace, pod_name)
                      ( (label_replace(label_replace(
                           sum by (namespace, pod)
                                  (max_over_time(kube_pod_status_phase{phase="Running"}[1m])),
                           "pod_namespace", "$1", "namespace", "(.*)"),
                           "pod_name", "$1", "pod", "(.*)"))
                      == 0)
          for: 1m
          labels:
            severity: warning

Notice the middle part of the expression (and on (pod_namespace, pod_name, exported_container) ( max_over_time( ... > 0.06) which makes the whole evaluation fragile, complicated and slow.

It is needed to filter out all containers with pod.spec.containers.requests.limits["ephemeral-storage"] unset.

Without it, there would be too many instances of the alert generated for a single node running out of ephemeral storage (one for each container running on the node). For that case, there should be another alert like NodeOutOfEphemeralStorage

With this change in place, one could simplify the expression to:

            ( max by (node_name, pod_namespace, pod_name, exported_container)
                       (avg_over_time(ephemeral_storage_container_limit_percentage{source="container"}[5m]))
            > 85.0)
            # ignore pods that haven't been running for some time (e.g. completed jobs)
            unless on (pod_namespace, pod_name)
                      ( (label_replace(label_replace(
                           sum by (namespace, pod)
                                  (max_over_time(kube_pod_status_phase{phase="Running"}[1m])),
                           "pod_namespace", "$1", "namespace", "(.*)"),
                           "pod_name", "$1", "pod", "(.*)"))
                      == 0)

I don't have a strong opinion on the new label name/ values. Instead of adding a new label, there could be a whole new metric instead (e.g. ephemeral_storage_container_hard_limit_percentage).

Please let me know what you think.

jmcgrath207 commented 2 months ago

Thanks @miminar. I agree. After reflecting on this, I think it would be best to go with this PR since I have already set the expectations for this metric. For ephemeral_storage_container_{hard,soft}_limit_percentage I like this for V2 but I don't want to break default/expect behavior before then.

I do have slight reservations around high cardinality risk, but we do have relabeling the user can use and at worst, I can add an option flag to disable the source label by default.

Great job! If you want to contribute your PrometheusRule after this I am open to PRs.

Thanks!

jmcgrath207 commented 2 months ago

@miminar This is now release in 1.11.0

miminar commented 2 months ago

Thank you! It helps a lot.