google / cadvisor

Analyzes resource usage and performance characteristics of running containers.

container_cpu_usage_seconds_total should include the pod's uid #3270

Open ntl-ibm opened 1 year ago

ntl-ibm commented 1 year ago

What would you like to be added: Metrics such as container_cpu_usage_seconds_total should include the UID of the pod.

Why is this needed: My pod is part of a stateful set, so it keeps the same name when the pod is deleted and recreated. I would like to calculate CPU usage over a specific time interval.

Consider: sum(increase(container_cpu_usage_seconds_total[])) by (namespace, pod)

Assume that a pod was terminated and recreated during the window. I want two time series returned:
1) CPU usage for the pod that ended, from the beginning of the time window until the pod ended.
2) CPU usage for the new pod, from the time it started until the end of the time window.

But because the aggregation is only on namespace and pod, I get a single time series with the combined total. I need to include the pod's UID in the aggregation (it is included with the pod metrics), but it is not available as a label on this metric.
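To make the problem concrete, here is a small Python sketch of the aggregation step only. The namespace, pod name, UID values, and increase figures are all made up for illustration, and `sum_by` is a hypothetical helper that mimics a `sum ... by (...)` grouping. It does not query Prometheus.

```python
from collections import defaultdict

# Hypothetical per-series CPU increases (in seconds) over one window.
# Both rows belong to the same StatefulSet pod name, but to two different
# pod incarnations. All values here are invented for illustration.
series = [
    {"namespace": "db", "pod": "store-0", "uid": "uid-old", "increase": 120.0},
    {"namespace": "db", "pod": "store-0", "uid": "uid-new", "increase": 45.0},
]


def sum_by(rows, labels):
    """Sum the 'increase' field of rows that share the same values for labels."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[label] for label in labels)
        totals[key] += row["increase"]
    return dict(totals)


# Grouping on (namespace, pod) merges both incarnations into one result.
by_name = sum_by(series, ("namespace", "pod"))

# Grouping on (namespace, pod, uid) keeps one result per incarnation.
by_uid = sum_by(series, ("namespace", "pod", "uid"))

print(by_name)
print(by_uid)
```

With a `uid` label on the metric, the second grouping is what the query could express directly.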

Describe the solution you'd like: The "id" is almost what I need, and I think it has the pod id embedded in the path value. For example: id="/system.slice/containerd.service/kubepods-burstable-pod52d49e6c_9e54_4602_842c_39b18a3b5a29.slice:cri-containerd:0c509dd847a611eb3e86fd58354a883fbfd223361ce38e2d7374aac751a69119"

52d49e6c_9e54_4602_842c_39b18a3b5a29 looks like what I need, and I can extract it after the Prometheus query. But that is inconvenient and probably does not perform as well as a dedicated label would.
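For reference, the client-side extraction described above could look roughly like the following Python sketch. The function name `pod_uid_from_cgroup_id` is hypothetical. The pattern assumes the UID appears right after the literal `pod` in the path, with either underscores (as in the example above) or hyphens as separators. Other cgroup layouts may not match.

```python
import re
from typing import Optional

# Matches "pod" followed by a UUID-shaped token whose groups are separated
# by underscores or hyphens. This layout is an assumption based on the
# example id in this issue, not a documented cAdvisor contract.
_POD_UID_RE = re.compile(
    r"pod([0-9a-f]{8}(?:[_-][0-9a-f]{4}){3}[_-][0-9a-f]{12})"
)


def pod_uid_from_cgroup_id(cgroup_id: str) -> Optional[str]:
    """Return the pod UID embedded in a cAdvisor "id" label value, or None."""
    match = _POD_UID_RE.search(cgroup_id)
    if match is None:
        return None
    # Normalize to the hyphenated form used for Kubernetes UIDs.
    return match.group(1).replace("_", "-")


example_id = (
    "/system.slice/containerd.service/"
    "kubepods-burstable-pod52d49e6c_9e54_4602_842c_39b18a3b5a29.slice"
    ":cri-containerd:"
    "0c509dd847a611eb3e86fd58354a883fbfd223361ce38e2d7374aac751a69119"
)
print(pod_uid_from_cgroup_id(example_id))
# prints 52d49e6c-9e54-4602-842c-39b18a3b5a29
```

This has to run on every returned series after the query completes, which is the inconvenience noted above.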

The pod uid field should be added.

Additional context: It's possible I'm missing something here, but that's what I found after looking at this for a long time.

juliamatsak commented 1 month ago

Any updates here?