Open bbdouglas opened 1 year ago
/triage accepted /assign @dgrisonnet
The container level metric should already be available: https://github.com/kubernetes/kube-state-metrics/blob/02417fbc99f3adec84834fc59d5f89cf676ce006/internal/store/pod.go#L1342
Hi @dgrisonnet, thanks for looking into this.
Unfortunately, I believe the metric you pointed to is actually at the pod level, representing the time at which all containers became ready (ContainersReady). From the comments in the API:
// ContainersReady indicates whether all containers in the pod are ready.
Correct, the name got me.
We should probably base kube_pod_status_container_ready_time on ContainerStatus rather than on the pod status.
It is possible to use the existing boolean kube_pod_container_status_ready to calculate this by looking at a series of data points and choosing the first point in time when that flag flips from false to true
@bbdouglas I am curious how you currently calculate this with promQL?
@abhiraut Here is the query I came up with. Since it's looking back, you have to manually set the maximum age that you expect a pod to be up. Here I have assumed no pod lives for more than 1 day.
min_over_time(timestamp(kube_pod_container_status_ready{container="mycontainer", pod_phase="Running"} == 1)[1d:])
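As a rough illustration of what that query computes, here is a minimal Go sketch of the same idea: take the earliest scrape timestamp at which the ready flag was 1. The `Sample` type and `FirstReadyTime` function are hypothetical names made up for this example, not part of kube-state-metrics or Prometheus, and the result is only as precise as the scrape interval.

```go
package main

// Sample is a hypothetical scraped data point of kube_pod_container_status_ready.
type Sample struct {
	Timestamp float64 // Unix seconds at which the sample was scraped
	Ready     bool    // true when the gauge value was 1
}

// FirstReadyTime returns the smallest timestamp among samples whose Ready
// flag is true. ok is false when the container was never observed ready
// within the given window.
func FirstReadyTime(samples []Sample) (ts float64, ok bool) {
	for _, s := range samples {
		if s.Ready && (!ok || s.Timestamp < ts) {
			ts, ok = s.Timestamp, true
		}
	}
	return ts, ok
}
```

Every sample in the lookback window has to be scanned per container, which is the cost that a directly emitted ready time would avoid.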
Thanks! @dgrisonnet, do you think we can emit the ready time directly? I think it would be helpful and consistent with how readiness is emitted at the pod level.
What would you like to be added:
It would be great to have a metric for the container ready time in seconds to be emitted directly. There is currently a boolean gauge kube_pod_container_status_ready, which emits whether the container is ready or not, but that requires some computation to get at the time when the container flipped to the ready state. I'm interested in learning the amount of time it took between when the container started and when it was ready, and that would be simpler and more efficient to measure if kube-state-metrics emitted the ready time directly.
There was a similar metric added at the pod level (#1465), but this would be at the container level. In the pods that I am tracking, there are many containers with wildly varying ready times, so it is helpful for debugging and optimization purposes to know how long each container takes to get ready.
Why is this needed:
Similar to the pod-level ready time metric (#1465), I'd like to measure the ready time of each individual container within my pod. This is helpful for tracking startup-times at a finer level of granularity than the whole pod, especially when a pod has many containers.
It is possible to use the existing boolean kube_pod_container_status_ready to calculate this by looking at a series of data points and choosing the first point in time when that flag flips from false to true, but in practice that can be very resource intensive for Prometheus to calculate if there are a large number of pods/containers.
Describe the solution you'd like
I would ideally like to see a new metric analogous to kube_pod_status_ready_time emitted at the container granularity.
Additional context
I'm not that familiar with the internals of the Kubernetes API, but unfortunately it does not look like ContainerStatus has the same breadth of information as PodCondition, which includes a LastTransitionTime. So this might not be a simple addition.