Open jzeng4 opened 2 years ago
We enabled the soft quota for cpu. Not sure if this matters?
I found the reason why the above metrics are missing:
If "cpu.cfs_quota_us" is "-1" (i.e. only the CPU soft quota is in effect and no hard quota is set), the cAdvisor source code doesn't read the value, so "spec.Cpu.Quota" is zero in this case.
For the other missing metrics (container_cpu_cfs_periods_total, container_cpu_cfs_throttled_periods_total, container_cpu_cfs_throttled_seconds_total): since "spec.Cpu.Quota" is zero, those metrics are skipped, e.g. https://github.com/google/cadvisor/blob/master/metrics/prometheus.go#L193
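If you want to list the containers that fall into this bucket, a query along these lines should work (my own sketch, not from the thread; it assumes the usual cAdvisor labels and relies on container_spec_cpu_shares being exported for every container, since Kubernetes always sets CPU shares):

# containers that expose CPU spec metrics but have no CFS quota (i.e. no CPU limit)
container_spec_cpu_shares{container!=""} unless container_spec_cpu_quota{container!=""}

These are exactly the containers for which the CFS throttling counters will be absent.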
We are hitting the same problem. For reference, I've logged an issue with Azure since AKS is our cluster stack (https://github.com/Azure/AKS/issues/3216), but the issue probably falls under this domain.
So, how did you resolve this?
Any update on this? Thanks
@davidg-datascene I use kube_pod_container_resource_requests{resource="cpu"} to replace container_spec_cpu_quota.
@lusson-luo - Hi, thanks, but we still need the throttling metrics, i.e. container_cpu_cfs_throttled_periods_total and container_cpu_cfs_throttled_seconds_total.
Sorry, I don't know about those two metrics.
Correction to my earlier comment: use kube_pod_container_resource_limits{resource="cpu"}, not kube_pod_container_resource_requests, to replace container_spec_cpu_quota.
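As a side note (my own sketch, not from the thread): container_spec_cpu_quota reports cpu.cfs_quota_us, i.e. microseconds of CPU per scheduling period, so with the default 100ms CFS period the kube-state-metrics limit can be scaled to roughly the same value:

# approximation of container_spec_cpu_quota, assuming the default 100ms (100000us) CFS period
kube_pod_container_resource_limits{resource="cpu"} * 100000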
For us the metric becomes empty (or not measured) when the container does not have a CPU limit. Does that make sense?
I thought this at first, but the deployment spec does contain CPU limits; I also tested changes to the memory (Mi) values, but it made no difference.
spec of deployment:
resources:
  limits:
    cpu: "3"
    memory: 3Gi
  requests:
    cpu: "1"
    memory: 3Gi
The cluster is getting upgraded to AKS 1.24 in the next few weeks, and all the VM worker nodes will need replacing with a newer image/version. I'll check again once the nodes are replaced to see if the issue persists.
I've found where the issue lies, in our case at least. In the terraform that defines the Azure AKS node pools we set a kubelet_config. For every node pool where this has been set, we are missing the CFS quota metrics. If I log into one of the nodes and check the kubelet config, I can see the value is false:
terraform config for the kubelet:
kubelet_config:
  container_log_max_size_mb: 100
cat ./etc/default/kubeletconfig.json | grep CFS
"cpuCFSQuota": false,
cat ./etc/default/kubeletconfig.json | grep -i log
"containerLogMaxSize": "100M", <-- we are only changing this value but cpuCFSQuota is false, hence no CFS stats
Where we are not setting a kubelet_config, there is no kubeletconfig.json file, so the node accepts the defaults as per the Azure configuration (https://learn.microsoft.com/en-us/azure/aks/custom-node-configuration), meaning the CFS quota is true and you get the stats. The fix, not yet tested, is that we should be able to set cpu_cfs_quota_enabled in the terraform code (https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/kubernetes_cluster_node_pool) to enable the CFS stats.
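Once a node pool has been rebuilt with the CFS quota enabled, one way to confirm the counters are back (my own suggestion, assuming the usual cAdvisor scrape labels) is to count the CFS period series per scrape target:

# should return a non-zero count per kubelet instance once cpuCFSQuota is true and containers have CPU limits
count by (instance) (container_cpu_cfs_periods_total)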
Went deep on this, and I was banging my head over why container_cpu_cfs_throttled_seconds_total is the only one missing while the others are there. Turns out the Helm chart for kube-prometheus-stack drops that metric by default. Just something to check for folks arriving from Google.
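A quick way to tell whether the metric is being dropped by the monitoring pipeline rather than never being produced (my own diagnostic, not from the thread): if the sibling throttling counters return series but the seconds counter returns none, a metric relabel/drop rule on the cAdvisor scrape job is the likely cause.

# present if cAdvisor is producing throttling data at all
count(container_cpu_cfs_throttled_periods_total)
# if this is empty while the above is non-empty, the metric is being dropped at scrape time
count(container_cpu_cfs_throttled_seconds_total)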
@ervinb same for me! Lesson learned: Always check on the most fundamental level, which in this case was "are the limits actually set?"
Good idea to leave that information here.
If the cAdvisor metrics for limits are still not reliable enough, there are alternative ways to get the limits as a Prometheus metric, e.g. kube_pod_container_resource_limits{resource="cpu"} from kube-state-metrics, as mentioned above.
If the limits are not set on the pod, then cpu.cfs_quota_us is set to -1. This is because there isn't anything to read related to quota or throttling if the quota itself is disabled. That is why cAdvisor first checks whether cpu.cfs_quota_us is -1 and, if so, doesn't set spec.Cpu.Quota. As a result of not setting spec.Cpu.Quota, the metrics in question here (such as container_cpu_cfs_periods_total) aren't reported by cAdvisor, because when the limits aren't set, anything related to quota or throttling has no meaning.
IMHO, cAdvisor is behaving as expected and this may not be a bug.
/cc @bobbypage WDYT?
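For containers that do have a CPU limit (and therefore a CFS quota), these counters are typically used to compute a throttling ratio; a common query (my own example, not from the thread) looks like this:

# fraction of CFS periods in which the container was throttled, per container
rate(container_cpu_cfs_throttled_periods_total[5m]) / rate(container_cpu_cfs_periods_total[5m])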
This seems odd to me. Maybe I am missing something. Imagine a pod (Pod A) with a CPU request but no CPU limit, running on a node where the other pods consume all of the remaining CPU.
What will happen here is that Pod A will get throttled massively, and I would expect container_cpu_cfs_throttled_seconds_total to go up as well. This would happen due to CPU starvation at the node level, not due to limits. I guess it would only happen if I set a limit of 1 CPU on Pod A.
Am I missing something? If that case is not covered by cAdvisor, how could I monitor burstable (or best-effort) pods getting throttled by the scheduler?
Setting a CPU request indicates the minimum CPU required for a pod to run efficiently, while setting a CPU limit caps the maximum CPU usage. When a pod has no CPU limit (cpu.cfs_quota_us = -1), it's not bound by the CFS quota, implying it can utilize CPU resources beyond its request, up to what's available on the node, subject to scheduling decisions.
cAdvisor's approach of not reporting metrics like container_cpu_cfs_periods_total for pods without a defined CPU limit is logical because these metrics track usage against a specific limit. Since there's no limit for Pod A, the concept of CFS quota-based throttling doesn't apply directly. However, the underlying issue you're hinting at is more about CPU resource contention among pods than traditional CFS throttling.
To monitor situations where burstable or best-effort pods face "effective throttling" due to resource contention (as opposed to CFS quota limits), you might need a broader set of metrics. You can analyze container_cpu_usage_seconds_total in relation to CPU requests.
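For instance, a ratio along these lines (my own sketch; it assumes the standard cAdvisor and kube-state-metrics series) highlights containers whose usage sits at or above their request, which is the population most exposed to node-level contention:

# CPU usage as a fraction of the CPU request, per container
sum by (namespace, pod, container) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))
  /
sum by (namespace, pod, container) (kube_pod_container_resource_requests{resource="cpu"})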
Our current workaround is to simply set the CPU limit equal to the number of CPUs on the node. That provides perfect metrics, but I wish there were a way to get them without that hack, especially in mixed clusters where we have different node types and the limit sometimes prevents pods from bursting.
(Theoretically, we could set limit = 1000, but in practice the cluster scheduler will prevent that.)
Unfortunately, analyzing container_cpu_usage_seconds_total in relation to CPU requests did not turn out to be as helpful in practice, simply because you cannot distinguish between (a) a pod whose usage is low because it genuinely doesn't need more CPU and (b) a pod whose usage is low because it is being starved by contention on the node.
This is quite problematic when combined with Vertical Pod Autoscalers, which would always assume case (a) and therefore would not increase the reservations.
Would you suggest that we write another component to expose the same metric for our use case? We would like to follow the recommendation not to set CPU limits unless we explicitly need them. However, this currently limits our ability to use VPAs without massive overprovisioning, as well as our ability to monitor for starvation.
We use the cgroup metrics generated by cAdvisor. Through our experiments, we found that some metrics are missing for some of our application containers, including container_spec_cpu_quota, container_cpu_cfs_periods_total, container_cpu_cfs_throttled_periods_total, and container_cpu_cfs_throttled_seconds_total.
It seems that all of them are related to CPU. I wonder if I need to specify some configuration to enable those metrics per application?
cadvisor_version_info{cadvisorRevision="",cadvisorVersion="",dockerVersion="Unknown",kernelVersion="5.4.189.1-rolling-lts-linkedin",osVersion="CentOS Linux 7 (Core)"} 1