container_cpu_cfs_throttled_seconds_total / container_cpu_cfs_throttled_periods_total – This is a generic expression that will show whether or not a given container is being throttled for CPU, which will result in performance issues and service degradation.
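For illustration, here is a rough sketch of what the expression above computes compared with the alternative ratio raised in this issue, container_cpu_cfs_throttled_periods_total / container_cpu_cfs_periods_total. The CfsCounters class, the helper names, and the sample numbers are all made up for this example. They only model the increase of each counter over one observation window and are not taken from any real cluster.

```python
from dataclasses import dataclass


@dataclass
class CfsCounters:
    """Counter increases over one observation window (hypothetical values)."""
    periods: float            # increase in container_cpu_cfs_periods_total
    throttled_periods: float  # increase in container_cpu_cfs_throttled_periods_total
    throttled_seconds: float  # increase in container_cpu_cfs_throttled_seconds_total


def throttled_fraction(c: CfsCounters) -> float:
    """Share of CFS periods in which the container was throttled (0.0 to 1.0)."""
    return c.throttled_periods / c.periods if c.periods else 0.0


def seconds_per_throttled_period(c: CfsCounters) -> float:
    """Average time spent throttled in each throttled period, in seconds."""
    return c.throttled_seconds / c.throttled_periods if c.throttled_periods else 0.0


# Made-up window: 3000 enforcement periods, 600 of them throttled,
# 24 seconds of throttled time in total.
sample = CfsCounters(periods=3000, throttled_periods=600, throttled_seconds=24.0)
fraction = throttled_fraction(sample)
avg_seconds = seconds_per_throttled_period(sample)
print(f"fraction of periods throttled: {fraction:.2f}")
print(f"seconds per throttled period: {avg_seconds:.3f}")
```

Under this reading, throttled periods divided by total periods is a fraction between 0 and 1, so a comparison such as > 0 (or a higher threshold) makes sense for it. Throttled seconds divided by throttled periods is a duration, not a percentage. Because all three metrics are cumulative counters, a real query would normally compare their increases over a time window (for example with rate()) rather than their raw totals.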
This issue has been marked as stale because of no activity in the last 180 days. It will be closed in the next 180 days unless it is tagged "no stalebot" or other activity occurs.
Version
master
Describe the requested changes
In the latest Production Deployment documentation, this PromQL query to find pods being CPU-throttled appears to use the incorrect metrics: container_cpu_cfs_throttled_seconds_total / container_cpu_cfs_throttled_periods_total
Per Stackoverflow, it looks like the correct metric query to use would be container_cpu_cfs_throttled_periods_total / container_cpu_cfs_periods_total > 0, which gives the percentage of throttled CPU cycles for a given pod.
If I'm misunderstanding the metric produced by the query currently in documentation, it would be helpful to explain exactly what each metric in container_cpu_cfs_throttled_seconds_total / container_cpu_cfs_throttled_periods_total does, and what comparison to use when generating alerts (e.g. > 0? < 1?)
Link to any relevant existing docs
https://docs.solo.io/gloo-edge/latest/operations/production_deployment/
Browser Information
No response
Additional Context
No response