Incorrect Prometheus query in Production Deployment docs

Version

master

Describe the requested changes

In the latest Production Deployment documentation, this PromQL query to find pods being CPU-throttled appears to use the incorrect metrics:

container_cpu_cfs_throttled_seconds_total / container_cpu_cfs_throttled_periods_total – This is a generic expression that will show whether or not a given container is being throttled for CPU, which will result in performance issues and service degradation.

Per Stack Overflow, it looks like the correct query would be container_cpu_cfs_throttled_periods_total / container_cpu_cfs_periods_total > 0, which gives the fraction of CFS enforcement periods in which a given pod's containers were throttled.
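
Both metrics are cumulative counters, so dividing the raw values gives a ratio over the container's whole lifetime. For alerting, a windowed form may be more useful. Here is a rough sketch, not a tested recommendation: the 5m window, the 0.25 threshold, and the namespace/pod/container grouping labels are my assumptions and would need to match what your cAdvisor/kubelet scrape actually exposes.

```promql
# Fraction of CFS periods in which each container was throttled,
# measured over the last 5 minutes (window is an assumed value).
  sum by (namespace, pod, container) (
    rate(container_cpu_cfs_throttled_periods_total[5m])
  )
/
  sum by (namespace, pod, container) (
    rate(container_cpu_cfs_periods_total[5m])
  )
# Assumed alert threshold: throttled in more than 25% of periods.
> 0.25
```

The result is a value between 0 and 1 per container, so a "> some fraction" comparison would be the natural one for alerts.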

If I'm misunderstanding the metric produced by the query currently in the documentation, it would be helpful to explain exactly what each metric in container_cpu_cfs_throttled_seconds_total / container_cpu_cfs_throttled_periods_total measures, and what comparison to use when generating alerts (e.g. > 0? < 1?).

Link to any relevant existing docs

https://docs.solo.io/gloo-edge/latest/operations/production_deployment/

Browser Information

No response

Additional Context

No response
