Kubernetes CPU limits are... unintuitive by default. Specific cores can be bound exclusively to a container through the kubelet's CPU Manager static policy, but only for containers in Guaranteed QoS pods that request an integer number of CPUs. This is NOT the default (and AFAIK not configurable on most public cloud providers like GKE).
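The default behavior is to enforce the limit as a CFS quota: once a container has used up its CPU allocation for the current period (100 ms by default), it is throttled until the next period starts, which leads to terrible tail latencies. For example, a container with CPU limit 2 that runs on 4 cores burns through its 200 ms quota in 50 ms and is then throttled for the remaining 50 ms of the period.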
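To make the throttling arithmetic concrete, here is a rough sketch of the quota math. It assumes the default 100 ms CFS period and a container whose threads are all fully CPU-bound; `cfs_throttle` is a made-up helper for illustration, not part of any Kubernetes or kernel API.

```python
def cfs_throttle(limit_cpus: float, busy_threads: int, period_ms: float = 100.0):
    """Return (run_ms, throttled_ms) per CFS period for a CPU-bound container.

    Assumption: every thread is runnable for the whole period and each one
    gets its own core, so quota is consumed busy_threads times faster than
    wall-clock time.
    """
    # CPU time the container may consume in one period (limit 2 -> 200 ms).
    quota_ms = limit_cpus * period_ms
    # Wall-clock time until the quota is exhausted, capped at the period.
    run_ms = min(period_ms, quota_ms / busy_threads)
    return run_ms, period_ms - run_ms


if __name__ == "__main__":
    run_ms, throttled_ms = cfs_throttle(limit_cpus=2, busy_threads=4)
    print(f"runs {run_ms} ms, throttled {throttled_ms} ms per period")
```

With limit 2 and 4 busy threads this gives 50 ms of running and 50 ms of throttling per period, so any request that arrives during the throttled half has to wait, which is where the tail latency comes from.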
https://github.com/yugabyte/charts/blob/6e6ea5a2850ed8522fe3e4f1fb09cd25c15aecfb/stable/yugabyte/values.yaml#L21-L35
https://medium.com/@betz.mark/understanding-resource-limits-in-kubernetes-cpu-time-9eff74d3161b
As far as I can tell, the current best practice is to set CPU requests and NOT set CPU limits.
Thanks for the amazing work!