Closed jcanocan closed 1 year ago
Wouldn't it be better that kubevirt operator itself will dynamically adjust the API rate limiter values according to the actual load and capacity available in the cluster? How would the average user decide what are the appropriate values to put there, if the API will be exposed via HCO?
cc: @dankenigsberg
I definitely agree: level V of the operators capability model is Auto Pilot: Horizontal/vertical scaling, auto config tuning, abnormal detection, scheduling tuning
and this looks exactly like something that should be internally handled by the operator and only eventually exposed to the cluster admin only for corner cases.
Thanks for your feedback! Please note that the final aim is to dynamically adjust these values, this effort is being tracked on CNV-14170.
By now, the main idea is to provide an easy way to adjust those values by cluster admins for covering corner cases. Note also that this is currently being done by cluster admins, but requires applying the above patch, which is complicated to follow and tune in my humble opinion.
If it's just a temporary solution until we have the dynamic adjustment implemented, I don't see a reason to invest in exposing this API temporarily. What we can do is to document the above jsonpatch annotation to tune the rate limiter configuration, in: https://github.com/kubevirt/hyperconverged-cluster-operator/blob/main/docs/cluster-configuration.md
what do you think?
If it's just a temporary solution until we have the dynamic adjustment implemented, I don't see a reason to invest in exposing this API temporarily.
Is there any chance to have this API available, but marked as internal? This way we would be not forced to support it in future versions, but could reuse it to implement the auto adjustment.
What we can do is to document the above jsonpatch annotation to tune the rate limiter configuration, in: https://github.com/kubevirt/hyperconverged-cluster-operator/blob/main/docs/cluster-configuration.md
@jeniferh wrote documentation. But the way the resulting documentation was looking, motivated us the have this improvement.
Let's try to avoid the 'static', but use a policy instead. The first policy will be very simple, just setting the values from a config map:
Kind: HCO
Spec:
tuning:
policy:
# StaticPolicy should reference to a config map
name: StaticPolicy
Is this a BUG REPORT or FEATURE REQUEST?:
/kind enhancement
What happened: User expects to run as many VMs in a OpenShift cluster as pods. However, Kubernetes API clients comes a token bucket rate limiter which avoids to congest the kube-apiserver bandwidth. The token bucket rate limiter is configurable through
bust
andQuery Per Second (QPS)
parameters. These parameters limit the total amount of VMs that may be scheduled in the cluster. Also, it is required to increase the number ofvirt-api
replicas to support the extra requests.As it is now, users need to apply the following patch to increase the limits:
$ oc patch hco -n openshift-cnv kubevirt-hyperconverged --type=merge -p '{"metadata":{"annotations":{"kubevirt.kubevirt.io/jsonpatch":"[{\"op\":\"add\",\"path\":\"/spec/configuration\",\"value\":{\"apiConfiguration\":{\"restClient\":{\"rateLimiter\":{\"tokenBucketRateLimiter\":{\"burst\":200,\"qps\":100}}}},\"controllerConfiguration\":{\"restClient\":{\"rateLimiter\":{\"tokenBucketRateLimiter\":{\"burst\":400,\"qps\":200}}}},\"handlerConfiguration\":{\"restClient\":{\"rateLimiter\":{\"tokenBucketRateLimiter\":{\"burst\":400,\"qps\":200}}}},\"webhookConfiguration\":{\"restClient\":{\"rateLimiter\":{\"tokenBucketRateLimiter\":{\"burst\":400,\"qps\":200}}}}}}]"}}}'
What you expected to happen: I would be more clear and easy for users if the would be an API to set the values, e.g.
or
I'm not sure which one would be more appropriated. Also, regarding the replica set increasing I'm not sure if it would be good to include it in the same configuration file or just leave it as a separate patch. Another possibility is that this is no longer required because of this.
Issue tracked in: CNV-21596 How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
oc get csv -n kubevirt-hyperconverged
): 4.11.0kubectl version
): 1.24.0