kubevirt / hyperconverged-cluster-operator

Operator pattern for managing multi-operator products
Apache License 2.0
155 stars 153 forks source link

Expose new APIs to adjust KubeVirt rate limiting #2152

Closed jcanocan closed 1 year ago

jcanocan commented 2 years ago

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug

/kind enhancement

What happened: User expects to run as many VMs in a OpenShift cluster as pods. However, Kubernetes API clients comes a token bucket rate limiter which avoids to congest the kube-apiserver bandwidth. The token bucket rate limiter is configurable through bust and Query Per Second (QPS) parameters. These parameters limit the total amount of VMs that may be scheduled in the cluster. Also, it is required to increase the number of virt-api replicas to support the extra requests.

As it is now, users need to apply the following patch to increase the limits:

$ oc patch hco -n openshift-cnv kubevirt-hyperconverged --type=merge -p '{"metadata":{"annotations":{"kubevirt.kubevirt.io/jsonpatch":"[{\"op\":\"add\",\"path\":\"/spec/configuration\",\"value\":{\"apiConfiguration\":{\"restClient\":{\"rateLimiter\":{\"tokenBucketRateLimiter\":{\"burst\":200,\"qps\":100}}}},\"controllerConfiguration\":{\"restClient\":{\"rateLimiter\":{\"tokenBucketRateLimiter\":{\"burst\":400,\"qps\":200}}}},\"handlerConfiguration\":{\"restClient\":{\"rateLimiter\":{\"tokenBucketRateLimiter\":{\"burst\":400,\"qps\":200}}}},\"webhookConfiguration\":{\"restClient\":{\"rateLimiter\":{\"tokenBucketRateLimiter\":{\"burst\":400,\"qps\":200}}}}}}]"}}}'

What you expected to happen: I would be more clear and easy for users if the would be an API to set the values, e.g.

Kind: HCO
Spec:
  tuning:
    Static:  # Easy
       burst: 200
       qps: 100

or

Kind: HCO
Spec:
  tuning:
    Static:  # Easy
      apiConfiguration:
         burst: 200
         qps: 100
       controllerConfiguration:
         burst: 400
         qps: 200
       handlerConfiguration:
         burst: 400
         qps: 200
       webhookConfiguration:
         burst: 400
         qps: 200

I'm not sure which one would be more appropriated. Also, regarding the replica set increasing I'm not sure if it would be good to include it in the same configuration file or just leave it as a separate patch. Another possibility is that this is no longer required because of this.

Issue tracked in: CNV-21596 How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

orenc1 commented 2 years ago

Wouldn't it be better that kubevirt operator itself will dynamically adjust the API rate limiter values according to the actual load and capacity available in the cluster? How would the average user decide what are the appropriate values to put there, if the API will be exposed via HCO?

cc: @dankenigsberg

tiraboschi commented 2 years ago

I definitely agree: level V of the operators capability model is Auto Pilot: Horizontal/vertical scaling, auto config tuning, abnormal detection, scheduling tuning

and this looks exactly like something that should be internally handled by the operator and only eventually exposed to the cluster admin only for corner cases.

jcanocan commented 2 years ago

Thanks for your feedback! Please note that the final aim is to dynamically adjust these values, this effort is being tracked on CNV-14170.

By now, the main idea is to provide an easy way to adjust those values by cluster admins for covering corner cases. Note also that this is currently being done by cluster admins, but requires applying the above patch, which is complicated to follow and tune in my humble opinion.

orenc1 commented 2 years ago

If it's just a temporary solution until we have the dynamic adjustment implemented, I don't see a reason to invest in exposing this API temporarily. What we can do is to document the above jsonpatch annotation to tune the rate limiter configuration, in: https://github.com/kubevirt/hyperconverged-cluster-operator/blob/main/docs/cluster-configuration.md

what do you think?

dominikholler commented 2 years ago

If it's just a temporary solution until we have the dynamic adjustment implemented, I don't see a reason to invest in exposing this API temporarily.

Is there any chance to have this API available, but marked as internal? This way we would be not forced to support it in future versions, but could reuse it to implement the auto adjustment.

What we can do is to document the above jsonpatch annotation to tune the rate limiter configuration, in: https://github.com/kubevirt/hyperconverged-cluster-operator/blob/main/docs/cluster-configuration.md

@jeniferh wrote documentation. But the way the resulting documentation was looking, motivated us the have this improvement.

dominikholler commented 1 year ago

Let's try to avoid the 'static', but use a policy instead. The first policy will be very simple, just setting the values from a config map:

Kind: HCO
Spec:
  tuning:
    policy:
      # StaticPolicy should reference to a config map
      name: StaticPolicy