Closed giannoul closed 3 years ago
So the important thing here is to increase cfg.QPS so we can handle higher load without hitting the throttling limits. Burst is not as important until we run many routers, which makes cfg.Burst important. Is this the reason why you prefer to set cfg.Burst set to 0 when default in #60 ?
If it's zero, the created RESTClient will use DefaultBurst: 10.
The main idea behind the issue and fix in #60 is to be able to set the cfg.QPS
and cfg.Burst
using environment variables. In many cases the default values of 0 for both of them (resulting to cfg.QPS = 5
and cfg.Burst = 10
as per the docs) will do just fine, but in some edge cases like e.g. many certificates they will lead to significant delays due to throttling.
The implementation in #60 will read the environment variables for cfg.QPS
(named RATE_LIMIT_QPS
) and cfg.Burst
(named RATE_LIMIT_BURST
) and do the appropriate settings. Any of cfg.QPS
and cfg.Burst
will default to 0 if the relevant environment variable is not set. This was done in order to ensure that the changes will take effect only if we decide to set the environment variable.
@giannoul , should we add some helm chart values in the values.yaml to be set on the deployment object? If they are not passed in via values, then we can default to the safe defaults of ""
before we close this issue out.
https://github.com/teamhephy/router/blob/master/charts/router/values.yaml https://github.com/teamhephy/router/blob/master/charts/router/templates/router-deployment.yaml#L58
@Cryptophobia I have created the PR https://github.com/teamhephy/router/pull/61 that should handle the above.
The logic is that we can set for example the values:
rate_limit:
qps: "50.0"
burst: "50"
in values.yaml and they will populate the deployment's environment variables. Since a non existing environment variable is set to ""
by default by os.Getenv it will prevent the setting of relevant configuration variable, leading to the usage of the default values (the default will be 0).
The reason I chose qps
and burst
to be fields of the same basic variable rate_limit
is that they are connected in the way that someone will set them. For example, if we only set qps
the configuration will fail with message:
Failed to create client: Burst is required to be greater than 0 when RateLimiter is not set and QPS is set to greater than 0.
So, someone should keep in mind that those variables are related.
Great idea to link the settings under the same basic variable rate_limit
. I agree that if they need to be set together, this is best practice!
There are cases where the router config reload takes over a minute due to the fact that its requests are being throttled by k8s.
The router uses the cfg, err := rest.InClusterConfig(). As per InClusterConfig:
The Config returned by the
InClusterConfig
has the default values forQPS
andBurst
(they are both equal to 0):The above means that when router tries to fetch the certificates it will be throttled. In the case of a big number of certificates this leads to a significant delay during configuration reloads. In a specific case with ~200 certificates the delay is almost 120 seconds due to the throttling. The situation gets worse when multiple hephy routers run as part of the deployment, due to the fact that they use the same Service Account, so they all try to make requests to the control plane.