Open michaely-cb opened 5 months ago
Hi @dougbtv @s1061123. Can I get a review on this PR please? Thanks!
This sure sounds like an excellent fix, and overall I'm in favor of it -- is there any way that we can validate that it does indeed operate as expected by reducing the API calls? e.g. via end to end tests, or, even manually? thanks!
> is there any way that we can validate that it does indeed operate as expected by reducing the API calls? e.g. via end to end tests, or, even manually?
What I did was manually turn on API server audit logging and compare the call patterns. I captured the patterns before and after in the PR description: the once-a-minute reconnections happen prior to this change and not after. In the later calls, we can also see that the reconnection times align with the randomized timeout that client-go specifies in the request parameters.
@dougbtv Mind taking another look and rerunning CI, please?
This pull request is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.
Hi @dougbtv @s1061123. Gentle bump on reviewing this PR? Thanks!
The watch calls from multus were reconnecting to the API server every minute due to a one-minute timeout specified on the rest config. Reconnecting every minute imposes unnecessary load on the API server, and watches with a fixed timeout aren't temporally staggered, so that load isn't spread evenly. For watch calls, we should delegate reconnections entirely to client-go. Watches from other components (kubelet, kube-scheduler, cilium) already do this.
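To make the change concrete, here is a minimal sketch of a client setup where the fixed one-minute `Timeout` on the `rest.Config` is dropped so client-go can manage watch lifetimes itself. The package and function names are illustrative, not taken from the multus code:

```go
package k8sclient

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// newClientset builds a clientset for components that run long-lived watches.
// Hypothetical helper for illustration only.
func newClientset() (*kubernetes.Clientset, error) {
	config, err := rest.InClusterConfig()
	if err != nil {
		return nil, err
	}

	// Previously a client-side timeout along the lines of
	//
	//     config.Timeout = 1 * time.Minute
	//
	// was set on the rest config; it applies to every request, including
	// watches, so each watch connection is torn down and re-established
	// every minute. Leaving Timeout at its zero value means "no client-side
	// timeout", and client-go's reflector instead asks the API server for a
	// randomized watch timeout, which staggers reconnections across clients.
	config.Timeout = 0

	return kubernetes.NewForConfig(config)
}
```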
Reference: https://github.com/kubernetes/client-go/blob/03443e7ede0e50d195b8669103ce082e735c6b94/tools/cache/reflector.go#L52-L56
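For context, the reflector code linked above picks a fresh server-side timeout for each watch connection. Roughly (paraphrased, not a verbatim copy of client-go) it does the equivalent of:

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// minWatchTimeout mirrors the constant defined in client-go's
// tools/cache/reflector.go.
const minWatchTimeout = 5 * time.Minute

func main() {
	// Before each watch call, the reflector picks a server-side timeout in
	// [minWatchTimeout, 2*minWatchTimeout), so reconnections from many
	// watchers are spread out over time instead of all firing at once.
	timeoutSeconds := int64(minWatchTimeout.Seconds() * (rand.Float64() + 1.0))
	fmt.Printf("watch will be closed by the server after ~%ds\n", timeoutSeconds)
}
```

With the fixed one-minute client-side timeout removed, the reconnection timings seen in the audit logs should fall inside this randomized window, which is what the captured call patterns below show.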
Pod watch:
NAD watch: