k8snetworkplumbingwg / whereabouts

A CNI IPAM plugin that assigns IP addresses cluster-wide
Apache License 2.0

Increase RequestTimeout to fix overlappingIP context deadline error #478

Closed smoshiur1237 closed 1 week ago

smoshiur1237 commented 1 month ago

What this PR does / why we need it: The ListOverlappingIPs function fails with the error "failed to list all OverLappingIPs: client rate limiter Wait returned an error: context deadline exceeded". In addition, the DeleteOverlappingIP function is unable to delete unused overlapping IPs after pods are scaled down, so stale pod references remain in the IP pool:

kubectl get ippools.whereabouts.cni.cncf.io -n kube-system  197.0.0.0-8 -o yaml | grep -c podref
143

In our test case, 500 pods run with the overlapping IP feature enabled. When scaling down from 500 pods to 1, we get the "context deadline exceeded" error and also see undeleted pod references. The root cause is the RequestTimeout, which was set to 10s. The client has a default limit of 5 QPS, so for 500 pods it needs 100s to send all the queries. Because the overlapping IP list and deletion calls use the 10s request timeout, they time out and produce the error above. Changing this timeout from 10s to 100s does not change the basic functionality; it only allows more time to process the queries and deletions.

Which issue(s) this PR fixes: Fixes #389

smoshiur1237 commented 1 month ago

/cc @dougbtv @manuelbuil Please review; this should fix the issue.

smoshiur1237 commented 1 month ago

/cc @mlguerrero12 Please take a look

mlguerrero12 commented 1 month ago

We have a customer reporting this issue with 100 nodes and 30k pods. What you're proposing might work for 500 pods but not for 30k.

smoshiur1237 commented 1 month ago

> We have a customer reporting this issue with 100 nodes and 30k pods. What you're proposing might work for 500 pods but not for 30k.

The error and the podref issue are both caused by the timeout. Yes, a 100s timeout works with 500 pods for listing and deleting overlapping IPs, and yes, it may not work for 30k pods. Do you have any suggestion for how to handle this?

smoshiur1237 commented 1 week ago

A new fix is up in PR #480, so I am closing this PR.