kubernetes / client-go

Go client for Kubernetes.

High memory when informer error occurs. #1377

Open zhulinwei opened 2 months ago

zhulinwei commented 2 months ago

I have a custom controller that uses an informer. The controller lists and watches more than 2000 nodes and 50000 pods; api, apimachinery, and client-go are all at v0.24.0.

Code sample:

package main

import (
    "github.com/golang/glog"

    "k8s.io/client-go/informers"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/cache"
    "k8s.io/client-go/tools/clientcmd"
    "k8s.io/client-go/util/homedir"
)

func main() {
    cfg, err := clientcmd.BuildConfigFromFlags("", homedir.HomeDir()+"/.kube/config")
    if err != nil {
        glog.Fatalf("error building kubernetes config: %v", err)
    }
    kubeClient, err := kubernetes.NewForConfig(cfg)
    if err != nil {
        glog.Fatalf("error building kubernetes client: %v", err)
    }
    // Resync period 0: the informers list once and then only watch.
    factory := informers.NewSharedInformerFactory(kubeClient, 0)
    podInformer := factory.Core().V1().Pods().Informer()
    podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc: func(obj interface{}) {
            // do something
        },
        UpdateFunc: func(oldObj, newObj interface{}) {
            // do something
        },
        DeleteFunc: func(obj interface{}) {
            // do something
        },
    })
    stopCh := make(chan struct{})
    defer close(stopCh)
    factory.Start(stopCh)
    factory.WaitForCacheSync(stopCh)
    // Block so the informers keep running.
    select {}
}

Normally the controller needs only about 800 MB of memory:

[Screenshot Snipaste_2024-09-20_10-26-01: memory usage under normal operation]

But when an error occurs, memory doubles almost instantly, then decreases slightly, yet stays higher than it was before the error:

[Screenshot Snipaste_2024-09-20_10-31-09: memory usage spiking when the error occurs]

W0920 04:58:56.453165 1 reflector.go:442] pkg/mod/k8s.io/client-go@v0.24.0/tools/cache/reflector.go:167: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: got short buffer with n=0, base=4092, cap=40960") has prevented the request from succeeding

W0920 04:58:43.401539 1 reflector.go:442] pkg/mod/k8s.io/client-go@v0.24.0/tools/cache/reflector.go:167: watch of *v1.Pod ended with: an error on the server ("unable to decode an event from the watch stream: got short buffer with n=0, base=882, cap=20480") has prevented the request from succeeding

Memory used after the error occurred: [Screenshot Snipaste_2024-09-20_10-34-30]
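To confirm that each memory spike lines up with one of these dropped watches, I could log every watch failure explicitly. This is only a sketch, assuming SetWatchErrorHandler is available on the informer in this client-go version (it has to be called before factory.Start), and it keeps the default handler's behavior:

    // Log every failed watch; the reflector relists the full resource set
    // right after a watch ends, which is when memory should spike.
    if err := podInformer.SetWatchErrorHandler(func(r *cache.Reflector, err error) {
        glog.Warningf("pod watch ended, a full relist will follow: %v", err)
        cache.DefaultWatchErrorHandler(r, err)
    }); err != nil {
        glog.Fatalf("error setting watch error handler: %v", err)
    }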

As I understand it, when a network anomaly occurs the informer re-lists the full set of these resources from kube-apiserver. Because the old and the newly listed resource objects exist in memory at the same time, usage surges; after a while GC reclaims the old objects and memory falls back. What I don't understand is why memory stays higher than it was before the error occurred.
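To check whether that extra memory is really live objects in the informer cache, or just idle heap the Go runtime has not yet returned to the OS, I can log runtime.MemStats alongside the controller. This is only a diagnostic sketch (the helper name logMemStats is my own; it needs "runtime" and "time" added to the imports above) and would be started with go logMemStats() before factory.Start:

    // Periodically log heap statistics so the container RSS can be compared
    // with what the Go runtime actually considers live (HeapInuse) versus
    // idle-but-not-yet-released memory (HeapIdle - HeapReleased).
    func logMemStats() {
        for range time.Tick(30 * time.Second) {
            var m runtime.MemStats
            runtime.ReadMemStats(&m)
            glog.Infof("heapInuse=%dMiB heapIdle=%dMiB heapReleased=%dMiB sys=%dMiB",
                m.HeapInuse>>20, m.HeapIdle>>20, m.HeapReleased>>20, m.Sys>>20)
        }
    }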

Is this a bug? How can I fix it? What should I do?