kubernetes-sigs / node-feature-discovery

Node feature discovery for Kubernetes
Apache License 2.0
800 stars 245 forks source link

More places WaitForCacheSync() return value is ignored #1936

Open ahmetb opened 3 weeks ago

ahmetb commented 3 weeks ago

As reported in #1817, ignoring WaitForCacheSync() is resulting in disastrous outcomes like all node labels being removed. Failing to populate an informer cache successfully and starting the controller causes the controller to operate on incomplete desired state, and potentially causes it to perform destructive actions.

It seems despite the recent reports and fixes in here and here we've recently accepted new code that doesn't pay attention to return value from WaitForCacheSync():

https://github.com/kubernetes-sigs/node-feature-discovery/blame/1c6ce897f2b9922cec70c2c983e852a831021ef6/pkg/nfd-master/namespace-lister.go#L35-L41

I might recommend rewriting the entire controller in something like controller-runtime which has better guardrails against these sort of common pitfalls in using the informer machinery directly.

/cc @marquiz /cc @ArangoGutierrez