rook / rook

Storage Orchestration for Kubernetes
https://rook.io
Apache License 2.0

Operator's log is flooded with ROOK_WATCH_FOR_NODE_FAILURE= messages #13704

Closed: h323 closed this issue 8 months ago

h323 commented 8 months ago

Is this a bug report or feature request?

Bug Report

Deviation from expected behavior:

Operator's log is flooded with messages like this when the logging level is set to "Info":

2024-02-06 09:39:55.305431 I | op-k8sutil: ROOK_WATCH_FOR_NODE_FAILURE="true" (default)

I have Rook installed on a pretty large cluster, and I see hundreds of these messages every hour.

Expected behavior:

ROOK_WATCH_FOR_NODE_FAILURE= messages are not printed to the log when the logging level is set to "Info".

How to reproduce it (minimal and precise):

The message is printed from the k8sutil.GetOperatorSetting function, which is called from the handleNodeFailure function (added in #12286). That function in turn is called from the onK8sNode function, which is triggered whenever any of the controllers receives a node add/update event, so every node event produces another log line (see the sketch below).
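
For context, here is a minimal, hypothetical Go sketch of that call chain. The function names mirror the ones above, but the bodies are assumptions for illustration only, not Rook's actual implementation; the point is that an Info-level log inside the settings lookup produces one line per node event.

```go
// Hypothetical sketch of the call chain described above; not Rook's real code.
package main

import (
	"fmt"
	"os"
)

// getOperatorSetting stands in for k8sutil.GetOperatorSetting: it resolves a
// setting (here from the environment, falling back to a default) and logs the
// resolved value at Info level on every call.
func getOperatorSetting(name, defaultValue string) string {
	if v, ok := os.LookupEnv(name); ok {
		fmt.Printf("I | op-k8sutil: %s=%q\n", name, v)
		return v
	}
	fmt.Printf("I | op-k8sutil: %s=%q (default)\n", name, defaultValue)
	return defaultValue
}

// handleNodeFailure stands in for the handler added in #12286: it re-reads the
// setting each time it runs, so every call emits one log line.
func handleNodeFailure(nodeName string) {
	if getOperatorSetting("ROOK_WATCH_FOR_NODE_FAILURE", "true") != "true" {
		return
	}
	// ... check whether the node is unhealthy and react ...
}

// onK8sNode stands in for the controllers' node add/update callback.
func onK8sNode(nodeName string) {
	handleNodeFailure(nodeName)
}

func main() {
	// On a large cluster, node add/update events arrive constantly, so the
	// Info-level line above repeats hundreds of times per hour.
	for i := 0; i < 3; i++ {
		onK8sNode(fmt.Sprintf("node-%d", i))
	}
}
```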

Environment:

Madhu-1 commented 8 months ago

@h323 This is fixed and backported to 1.12.3 (https://github.com/rook/rook/pull/12679). Can you please check with the Rook 1.12.3 release? cc @subhamkrai

subhamkrai commented 8 months ago

@h323 I see you are using 1.12.3; please upgrade to a newer version to get the fix. Thanks

h323 commented 8 months ago

Upgrading to 1.12.3 fixed the issue, thanks!

dimm0 commented 4 months ago

I see it in v1.12.11. Any way to fix?

travisn commented 4 months ago

@dimm0 This must be coming from here, where the rook-ceph-operator-config ConfigMap is not found. Do you have that ConfigMap? It was added here a number of releases ago. If it is not found, Rook falls back to the operator env vars. It's best to move your operator env vars into this ConfigMap; that should also resolve this logging issue.
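
For example, a minimal sketch of that ConfigMap (the rook-ceph namespace and the two settings shown are assumptions; copy whichever env vars you actually set on the operator Deployment into data):

```yaml
# Hypothetical example: create this ConfigMap in the operator's namespace.
# The namespace and the settings below are placeholders; use your own values.
apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-ceph-operator-config
  namespace: rook-ceph
data:
  ROOK_LOG_LEVEL: "INFO"
  ROOK_WATCH_FOR_NODE_FAILURE: "true"
```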