kudobuilder / kudo

Kubernetes Universal Declarative Operator (KUDO)
https://kudo.dev
Apache License 2.0

Reduce Log Verbosity for HealthChecks #1740

Open kensipe opened 3 years ago

kensipe commented 3 years ago

HealthChecks are extremely verbose, even with just 1 installed operator; it will be worse with many.

We should either:

  1. reduce the log level of certain messages (unknown types in particular), so that the default log level no longer shows them, or
  2. define specific "known" unknown types :) which we flag healthy without logging,

or some combination of the two.

Example log output:

2020/11/13 18:12:04 InstanceController: Received Reconcile request for instance default/cassandra-instance
2020/11/13 18:12:04 InstanceController: Going to proceed with execution of the scheduled plan 'deploy' on instance default/cassandra-instance
2020/11/13 18:12:04 HealthUtil: unknown type *v1.ConfigMap is marked healthy by default
2020/11/13 18:12:04 HealthUtil: unknown type *v1.ConfigMap is marked healthy by default
2020/11/13 18:12:04 HealthUtil: service default/cassandra-instance-svc is marked healthy
2020/11/13 18:12:04 HealthUtil: unknown type *v1.Secret is marked healthy by default
2020/11/13 18:12:04 HealthUtil: unknown type *v1.ConfigMap is marked healthy by default
2020/11/13 18:12:04 HealthUtil: unknown type *v1.RoleBinding is marked healthy by default
2020/11/13 18:12:04 HealthUtil: unknown type *v1.ServiceAccount is marked healthy by default
2020/11/13 18:12:04 HealthUtil: unknown type *v1.Role is marked healthy by default
2020/11/13 18:12:04 HealthUtil: unknown type *v1beta1.PodDisruptionBudget is marked healthy by default
2020/11/13 18:12:04 HealthUtil: unknown type *v1.ConfigMap is marked healthy by default
2020/11/13 18:12:04 HealthUtil: unknown type *v1.ConfigMap is marked healthy by default
2020/11/13 18:12:04 HealthUtil: statefulset "cassandra-instance-node" is not healthy: Waiting for 1 pods to be ready...
2020/11/13 18:12:04 TaskExecution: object default/cassandra-instance-node is NOT healthy: statefulset "cassandra-instance-node" is not healthy: Waiting for 1 pods to be ready...
2020/11/13 18:12:04 PlanExecution: 'node' task(s) (instance: default/cassandra-instance) of the deploy.nodes.node are not ready
2020/11/13 18:12:04 PlanExecution: 'node' step(s) (instance: default/cassandra-instance) of the deploy.nodes are not ready
2020/11/13 18:12:04 InstanceController: Error when updating instance status. Operation cannot be fulfilled on instances.kudo.dev "cassandra-instance": the object has been modified; please apply your changes to the latest version and try again
2020/11/13 18:12:04 InstanceController: Error when updating instance default/cassandra-instance. Operation cannot be fulfilled on instances.kudo.dev "cassandra-instance": the object has been modified; please apply your changes to the latest version and try again
2020/11/13 18:12:05 InstanceController: Received Reconcile request for instance default/cassandra-instance
2020/11/13 18:12:05 InstanceController: Going to proceed with execution of the scheduled plan 'deploy' on instance default/cassandra-instance
2020/11/13 18:12:05 HealthUtil: unknown type *v1.Secret is marked healthy by default
2020/11/13 18:12:05 HealthUtil: unknown type *v1.ConfigMap is marked healthy by default
2020/11/13 18:12:05 HealthUtil: unknown type *v1.ConfigMap is marked healthy by default
2020/11/13 18:12:05 HealthUtil: unknown type *v1.ConfigMap is marked healthy by default

I'm not sure of the value of logging Secrets, ConfigMaps, ServiceAccounts and Roles. If we always know they are "healthy", then we don't need a log line to tell us that, repeatedly.