project-codeflare / codeflare-operator

Operator for installation and lifecycle management of CodeFlare distributed workload stack
Apache License 2.0

Downgrade StatusReasonConflict errors to debug messages #603

Closed. tardieu closed this 1 month ago

tardieu commented 1 month ago

The codeflare-operator log is littered with update conflict errors such as:

2024-07-24T13:06:33Z ERROR Reconciler error {"controller": "AppWrapper", "controllerGroup": "workload.codeflare.dev", "controllerKind": "AppWrapper", "AppWrapper": {"name":"kevin1-team-hw","namespace":"kevin1-team"}, "namespace": "kevin1-team", "name": "kevin1-team-hw", "reconcileID": "b6e57167-a357-4c67-85d1-f455e2b57ab6", "error": "Operation cannot be fulfilled on appwrappers.workload.codeflare.dev \"kevin1-team-hw\": the object has been modified; please apply your changes to the latest version and try again"}

These update conflicts arise when multiple reconcilers (or users) work concurrently on cached copies of a Kubernetes object and one of them tries to update a stale revision of that object in etcd. The conflicts are harmless: they are handled by retrying the reconciliation loop, refreshing the cached object, and updating or patching the more recent revision. The controller runtime handles this process entirely, but triggering the retry requires returning the conflict error to the controller runtime. As a result, the controller runtime unconditionally logs these harmless conflicts as errors, which confuses users.
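For readers unfamiliar with the mechanism, here is a minimal, stdlib-only sketch of how such a conflict arises and why a retry resolves it. It is not the operator's code: `store`, `object`, `reconcile`, and `simulate` are hypothetical names, the integer `resourceVersion` is a simplification of the opaque version string Kubernetes uses, and the explicit `for` loop stands in for the requeue that the controller runtime performs when a reconciler returns an error.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict stands in for the API server's conflict response.
var errConflict = errors.New("the object has been modified; please apply your changes to the latest version and try again")

// object is a hypothetical, simplified resource. Real Kubernetes objects
// carry an opaque string resourceVersion; an int keeps the sketch short.
type object struct {
	resourceVersion int
	status          string
}

// store plays the role of the API server backed by etcd.
type store struct{ current object }

func (s *store) get() object { return s.current }

// update rejects writes that are based on a stale revision.
func (s *store) update(o object) error {
	if o.resourceVersion != s.current.resourceVersion {
		return errConflict
	}
	o.resourceVersion++
	s.current = o
	return nil
}

// reconcile works on the (possibly stale) copy it is given and returns
// any update error to its caller, as a reconciler would.
func reconcile(s *store, cached object, status string) error {
	cached.status = status
	return s.update(cached)
}

// simulate lets a second writer change the object after it was cached,
// then retries the reconciliation until the update succeeds.
func simulate() (attempts int, final object) {
	s := &store{current: object{resourceVersion: 1, status: "Pending"}}
	cached := s.get()

	// A concurrent writer updates the object, making the cached copy stale.
	other := s.get()
	other.status = "Admitted"
	if err := s.update(other); err != nil {
		panic(err)
	}

	for {
		attempts++
		err := reconcile(s, cached, "Running")
		if err == nil {
			break
		}
		if !errors.Is(err, errConflict) {
			panic(err)
		}
		cached = s.get() // refresh the cached copy before retrying
	}
	return attempts, s.get()
}

func main() {
	attempts, final := simulate()
	fmt.Printf("attempts=%d status=%s resourceVersion=%d\n",
		attempts, final.status, final.resourceVersion)
}
```

The first attempt fails because it is based on a stale revision, and the second succeeds after the refresh. Nothing is lost along the way, which is why the conflict itself does not deserve an ERROR entry.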

This PR therefore wraps the controller runtime logger with a filter that downgrades these log messages from ERROR to DEBUG, which more accurately matches the severity of the event.
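The general shape of such a filter can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR. `Sink` is a hypothetical, cut-down stand-in for the `logr.LogSink` interface that the controller runtime logs through. `isConflict` stands in for `apierrors.IsConflict` from `k8s.io/apimachinery/pkg/api/errors`. The debug verbosity of 1 is an assumed mapping. `recorder` and `demo` exist only to make the sketch runnable.

```go
package main

import (
	"errors"
	"fmt"
)

// Sink is a hypothetical, minimal stand-in for a structured log sink.
type Sink interface {
	Info(level int, msg string, keysAndValues ...any)
	Error(err error, msg string, keysAndValues ...any)
}

// errConflict stands in for a Kubernetes update conflict error.
var errConflict = errors.New("the object has been modified")

// isConflict is a stand-in for a real conflict check on API errors.
func isConflict(err error) bool { return errors.Is(err, errConflict) }

// conflictFilter wraps another sink and downgrades conflict errors
// to debug-level info messages; everything else passes through.
type conflictFilter struct {
	next       Sink
	debugLevel int // verbosity treated as "debug" (assumed to be 1 here)
}

func (f conflictFilter) Info(level int, msg string, kv ...any) {
	f.next.Info(level, msg, kv...)
}

func (f conflictFilter) Error(err error, msg string, kv ...any) {
	if isConflict(err) {
		// Keep the error text as a field so no information is lost.
		fields := make([]any, 0, len(kv)+2)
		fields = append(fields, kv...)
		fields = append(fields, "error", err.Error())
		f.next.Info(f.debugLevel, msg, fields...)
		return
	}
	f.next.Error(err, msg, kv...)
}

// recorder is a toy sink that remembers what it was asked to log.
type recorder struct{ lines []string }

func (r *recorder) Info(level int, msg string, kv ...any) {
	r.lines = append(r.lines, fmt.Sprintf("INFO(v=%d) %s", level, msg))
}

func (r *recorder) Error(err error, msg string, kv ...any) {
	r.lines = append(r.lines, fmt.Sprintf("ERROR %s: %v", msg, err))
}

// demo logs one wrapped conflict and one unrelated error through the filter.
func demo() []string {
	rec := &recorder{}
	log := conflictFilter{next: rec, debugLevel: 1}
	log.Error(fmt.Errorf("update failed: %w", errConflict), "Reconciler error")
	log.Error(errors.New("connection refused"), "Reconciler error")
	return rec.lines
}

func main() {
	for _, line := range demo() {
		fmt.Println(line)
	}
}
```

Filtering at the sink leaves the reconciler free to keep returning the conflict error, so the retry behavior is unchanged and only the log severity differs.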

astefanutti commented 1 month ago

/lgtm

astefanutti commented 1 month ago

/approve

openshift-ci[bot] commented 1 month ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: astefanutti

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/project-codeflare/codeflare-operator/blob/main/OWNERS)~~ [astefanutti]

Approvers can indicate their approval by writing `/approve` in a comment

Approvers can cancel approval by writing `/approve cancel` in a comment