[Bug] 409 conflicts when updating status

Search before asking

[X] I searched the issues and found no similar issues.

KubeRay Component

ray-operator

What happened + What you expected to happen

KubeRay operator logs sometimes show that 409 conflicts are sometimes raised when reconciling status, for example

2022-11-17T22:14:44.198Z        ERROR   controllers.RayCluster  Update status error     {"cluster name": "prod", "error": "Operation cannot be fulfilled on rayclusters.ray.io \"prod\": the object has been modified; please apply your changes to the latest version and try again"}
github.com/ray-project/kuberay/ray-operator/controllers/ray.(*RayClusterReconciler).Reconcile
        /workspace/controllers/ray/raycluster_controller.go:97
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:114
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:311
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:227

This happens a lot when using autoscaling, because the autoscaler writes to the RayCluster CR spec. While this is basically harmless (the operator will simply retry reconciliation), we should consider adding retry logic to the status update, at least to reduce log spam.

See https://ray-distributed.slack.com/archives/C02GFQ82JPM/p1668724199950299?thread_ts=1668711191.153149&cid=C02GFQ82JPM

Blog post for context https://alenkacz.medium.com/kubernetes-operators-best-practices-understanding-conflict-errors-d05353dff421

Reproduction script

Run an autoscaling Ray cluster, do some things to trigger autoscaling.

Anything else

No response

Are you willing to submit a PR?

[ ] Yes I am willing to submit a PR!

ray-project / kuberay

[Bug] 409 conflicts when updating status #745