[Feature] Support dynamic refresh of watched namespaces

marton-bod commented 6 months ago

Search before asking

[X] I had searched in the issues and found no similar feature requirement.

Description

Currently we use the --watch-namespace=foo,bar flag to specify the list of namespaces to watch. Extending this list with additional namespaces requires an operator restart. In large shared k8s clusters, new namespaces might be created on demand, and relatively frequently, for hosting new Ray workloads.

In this case, it would be very useful to avoid constant restarts. One solution, already used by the Apache Flink operator, is to periodically read the list of namespaces from a configmap. I propose implementing the same solution in KubeRay. We may want to add a new flag (e.g. --namespace-cm=configmap_name), and we can decide how it would interact with the existing --watch-namespace=... flag (e.g. make them mutually exclusive, or complementary).
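To make the periodic-refresh idea concrete, here is a minimal sketch of the comparison step only: parse a comma-separated namespace list (the format `--watch-namespace` already accepts) and diff it against the currently watched set. The function names `parseNamespaces` and `diffNamespaces` are hypothetical and not part of KubeRay. Reading the configmap and applying the result to the operator are deliberately left out.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// parseNamespaces (hypothetical helper) splits a comma-separated list such as
// "foo,bar" into a sorted, de-duplicated slice, skipping blank entries and
// trimming surrounding whitespace.
func parseNamespaces(raw string) []string {
	seen := map[string]struct{}{}
	for _, part := range strings.Split(raw, ",") {
		if ns := strings.TrimSpace(part); ns != "" {
			seen[ns] = struct{}{}
		}
	}
	out := make([]string, 0, len(seen))
	for ns := range seen {
		out = append(out, ns)
	}
	sort.Strings(out)
	return out
}

// diffNamespaces (hypothetical helper) reports which namespaces appear only in
// desired (added) and which appear only in current (removed).
func diffNamespaces(current, desired []string) (added, removed []string) {
	cur := map[string]struct{}{}
	for _, ns := range current {
		cur[ns] = struct{}{}
	}
	want := map[string]struct{}{}
	for _, ns := range desired {
		want[ns] = struct{}{}
		if _, ok := cur[ns]; !ok {
			added = append(added, ns)
		}
	}
	for _, ns := range current {
		if _, ok := want[ns]; !ok {
			removed = append(removed, ns)
		}
	}
	return added, removed
}

func main() {
	current := parseNamespaces("foo,bar")  // what the operator watches now
	desired := parseNamespaces("foo, baz") // what the configmap lists after an update
	added, removed := diffNamespaces(current, desired)
	fmt.Println("added:", added, "removed:", removed)
	// prints: added: [baz] removed: [bar]
}
```

A refresh loop or a watch handler could call `diffNamespaces` on every change and act only when either slice is non-empty.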

Use case

No response

Related issues

No response

Are you willing to submit a PR?

[X] Yes I am willing to submit a PR!

MortalHappiness commented 4 months ago

I would like to work on this issue.

I also have some ideas about this issue:

1. Instead of having this new ConfigMap serve the single purpose of dynamically changing the watched namespaces, I think it is better to have a flag like --configmap, which holds all the operator configs that need to be constantly watched for changes. This way, if we want to support dynamic changes to other configs in the future, we don't need to introduce another CLI flag.
2. Instead of polling the configmap at a fixed interval like the Flink operator does, we can watch it with kubebuilder, just as we watch CRDs. See https://book.kubebuilder.io/reference/watching-resources/externally-managed for details.
3. Assume that the namespace of this configmap is the same as the namespace of the operator pod. If the operator runs outside the cluster, assume the namespace of this configmap is default.
4. If users set the watched namespaces both in this configmap and on the operator CLI, the value in the configmap is ignored. This follows the convention in many CLI tools of letting command-line arguments override config file values. For example, in helm you can have a values.yaml and override its values from the command line using --set.
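The last two rules above (where the configmap lives, and CLI-over-configmap precedence) can be sketched as plain functions. This is only an illustration of the proposed behaviour under stated assumptions: `resolveWatchNamespaces` and `configMapNamespace` are hypothetical names, and the pod namespace is passed in as a string rather than discovered (in-cluster it could, for example, be read from the service account namespace file or an environment variable set through the downward API).

```go
package main

import (
	"fmt"
	"strings"
)

// resolveWatchNamespaces (hypothetical) applies the proposed precedence:
// a non-empty CLI value wins and the configmap value is ignored.
// The second return value names the source that was used.
func resolveWatchNamespaces(cliValue, configMapValue string) (value, source string) {
	if strings.TrimSpace(cliValue) != "" {
		return cliValue, "cli"
	}
	return configMapValue, "configmap"
}

// configMapNamespace (hypothetical) returns the namespace in which to look up
// the configmap: the operator pod's namespace, or "default" when the operator
// runs outside the cluster and no pod namespace is known.
func configMapNamespace(podNamespace string) string {
	if podNamespace == "" {
		return "default"
	}
	return podNamespace
}

func main() {
	value, source := resolveWatchNamespaces("foo,bar", "baz")
	fmt.Println(value, source) // prints: foo,bar cli

	value, source = resolveWatchNamespaces("", "baz")
	fmt.Println(value, source) // prints: baz configmap

	fmt.Println(configMapNamespace(""))           // prints: default
	fmt.Println(configMapNamespace("ray-system")) // prints: ray-system
}
```

One consequence of this precedence is that setting --watch-namespace on the CLI effectively disables dynamic refresh for that setting, which mirrors how `helm --set` overrides values.yaml.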

marton-bod commented 4 months ago

@MortalHappiness Thanks for the proposal. Feel free to go ahead with working on this, I won't have any cycles to work on this in the next few months.

MortalHappiness commented 3 months ago

After some investigation into this issue and discussion with @kevin85421, we found that controller-runtime, unlike the Java Operator SDK, cannot change the watched namespaces dynamically.

The Flink operator uses RegisteredController.changeNamespaces to change the watched namespaces. See the following source code snippets for details:

In KubeRay, we use DefaultNamespaces to set the namespaces to watch. However, in https://github.com/kubernetes-sigs/controller-runtime/issues/2829, one of the maintainers said that DefaultNamespaces cannot be changed at runtime.

ray-project / kuberay