ray-project / kuberay

A toolkit to run Ray applications on Kubernetes
Apache License 2.0

[Feature] Support dynamic refresh of watched namespaces #2061

Open marton-bod opened 6 months ago

marton-bod commented 6 months ago

Description

Currently we use the --watch-namespace=foo,bar flag to specify the list of namespaces to watch. Extending this list with additional namespaces requires an operator restart. In large shared Kubernetes clusters, new namespaces may be created on demand, and fairly frequently, to host new Ray workloads.

In this case, it would be very useful to avoid constant restarts. One solution, already used by the Apache Flink operator, is to periodically read the list of namespaces from a ConfigMap. I propose implementing the same solution in KubeRay. We may want a new flag (e.g. --namespace-cm=configmap_name), and we can decide how it would interact with the existing --watch-namespace=... flag (e.g. make them mutually exclusive, or complementary).
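To make the polling idea concrete, here is a minimal sketch in Go of the bookkeeping it would need: parsing a comma-separated namespace list (the same format as --watch-namespace) and computing what changed between two polls. The function names parseNamespaces and diffNamespaces are hypothetical and not part of KubeRay. Reading the ConfigMap itself and applying the change to the operator are left out, and this is not how the Flink operator implements it.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// parseNamespaces (hypothetical helper) splits a comma-separated list such
// as "foo,bar" into a sorted, de-duplicated slice. Empty entries and
// surrounding whitespace are ignored.
func parseNamespaces(raw string) []string {
	seen := make(map[string]struct{})
	for _, part := range strings.Split(raw, ",") {
		if ns := strings.TrimSpace(part); ns != "" {
			seen[ns] = struct{}{}
		}
	}
	out := make([]string, 0, len(seen))
	for ns := range seen {
		out = append(out, ns)
	}
	sort.Strings(out)
	return out
}

// diffNamespaces (hypothetical helper) reports which namespaces appear only
// in desired (added) and which appear only in current (removed). Results
// keep the order of the corresponding input slice.
func diffNamespaces(current, desired []string) (added, removed []string) {
	cur := make(map[string]struct{}, len(current))
	for _, ns := range current {
		cur[ns] = struct{}{}
	}
	des := make(map[string]struct{}, len(desired))
	for _, ns := range desired {
		des[ns] = struct{}{}
	}
	for _, ns := range desired {
		if _, ok := cur[ns]; !ok {
			added = append(added, ns)
		}
	}
	for _, ns := range current {
		if _, ok := des[ns]; !ok {
			removed = append(removed, ns)
		}
	}
	return added, removed
}

func main() {
	// Value the operator started with, e.g. from --watch-namespace.
	current := parseNamespaces("foo,bar")
	// Value read on a later poll; assumed to come from the ConfigMap.
	desired := parseNamespaces("bar, baz,,foo, qux")

	added, removed := diffNamespaces(current, desired)
	fmt.Println("added:", added, "removed:", removed)
}
```

An empty diff means the poll can be a no-op. A non-empty one is the point where the operator would have to react, and how it reacts is exactly the open design question in this issue.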

Use case

No response

Related issues

No response

MortalHappiness commented 4 months ago

I would like to work on this issue.

I also have some ideas about this issue:

marton-bod commented 4 months ago

@MortalHappiness Thanks for the proposal. Feel free to go ahead and work on this; I won't have any cycles for it in the next few months.

MortalHappiness commented 3 months ago

After some investigation into this issue and discussion with @kevin85421, we found that controller-runtime, unlike the Java Operator SDK, cannot change the set of watched namespaces dynamically.

The Flink operator uses RegisteredController.changeNamespaces to change the namespaces to watch. See the following source code snippets for details:

In KubeRay, we use DefaultNamespaces to set the namespaces to watch. However, in https://github.com/kubernetes-sigs/controller-runtime/issues/2829, one of the maintainers said that DefaultNamespaces cannot be changed at runtime.