Nintorac opened 2 years ago
Enabling remote kernels looks fairly straightforward. I think it should be possible to use a Ray remote actor to proxy the necessary ports from the Ray client to the notebook host.
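To make the proxying step concrete: a Jupyter kernel listens on five TCP ports (shell, iopub, stdin, control, heartbeat), so the actor essentially needs a byte-level forwarder per port. A minimal stdlib-only sketch of that forwarding piece (outside of Ray itself; function names are mine) might look like:

```python
import socket
import threading


def _pipe(src: socket.socket, dst: socket.socket) -> None:
    # Copy bytes from src to dst until EOF, then half-close the destination.
    try:
        while data := src.recv(4096):
            dst.sendall(data)
    except OSError:
        pass
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def forward_port(listen_port: int, target_host: str, target_port: int) -> socket.socket:
    """Accept local connections and forward them to (target_host, target_port).

    A Ray actor colocated with the kernel could run one of these per kernel
    port (shell, iopub, stdin, control, heartbeat).
    """
    server = socket.socket()
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", listen_port))
    server.listen()

    def serve() -> None:
        while True:
            try:
                client, _ = server.accept()
            except OSError:  # listener was closed
                return
            upstream = socket.create_connection((target_host, target_port))
            threading.Thread(target=_pipe, args=(client, upstream), daemon=True).start()
            threading.Thread(target=_pipe, args=(upstream, client), daemon=True).start()

    threading.Thread(target=serve, daemon=True).start()
    return server
```

In practice the real transport is ZeroMQ, but ZMQ over TCP forwards fine at the byte level, so a dumb pipe like this per port should be enough for a first cut.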
Initiating the Ray proxy will then require wrapping the IPython.kernel call in kernel.json and generating the connection file on the fly; resource requests can then just be part of the args for the IPython.kernel wrapper:
https://ipython.org/ipython-doc/dev/development/kernels.html#kernel-specs
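For the "generate the connection file on the fly" part: the connection file the kernel and client share is just JSON with the five port numbers plus an HMAC key, so the wrapper can synthesize one per launch. A hedged sketch, assuming stdlib only (the function name and `kernel_name` value are mine, not an existing API):

```python
import json
import socket
import uuid
from pathlib import Path


def _free_port() -> int:
    # Ask the OS for an ephemeral port. There is a small race between
    # releasing it here and the kernel rebinding it; acceptable for a sketch.
    with socket.socket() as s:
        s.bind(("", 0))
        return s.getsockname()[1]


def write_connection_file(path: str, ip: str = "0.0.0.0") -> dict:
    """Generate a Jupyter kernel connection file on the fly.

    The wrapper invoked from kernel.json would write this on the remote side,
    then hand it to the kernel (e.g. `python -m ipykernel_launcher -f <path>`).
    """
    info = {
        "ip": ip,
        "transport": "tcp",
        "shell_port": _free_port(),
        "iopub_port": _free_port(),
        "stdin_port": _free_port(),
        "control_port": _free_port(),
        "hb_port": _free_port(),
        "key": uuid.uuid4().hex,           # HMAC key shared with the client
        "signature_scheme": "hmac-sha256",
        "kernel_name": "ray-remote",       # hypothetical name
    }
    Path(path).write_text(json.dumps(info, indent=2))
    return info
```

The same dict would need to make its way back to the Jupyter server so the client can connect through the proxied ports.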
Should kernels be specified as part of the kuberay.ray.io/v1alpha1.Notebook spec?
Isolation is trickier, I think: Ray has namespaces, but as far as I know these still share the filesystem, so anything written can be read by other Ray jobs.
I think the best approach may be to rely on k8s and run kernels as their own pods: add a custom Ray resource, isolated, and have the autoscaler create a pod whose lifetime is tied to the Ray function. Maybe this should be broken out into another issue, though?
Search before asking
Description
Allow kuberay.ray.io/v1alpha1.Notebook instances to launch remote kernels running in Ray itself, allowing resource requests to be specified in the usual Ray way.
Use case
I want to make as efficient use of compute resources as possible. In our current situation using SageMaker notebooks there is a lot of waste: if all kernels are shut down, the instance is still using valuable resources that could be better utilized.
Conversely, if the instance is shut down and the user wants access to resources again, the warm-up time can be upwards of minutes, wasting dev time, which is expensive.
Ideally I could start a remote kernel and have it running instantly (preempting batch jobs if necessary), and as soon as I close it the resources are returned to the cluster.
There should be strong isolation guarantees between different users.
Related issues
#103
Are you willing to submit a PR?