project-codeflare / codeflare-sdk

An intuitive, easy-to-use Python interface for batch resource requesting, access, job submission, and observation. Simplifying the developer's life while enabling access to high-performance compute resources, either in the cloud or on-prem.
Apache License 2.0

Connect to an interactive Ray session from outside the cluster #53

Closed MichaelClifford closed 1 year ago

MichaelClifford commented 1 year ago

As a user of the Codeflare stack, I want the ability to connect to an interactive Ray session through a ray.init() call from outside of the cluster.

tedhtchang commented 1 year ago

Tried the following. This assumes users will use a local notebook, not a notebook container:

  1. oc port-forward to svc/hfgputest-head-svc on port 10001: worked for a local notebook but not for a notebook container
  2. NodePort: worked, but the range of valid ports is 30000-32767 and the dashboard route stopped working
  3. New route to ray-dashboard-hfgputest targeting port 10001: not working
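
A minimal sketch of option 1 from a local notebook, assuming the port-forward is already running in another terminal (for example `oc port-forward svc/hfgputest-head-svc 10001:10001`) and that the local Ray version matches the one on the cluster. The helper names `ray_client_address` and `connect` are hypothetical; only `ray.init` with a `ray://` address comes from Ray itself.

```python
def ray_client_address(host: str = "localhost", port: int = 10001) -> str:
    """Build a Ray Client address; 10001 is the head service port tried above."""
    return f"ray://{host}:{port}"


def connect(host: str = "localhost", port: int = 10001):
    """Connect to the forwarded Ray head (needs the ray package installed locally)."""
    import ray  # imported lazily so the address helper works without Ray

    return ray.init(address=ray_client_address(host, port))
```

Calling `connect()` in the notebook should attach to the cluster through the forwarded port; this only helps when the notebook runs on the same machine as the port-forward, which matches the result above.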

tedhtchang commented 1 year ago

I have added a manually generated AppWrapper to enable the interactive Ray session. Let's decide whether the solution should be implemented in cluster.up(). In short, the solution is to route the Ray client (a gRPC/HTTP2-based protocol) externally via a TLS-enabled Route (an OpenShift limitation). The solution is based on the findings in [1][2].

[1] gRPC or HTTP/2 Ingress Connectivity in OpenShift
[2] Ray TLS authentication
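
A rough sketch of what the client side of this could look like, assuming the certificate files have already been generated and copied to the local machine, and that the Route is reachable on port 443. The helper names, file paths, and route host are placeholders, not part of the SDK; the `RAY_USE_TLS`, `RAY_TLS_SERVER_CERT`, `RAY_TLS_SERVER_KEY`, and `RAY_TLS_CA_CERT` variables are the ones described in the Ray TLS authentication docs [2].

```python
import os


def ray_tls_env(cert_path: str, key_path: str, ca_path: str) -> dict:
    """Return the environment variables Ray reads when TLS is enabled."""
    return {
        "RAY_USE_TLS": "1",
        "RAY_TLS_SERVER_CERT": cert_path,
        "RAY_TLS_SERVER_KEY": key_path,
        "RAY_TLS_CA_CERT": ca_path,
    }


def connect_over_tls(route_host: str, env: dict, port: int = 443):
    """Export the TLS settings, then connect through the Route (placeholder host)."""
    os.environ.update(env)  # set before Ray is imported and connects
    import ray  # lazy import so ray_tls_env can be used without Ray installed

    return ray.init(address=f"ray://{route_host}:{port}")
```

Usage would be something like `connect_over_tls("<route host>", ray_tls_env("tls.crt", "tls.key", "ca.crt"))`, with the real route host and certificate paths filled in.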

MichaelClifford commented 1 year ago

Can we add a parameter, "TLS", to the ClusterConfig that will add the appropriate additions into the appwrapper yaml if set to True?

tedhtchang commented 1 year ago

@MichaelClifford Sounds like a good idea, although this solution doesn't encrypt the Ray Dashboard. Perhaps something more explicit, such as local_interactive = true? Let's review the appwrapper and make sure it does not run into scalability issues due to the additional secret and initContainers. Then we can decide on an appropriate name for the parameter.

tedhtchang commented 1 year ago

The decision was that we should create a new template to enable the TLS feature, since there is a performance hit. We should let users specify whether to enable it with a parameter, e.g. TLS=True.
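
To make the decision concrete, here is a minimal sketch of an opt-in flag that picks between two templates. Everything in it is hypothetical: `SketchClusterConfig` is a stand-in, not the SDK's ClusterConfig, and the template file names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class SketchClusterConfig:
    """Hypothetical stand-in for ClusterConfig with only the fields needed here."""

    name: str
    tls: bool = False  # opt-in, because the TLS setup carries a performance hit


def select_template(config: SketchClusterConfig) -> str:
    """Pick the AppWrapper template file (hypothetical file names)."""
    return "tls-template.yaml" if config.tls else "base-template.yaml"
```

With the flag left at its default, users keep the existing template and pay no extra cost; only `tls=True` pulls in the additional secret and initContainers.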

MichaelClifford commented 1 year ago

Closed by #100