project-codeflare / codeflare-cli

Apache License 2.0

Ray cluster is not queued with MCAD #725

Open Sara-KS opened 1 year ago

Sara-KS commented 1 year ago

I am trying to set up a Ray cluster with 1 head node and 1 worker node using the MCAD option. When I try to launch Ray with the CodeFlare CLI, the streaming output loops like this:

✔  Choice 9  Choose a Pod Scheduler  · Use the Multi-user Enhanced Kubernetes Scheduler
✔  Choice 10  Choose Pod Scheduler for MCAD  · My administrator has already installed and configured MCAD
▶ Stream out Events from the Ray Head Node
No resources found in preprocessing-pipelines namespace.
Waiting for Ray Head node
No resources found in preprocessing-pipelines namespace.
Waiting for Ray Head node 
...

In a separate terminal window, oc get appwrappers returns no app wrappers. The same result occurs regardless of how I configure the Ray cluster resources in the Ray Resource Requirements step.

starpit commented 1 year ago

thanks for the bug report. what does helm ls -n preprocessing-pipelines show? there is a known defect where a helm chart left over from a failed prior startup blocks a fresh redeploy.

Sara-KS commented 1 year ago

Thanks Nick, that was exactly it. Once I uninstalled the previous helm chart, the launch was able to make progress.

starpit commented 1 year ago

thanks for checking that out! i'll reopen this, since it is a bug we should fix!

starpit commented 1 year ago

oh, and a few fyis from my prior investigations on this: a failed helm install will actually leave a helm chart behind, but in a broken state (rather than leaving nothing except errors on your console). nice.

then, the guidebook for this only checks to see if the helm chart is installed (not its state). double nice.
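
A state-aware check along the lines described above might look like the following. This is only a minimal sketch, not the guidebook's actual code: it assumes the JSON array produced by helm ls --all -n preprocessing-pipelines -o json, where each entry carries a name and a status field, and it treats any status other than deployed as a leftover that should be uninstalled before retrying. The function name and the release names in the sample are hypothetical.

```python
import json
from typing import List

# Assumption: "deployed" is the only status that means the release is usable.
HEALTHY_STATUS = "deployed"


def find_broken_releases(helm_ls_json: str) -> List[str]:
    """Return the names of releases whose status is not 'deployed'.

    `helm_ls_json` is expected to be the text printed by
    `helm ls --all -n <namespace> -o json` (a JSON array of releases).
    """
    releases = json.loads(helm_ls_json)
    return [r["name"] for r in releases if r.get("status") != HEALTHY_STATUS]


# Hypothetical sample shaped like the helm output; release names are made up.
sample = json.dumps([
    {"name": "ray-cluster", "namespace": "preprocessing-pipelines", "status": "failed"},
    {"name": "other-app", "namespace": "preprocessing-pipelines", "status": "deployed"},
])

for name in find_broken_releases(sample):
    # A real check would stop here and ask the user to run `helm uninstall`.
    print(f"leftover release in a broken state: {name}")
```

Checking the status rather than mere presence is what separates a healthy install from the leftover that blocked the launch in this issue. Any release this reports can be removed with helm uninstall before trying again.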

Sara-KS commented 1 year ago

Good to know! I will keep that in mind and post here if I run into a related issue.