ray-project / kuberay

A toolkit to run Ray applications on Kubernetes
Apache License 2.0

[RayCluster][Fix] evicted head-pod can be recreated or restarted #2217

Open JasonChen86899 opened 2 days ago

JasonChen86899 commented 2 days ago

Why are these changes needed?

This PR attempts to fix issue https://github.com/ray-project/kuberay/issues/2125. If the head pod has been evicted, we delete it so that it can be restarted or recreated.
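
A minimal sketch of the idea, not the actual code in this PR. It assumes an evicted pod is reported with phase `Failed` and reason `Evicted`, which is how the kubelet marks evictions. The `Pod` struct and the function names `isEvicted` and `podsToDelete` are hypothetical stand-ins for illustration, not KubeRay or client-go types.

```go
package main

import "fmt"

// Pod is a hypothetical, trimmed-down stand-in for a Kubernetes Pod.
// Only the fields needed for the eviction check are modeled here.
type Pod struct {
	Name   string
	Phase  string // e.g. "Running", "Pending", "Failed"
	Reason string // status reason; assumed to be "Evicted" after an eviction
}

// isEvicted reports whether the pod looks like it was evicted:
// the phase is Failed and the status reason is Evicted.
func isEvicted(p Pod) bool {
	return p.Phase == "Failed" && p.Reason == "Evicted"
}

// podsToDelete returns the names of pods that should be deleted so that
// the controller can recreate them on a later reconcile.
func podsToDelete(pods []Pod) []string {
	var names []string
	for _, p := range pods {
		if isEvicted(p) {
			names = append(names, p.Name)
		}
	}
	return names
}

func main() {
	pods := []Pod{
		{Name: "head-a", Phase: "Failed", Reason: "Evicted"},
		{Name: "head-b", Phase: "Running"},
	}
	// Only the evicted pod is selected for deletion.
	fmt.Println(podsToDelete(pods))
}
```

In a real controller the deletion itself would go through the Kubernetes API, and the next reconcile would notice the missing head pod and create a new one.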

Related issue number

https://github.com/ray-project/kuberay/issues/2125

kevin85421 commented 2 days ago

Hey @JasonChen86899, I didn't know that you wanted to work on the issue. I had already assigned the issue to @MortalHappiness before you opened this PR. Maybe we can find other issues to collaborate on if you are interested in contributing to KubeRay? Sorry for the inconvenience.

MortalHappiness commented 2 days ago

@kevin85421 I am OK with it if @JasonChen86899 wants to create a PR.

kevin85421 commented 2 days ago

@MortalHappiness Thanks!

JasonChen86899 commented 2 days ago

@kevin85421 Sorry, I just made a draft and didn't notice that the issue was already assigned. I have closed it. cc @MortalHappiness

kevin85421 commented 2 days ago

@JasonChen86899 No worries. I have already synced with @MortalHappiness. He is comfortable with your PR. We can review this PR together. You don't need to close it.