Closed KiranP-d11 closed 6 months ago
Hi @KiranP-d11, this is great. Thanks for your work! I have already merged the PR that fixes raydp-submit. Can you please merge the main branch and try CI again? The file changes LGTM.
@kira-lin Merged the master branch and CI checks are passing.
Pull request for the bug described in the issue https://github.com/oap-project/raydp/issues/364.
This is the fix for Ray executors not recovering from OOM and other failures. The issue is caused by a race condition: