oap-project / raydp

RayDP provides simple APIs for running Spark on Ray and integrating Spark with AI libraries.
Apache License 2.0

Feat/recover lost executors #387

Closed: KiranP-d11 closed this 8 months ago

pang-wu commented 8 months ago

Just curious, will this PR resolve https://github.com/oap-project/raydp/issues/364 ? Also, there are conflicts with master. Can you rebase, @KiranP-d11 ?

KiranP-d11 commented 8 months ago

@pang-wu Yes, it's for the issue https://github.com/oap-project/raydp/issues/364. Currently, this PR contains multiple changes, i.e., fixes for lost executors and dynamic core resizing (inspired by the approach linked here).

I will split this into multiple PRs and update this PR to contain only the changes for recovering lost executors.