oap-project / raydp

RayDP provides simple APIs for running Spark on Ray and integrating Spark with AI libraries.
Apache License 2.0
293 stars, 66 forks

dynamic allocation executor pending for addition flood fix #396

Closed raviranak closed 2 months ago

raviranak commented 6 months ago

Overview

Issue: Currently, with dynamic allocation enabled, far more executors are pending to be added than the maximum executor count allows.

Solution: With dynamic allocation enabled, this change checks that no additional pending executor actors are added once the maximum executor count has been reached. Without this check, the excess requests result in executors that keep running even after the job completes.
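
For illustration only, here is a minimal sketch of the kind of cap described above. It is written in Python and is not the code in this PR. The function name and all parameters (`requested_total`, `running`, `pending`, `max_executors`) are hypothetical. It assumes the scheduler can see how many executors are running and how many creation requests are still pending.

```python
def additional_executors_to_launch(requested_total, running, pending, max_executors):
    """Return how many new executor actors may be requested right now.

    All names here are hypothetical and only illustrate the idea of
    capping pending requests at the configured maximum.
    """
    # Never aim for more executors than the configured maximum.
    target = min(requested_total, max_executors)
    # Executors that are running or already pending count against the target.
    in_flight = running + pending
    # Only request the shortfall; never return a negative number.
    return max(0, target - in_flight)


# Spark asks for 50 executors, but the maximum is 10 and 8 are in flight.
print(additional_executors_to_launch(50, running=3, pending=5, max_executors=10))
```

In this sketch, pending requests count against the maximum, so a burst of allocation requests cannot queue more executors than the job is allowed to use.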

raviranak commented 6 months ago

@kira-lin Can you please take a look at this?

kira-lin commented 6 months ago

Sorry for the late reply. I have a question: do you mean that the executors killed by Spark due to dynamic resource allocation will try to restart?

raviranak commented 5 months ago

> Sorry for the late reply. I have a question: do you mean that the executors killed by Spark due to dynamic resource allocation will try to restart?

With dynamic resource allocation, the number of pending executor creation requests is far greater than the actual number of executors, so executors stay alive for much longer even after the Spark job completes.

rishabh-dream11 commented 3 months ago

@kira-lin Can you help with the review and merge?