volcano-sh / volcano

A Cloud Native Batch System (Project under CNCF)
https://volcano.sh
Apache License 2.0

If we use reclaim or preempt, the Spark driver can be reclaimed, which is unreasonable #2734

Open zbbkeepgoing opened 1 year ago

zbbkeepgoing commented 1 year ago

What happened:

If a queue's reclaimable option is enabled, a pending task in a high-priority queue can reclaim certain special tasks in a low-priority queue, such as the Spark driver. If the Spark driver is reclaimed or preempted, the Spark job fails.

What you expected to happen:

Reclaim and preempt should avoid certain special tasks, such as the Spark driver.

How to reproduce it (as minimally and precisely as possible):

  1. Create two queues whose resources overlap.
  2. Submit the first Spark job to one queue; this job needs the full resources of that queue.
  3. After all executors of the first Spark job are running, submit a second Spark job to the other queue; this job also needs the full resources of its queue.
  4. A task in the second Spark job reclaims the driver of the first Spark job.
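
To make the scenario concrete, here is a toy model of the steps above. It is only a sketch and not Volcano's actual victim selection: the `Pod` type, the `naive_reclaim` helper, the CPU figures, and the candidate order are all assumptions made for illustration. It shows that a selector which ignores the pod's role can pick the driver as a victim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    name: str
    role: str  # "driver" or "executor" (hypothetical field for this sketch)
    cpu: int   # requested CPU cores


def naive_reclaim(running, needed_cpu):
    """Pick victims in list order until enough CPU is freed, ignoring role."""
    victims, freed = [], 0
    for pod in running:
        if freed >= needed_cpu:
            break
        victims.append(pod)
        freed += pod.cpu
    return victims


# The first job fills the shared capacity: one driver plus three executors.
job1 = [
    Pod("job1-driver", "driver", 1),
    Pod("job1-exec-1", "executor", 1),
    Pod("job1-exec-2", "executor", 1),
    Pod("job1-exec-3", "executor", 1),
]

# The second job's pending task in the other queue needs 1 CPU.
victims = naive_reclaim(job1, needed_cpu=1)
print([v.name for v in victims])  # the driver is first in line, so it is chosen
```

In this model, freeing one executor would have satisfied the request just as well, which is why evicting the driver is the worse choice.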

Anything else we need to know?:

If the Spark driver is terminated, the Spark job fails; if a Spark executor is terminated, Spark retries it.
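
One way to express the requested behavior is a filter that drops drivers from the list of victim candidates before any are evicted. The sketch below is hypothetical and is not a Volcano API: `is_protected` and `filter_victims` are made-up names, and it assumes pods carry the `spark-role` label that Spark on Kubernetes applies to driver and executor pods.

```python
SPARK_ROLE_LABEL = "spark-role"  # assumed to be set by Spark on Kubernetes


def is_protected(labels):
    """Treat Spark drivers as protected: losing one fails the whole job."""
    return labels.get(SPARK_ROLE_LABEL) == "driver"


def filter_victims(candidates):
    """Keep only candidates that are safe to reclaim or preempt."""
    return [pod for pod in candidates if not is_protected(pod["labels"])]


candidates = [
    {"name": "job1-driver", "labels": {"spark-role": "driver"}},
    {"name": "job1-exec-1", "labels": {"spark-role": "executor"}},
    {"name": "other-task", "labels": {}},
]

safe = filter_victims(candidates)
print([pod["name"] for pod in safe])
```

Executors stay eligible because Spark can retry them, while the driver is excluded because its termination fails the job.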

Environment:

zbbkeepgoing commented 1 year ago

/assign @zbbkeepgoing

stale[bot] commented 1 year ago

Hello 👋 Looks like there was no activity on this issue for last 90 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).

zbbkeepgoing commented 1 year ago

keep active
