volcano-sh / volcano

A Cloud Native Batch System (Project under CNCF)
https://volcano.sh
Apache License 2.0
4.25k stars 971 forks source link

fix: mark PodGroup completed when pod fails #3807

Open bood opened 2 weeks ago

bood commented 2 weeks ago

Similar to api.Succeeded, when pods are in api.Failed state, PodGroup should also be marked as Completed

volcano-sh-bot commented 2 weeks ago

Welcome @bood!

It looks like this is your first PR to volcano-sh/volcano.

Thank you, and welcome to Volcano. :smiley:

bood commented 2 weeks ago

/assign @lowang-bh

lowang-bh commented 2 weeks ago

Thanks for your contributions. How about add a ut for changes?

volcano-sh-bot commented 2 weeks ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: To complete the pull request process, please assign lowang-bh You can assign the PR to them by writing /assign @lowang-bh in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files: - **[OWNERS](https://github.com/volcano-sh/volcano/blob/master/OWNERS)** Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
bood commented 2 weeks ago

Thanks for your contributions. How about add a ut for changes?

one e2e test is added

bood commented 2 weeks ago

Is there any problem with CI? schedulingbase, schedulingaction, jobp all run ok on my laptop.

hwdef commented 1 week ago

Should we add a Failed Status for podgroup?

bood commented 1 week ago

Should we add a Failed Status for podgroup?

For now, I don't see any logic in volcano that cares about how the job completes. So perhaps we can just keep it simple.

However, I'm not sure whether the state should revert to PodGroupPending when it's already in PodGroupCompleted, say the pods are deleted after job failure. I saw similar case in issue #2208

        } else if jobInfo.PodGroup.Status.Phase != scheduling.PodGroupInqueue {
            status.Phase = scheduling.PodGroupPending
        }