Is your feature request related to a problem? Please describe.
The current RemoveFailedPods strategy accepts a reason parameter, which is matched against the reason in a terminated container's status (state). The same status also carries an exitCode field, which describes the exit status from the last termination of a container and can provide additional, important information about why the container terminated.
A common use case: AI/ML training jobs often run pre-flight health checks in initContainers and take action based on the exitCode when an initContainer fails, e.g., deleting the scheduled job pod via Descheduler.
Describe the solution you'd like
I'd like to propose adding a terminated container's exitCode as an additional parameter to the RemoveFailedPods strategy. The implementation should be straightforward: check state.terminated.exitCode for each entry in status.containerStatuses. If this makes sense, I will submit an implementation.
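To make the proposed check concrete, here is a minimal sketch in Go. It is not Descheduler code: the struct types are reduced stand-ins for the Kubernetes core/v1 types, and the function name matchesExitCodes, the exitCodes list, and the includeInit flag are hypothetical names chosen for illustration. Treating an empty exitCodes list as "no filter" is also an assumption.

```go
package main

import "fmt"

// Reduced stand-ins for the Kubernetes core/v1 types, limited to the fields
// this sketch needs. A real implementation would use k8s.io/api/core/v1.
type ContainerStateTerminated struct {
	ExitCode int32
	Reason   string
}

type ContainerState struct {
	Terminated *ContainerStateTerminated
}

type ContainerStatus struct {
	Name  string
	State ContainerState
}

type PodStatus struct {
	InitContainerStatuses []ContainerStatus
	ContainerStatuses     []ContainerStatus
}

// matchesExitCodes (hypothetical helper) reports whether any terminated
// container in the pod status exited with one of the given codes.
// Assumption: an empty exitCodes list means the filter is not set, so every
// pod matches. includeInit controls whether init containers are inspected.
func matchesExitCodes(status PodStatus, exitCodes []int32, includeInit bool) bool {
	if len(exitCodes) == 0 {
		return true
	}
	wanted := make(map[int32]struct{}, len(exitCodes))
	for _, code := range exitCodes {
		wanted[code] = struct{}{}
	}

	// Copy into a fresh slice so the caller's slices are never modified.
	statuses := append([]ContainerStatus{}, status.ContainerStatuses...)
	if includeInit {
		statuses = append(statuses, status.InitContainerStatuses...)
	}

	for _, cs := range statuses {
		if t := cs.State.Terminated; t != nil {
			if _, ok := wanted[t.ExitCode]; ok {
				return true
			}
		}
	}
	return false
}

func main() {
	// A pod whose pre-flight init container failed with exit code 42.
	status := PodStatus{
		InitContainerStatuses: []ContainerStatus{
			{
				Name: "preflight",
				State: ContainerState{
					Terminated: &ContainerStateTerminated{ExitCode: 42, Reason: "Error"},
				},
			},
		},
	}
	fmt.Println(matchesExitCodes(status, []int32{42}, true))  // true
	fmt.Println(matchesExitCodes(status, []int32{42}, false)) // false
}
```

In the strategy itself, a check like this would presumably sit alongside the existing reason filter, so a pod is evicted only when it passes every configured filter.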
Describe alternatives you've considered
What version of descheduler are you using?
descheduler version: the development version in the main branch
Additional context