Why are these changes needed?
Unified the code path for checkBackoffLimitAndUpdateStatusIfNeeded so that the call is not accidentally skipped when a new RayJob failure case is added in the future.
Related issue number
Checks
[ ] I've made sure the tests are passing.
Testing Strategy
[ ] Unit tests
[ ] Manual tests
[ ] This PR is not tested :(
I manually tested the case where a RayJob exceeds ActiveDeadlineSeconds.
Create a RayJob with this YAML. The RayJob sets activeDeadlineSeconds to 1 second and backoffLimit to 2. Expected behavior: the RayJob is not retried even though backoffLimit is nonzero, because it failed due to a timeout. The operator log confirms this:
{"level":"info","ts":"2024-07-02T17:02:22.504Z","logger":"controllers.RayJob","msg":"RayJob is not eligible for retry due to failure with DeadlineExceeded","RayJob":{"name":"rayjob-sample","namespace":"default"},"reconcileID":"6d29d711-507a-4f21-b8de-d5a45bf394cf","backoffLimit":2,"succeeded":0,"failed":1}
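For reviewers, here is a rough sketch of the idea behind this change, not the actual KubeRay code. All names in it (JobStatus, FailureReason, markFailed, shouldRetry) are hypothetical, and the retry rule "failed <= backoffLimit" is an assumption made for illustration. The point is that every failure case goes through one function, so the backoff check cannot be forgotten, and a DeadlineExceeded failure is never retried.

```go
package main

import "fmt"

// FailureReason is a hypothetical stand-in for the reason a RayJob failed.
type FailureReason string

const (
	AppFailed        FailureReason = "AppFailed"
	DeadlineExceeded FailureReason = "DeadlineExceeded"
)

// JobStatus is a simplified, hypothetical status holding only what this sketch needs.
type JobStatus struct {
	Failed   int32         // number of failed attempts so far
	Reason   FailureReason // reason for the most recent failure
	Retrying bool          // whether another attempt will be made
}

// shouldRetry applies the backoff check (assumed rule, not taken from the PR).
func shouldRetry(status *JobStatus, backoffLimit int32) bool {
	// A timeout is terminal: do not retry regardless of backoffLimit.
	if status.Reason == DeadlineExceeded {
		return false
	}
	// Assumption: backoffLimit counts retries, so retry while failed <= backoffLimit.
	return status.Failed <= backoffLimit
}

// markFailed is the single place where a job transitions to failed.
// Every failure case calls it, so the backoff check always runs.
func markFailed(status *JobStatus, backoffLimit int32, reason FailureReason) {
	status.Failed++
	status.Reason = reason
	status.Retrying = shouldRetry(status, backoffLimit)
}

func main() {
	timedOut := &JobStatus{}
	markFailed(timedOut, 2, DeadlineExceeded)
	fmt.Println("timed out, retrying:", timedOut.Retrying)

	crashed := &JobStatus{}
	markFailed(crashed, 2, AppFailed)
	fmt.Println("app failed, retrying:", crashed.Retrying)
}
```

With backoffLimit set to 2, the timed-out job in this sketch is not retried after its first failure, which mirrors the manual test above, while an ordinary application failure is still eligible for retry.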