intelligent-machine-learning / dlrover

DLRover: An Automatic Distributed Deep Learning System

Fix relaunch node's relaunch limit. #1296

Closed BalaBalaYi closed 1 month ago

BalaBalaYi commented 1 month ago

What changes were proposed in this pull request?

A relaunched node now inherits the relaunch limit value of the node it replaces.

Why are the changes needed?

A job with fewer than 3 workers should still be able to relaunch a node 3 times.
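
The intended behavior can be illustrated with a minimal sketch. This is not DLRover's actual code: the `Node` class, its fields, and the `relaunch` method are hypothetical names chosen for illustration, and carrying the relaunch count forward is an assumption about how the limit is enforced.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a job node; not DLRover's actual class."""

    node_id: int
    max_relaunch_count: int = 3  # assumed limit, taken from "relaunch 3 times"
    relaunch_count: int = 0

    def can_relaunch(self) -> bool:
        # A node may be relaunched while it is still under its limit.
        return self.relaunch_count < self.max_relaunch_count

    def relaunch(self, new_id: int) -> "Node":
        """Return the replacement node, inheriting the limit from this node."""
        if not self.can_relaunch():
            raise RuntimeError(f"node {self.node_id} exhausted its relaunch limit")
        return Node(
            node_id=new_id,
            max_relaunch_count=self.max_relaunch_count,  # inherited, not reset
            relaunch_count=self.relaunch_count + 1,  # assumption: count carries over
        )
```

In this sketch a node created with a limit of 3 can be replaced three times in a row, and a fourth attempt raises an error, because each replacement keeps the original limit instead of falling back to some other default.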

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests (UT) and a training run.

codecov[bot] commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 80.37%. Comparing base (cc8c8f0) to head (87d3244). Report is 3 commits behind head on master.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master    #1296      +/-   ##
==========================================
- Coverage   80.38%   80.37%   -0.02%
==========================================
  Files         222      222
  Lines       20505    20507       +2
==========================================
- Hits        16483    16482       -1
- Misses       4022     4025       +3
```
