intelligent-machine-learning / dlrover

DLRover: An Automatic Distributed Deep Learning System

Fix duplicate pod relaunching in some cases (with internal k8s). #1259

Closed BalaBalaYi closed 3 weeks ago

BalaBalaYi commented 3 weeks ago

What changes were proposed in this pull request?

Make the judgement in 'process_event' rather than in the 'k8s watcher'. Node event processing can be triggered in two ways, so a judgement made only in the 'k8s watcher' does not cover both.
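
A minimal sketch of the idea, not the actual DLRover code: when two entry points feed the same handler, a check placed in the shared handler covers both, whereas a check in one entry point leaves the other unguarded. Every name here (`NodeEvent`, `JobManager`, `on_watch_event`, `on_reported_event`, the `"FAILED"` status string) is hypothetical and chosen only for illustration. Only `process_event` echoes a name from this PR, and its body is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass(frozen=True)
class NodeEvent:
    """Hypothetical node event: which node, and what state it reached."""
    node_id: int
    status: str


@dataclass
class JobManager:
    """Hypothetical manager with two entry points sharing one handler."""
    relaunched: List[int] = field(default_factory=list)
    _handled: Set[Tuple[int, str]] = field(default_factory=set)

    def process_event(self, event: NodeEvent) -> bool:
        """Handle an event once; return False if it was already handled."""
        key = (event.node_id, event.status)
        if key in self._handled:
            # Same node and status seen before: skip to avoid a second relaunch.
            return False
        self._handled.add(key)
        if event.status == "FAILED":
            # Stand-in for the real relaunch action.
            self.relaunched.append(event.node_id)
        return True

    def on_watch_event(self, event: NodeEvent) -> bool:
        """Entry point 1 (assumed): an event delivered by a watcher."""
        return self.process_event(event)

    def on_reported_event(self, event: NodeEvent) -> bool:
        """Entry point 2 (assumed): the same event arriving by another path."""
        return self.process_event(event)


manager = JobManager()
failed = NodeEvent(node_id=0, status="FAILED")
first = manager.on_watch_event(failed)      # handled, triggers one relaunch
second = manager.on_reported_event(failed)  # duplicate, skipped
```

Because the check sits in the shared handler, the second path cannot trigger another relaunch for the same node and status, whichever path delivers the event first.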

Why are the changes needed?

To fix duplicate pod relaunching in some cases (TKP).

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests (UT).

codecov[bot] commented 3 weeks ago

Codecov Report

Attention: Patch coverage is 97.22222% with 1 line in your changes missing coverage. Please review.

Project coverage is 80.55%. Comparing base (6ba7230) to head (8823d8a). Report is 1 commit behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| dlrover/python/master/watcher/k8s_watcher.py | 50.00% | 1 Missing :warning: |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master    #1259      +/-   ##
==========================================
- Coverage   80.61%   80.55%   -0.06%
==========================================
  Files         218      218
  Lines       19900    19909       +9
==========================================
- Hits        16042    16038       -4
- Misses       3858     3871      +13
```
