intelligent-machine-learning / dlrover

DLRover: An Automatic Distributed Deep Learning System

Optimize logging and revert using random to create socket #1246

Closed. BalaBalaYi closed this pull request 2 months ago.

BalaBalaYi commented 2 months ago

What changes were proposed in this pull request?

  1. Revert the change that used a random value to create the socket (that approach was inappropriate).
  2. Catch and log key errors.
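Here is a minimal sketch of the two items above, in Python because `multi_process.py` is a Python module. Everything in it is an assumption for illustration and is not DLRover's actual code. That includes the helper names `socket_path` and `get_shared_value`, the temp-directory location, and the guess that a random socket name was a problem because the server and client processes must both derive the same path.

```python
import logging
import os
import tempfile

logger = logging.getLogger(__name__)


def socket_path(name: str) -> str:
    """Hypothetical helper: build the socket path only from a name that both
    the server and the client process know, so both resolve the same file.
    A random suffix here would leave the peer unable to find the socket."""
    return os.path.join(tempfile.gettempdir(), f"{name}.sock")


def get_shared_value(shared: dict, key: str, default=None):
    """Hypothetical lookup: catch a KeyError, log it, and return a default
    instead of letting the exception propagate to the caller."""
    try:
        return shared[key]
    except KeyError:
        logger.warning("Key %r not found in shared dict; using default.", key)
        return default


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(socket_path("ckpt_meta") == socket_path("ckpt_meta"))  # True
    print(get_shared_value({}, "step", default=-1))               # logs a warning, prints -1
```

A deterministic path lets a second process connect without first being told a random suffix. Logging the missing key keeps the failure visible without crashing the caller.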

Why are the changes needed?

To reinforce the flash checkpoint implementation.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests (UT).

codecov[bot] commented 2 months ago

Codecov Report

Attention: Patch coverage is 85.71429% with 2 lines in your changes missing coverage. Please review.

Project coverage is 80.38%. Comparing base (1ad45be) to head (53c1b39). Report is 16 commits behind head on master.

| Files | Patch % | Lines |
|---|---|---|
| dlrover/python/common/multi_process.py | 83.33% | 2 Missing :warning: |
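You can cross-check the patch figures with a little arithmetic. Given a count of uncovered lines and a reported percentage, search for the smallest total line count consistent with both. This is only an arithmetic sketch. `implied_lines` is a hypothetical helper, and it assumes the report rounds to the number of decimals it displays.

```python
def implied_lines(missing: int, pct: float, decimals: int, limit: int = 10_000) -> int:
    """Return the smallest line count n such that (n - missing) / n, expressed
    as a percentage and rounded to `decimals` places, equals `pct`."""
    for n in range(missing + 1, limit):
        covered = n - missing
        if round(covered * 100 / n, decimals) == pct:
            return n
    raise ValueError("no line count below limit matches")


# Whole patch: 85.71429% with 2 missing lines.
patch_lines = implied_lines(missing=2, pct=85.71429, decimals=5)
# multi_process.py: 83.33% with 2 missing lines.
file_lines = implied_lines(missing=2, pct=83.33, decimals=2)
print(patch_lines, file_lines)  # 14 12
```

Under that assumption, the patch has 14 coverable changed lines with 12 covered (12/14 ≈ 85.714%). Of those, 12 are in `multi_process.py`, with 10 covered (10/12 ≈ 83.33%). The remaining 2 changed lines in other files would then be fully covered.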
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master    #1246      +/-   ##
==========================================
- Coverage   80.46%   80.38%   -0.08%
==========================================
  Files         214      218       +4
  Lines       19648    19755     +107
==========================================
+ Hits        15810    15881      +71
- Misses       3838     3874      +36
```
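The project percentages in the coverage diff follow from the hit and miss counts, provided they are truncated rather than rounded to nearest. Rounding 15881/19755 ≈ 80.3898% to nearest would give 80.39%, not the reported 80.38%. Truncation is an inference from the numbers; the report does not state it. `coverage_pct` is a hypothetical helper that uses integer arithmetic to avoid floating-point drift.

```python
def coverage_pct(hits: int, misses: int, decimals: int = 2) -> float:
    """Covered lines as a percentage of hits + misses, truncated (rounded
    down) to `decimals` places using integer arithmetic."""
    lines = hits + misses
    factor = 10 ** decimals
    return (hits * 100 * factor // lines) / factor


base = coverage_pct(hits=15810, misses=3838)  # base 1ad45be: 19648 lines
head = coverage_pct(hits=15881, misses=3874)  # head 53c1b39: 19755 lines
print(base, head, round(head - base, 2))       # 80.46 80.38 -0.08
```

The line delta is also consistent. The 107 new lines split into 71 additional hits and 36 additional misses.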
