intelligent-machine-learning / dlrover

DLRover: An Automatic Distributed Deep Learning System

【WIP】Temp solution for socket conflict. #1227

Closed BalaBalaYi closed 3 months ago

BalaBalaYi commented 3 months ago

What changes were proposed in this pull request?

This is a temporary solution: add a random number to the socket name to avoid socket conflicts.
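As a rough illustration of the idea, here is a minimal sketch that assumes the socket is a Unix domain socket bound to a filesystem path. The helper names `make_socket_path` and `bind_unix_socket`, the `.sock` extension, and the suffix range are hypothetical and are not taken from the DLRover code base.

```python
import os
import random
import socket


def make_socket_path(name: str, base_dir: str) -> str:
    """Hypothetical helper: build a socket path with a random numeric suffix."""
    # Assumption: a suffix in [0, 10**6) makes a clash with a stale path unlikely,
    # but it does not rule one out.
    suffix = random.randint(0, 10**6 - 1)
    return os.path.join(base_dir, f"{name}_{suffix}.sock")


def bind_unix_socket(path: str) -> socket.socket:
    """Bind and listen on a Unix domain socket at the given path."""
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(1)
    return server
```

Because the name changes on every start, a restarted process does not depend on the previous socket having been cleaned up. The trade-off is that any client must learn the randomized name through some other channel, and stale socket files may accumulate.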

Why are the changes needed?

The socket may not be fully released when the training process is restarted (cleaning up in a 'finally' block is not enough).
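One way such a conflict can arise, assuming a Unix domain socket bound to a filesystem path, is that closing the socket does not remove its path, so a later bind to the same path fails. The sketch below only demonstrates that general behavior; `try_bind` is a hypothetical helper, and this is not necessarily the exact failure seen in DLRover.

```python
import os
import socket
import tempfile


def try_bind(path: str):
    """Hypothetical helper: return a bound socket, or None if the bind fails."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)
        return sock
    except OSError:
        # Typically "Address already in use" when the path still exists.
        sock.close()
        return None


path = os.path.join(tempfile.mkdtemp(), "stale.sock")

first = try_bind(path)
first.close()                     # close() alone leaves the socket file behind
blocked = try_bind(path) is None  # rebinding the same path fails

os.unlink(path)                   # removing the stale file clears the conflict
second = try_bind(path)
second.close()
os.unlink(path)
```

If the process is killed before its cleanup code runs, the file is never unlinked, which is why a fixed name can collide after a restart.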

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests (UT).

codecov[bot] commented 3 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 80.46%. Comparing base (1794a01) to head (77541d7). Report is 56 commits behind head on master.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master    #1227      +/-   ##
==========================================
+ Coverage   80.43%   80.46%   +0.02%
==========================================
  Files         217      217
  Lines       19573    19576       +3
==========================================
+ Hits        15744    15752       +8
+ Misses       3829     3824       -5
```
