intelligent-machine-learning / dlrover

DLRover: An Automatic Distributed Deep Learning System

Optimize test for ascend NPU. #1224

Closed BalaBalaYi closed 1 month ago

BalaBalaYi commented 1 month ago

What changes were proposed in this pull request?

Defer `torch_npu._npu_shutdown()` so that it does not interfere with `dist.destroy_process_group()`.
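A minimal sketch of the ordering this change appears to aim for, assuming that "deferring" means running the device shutdown only after the process group has been destroyed. The helper name `finalize_npu_check` and the stand-in callables are hypothetical and are not taken from the patch. In the real test the two callables would presumably be `dist.destroy_process_group` and `torch_npu._npu_shutdown`.

```python
from typing import Callable, List


def finalize_npu_check(
    destroy_group: Callable[[], None],
    shutdown_device: Callable[[], None],
) -> None:
    """Hypothetical helper: tear down the process group, then the device.

    Assumption: the device runtime must stay alive until the process
    group has been destroyed, so the shutdown call is deferred.
    """
    try:
        destroy_group()
    finally:
        # Runs even if destroy_group() raises, so the device is still released.
        shutdown_device()


# Stand-in callables that only record the order in which they are called.
calls: List[str] = []
finalize_npu_check(
    lambda: calls.append("destroy_process_group"),
    lambda: calls.append("npu_shutdown"),
)
print(calls)
```

Passing the two steps in as callables keeps the sketch runnable on machines without `torch` or `torch_npu` installed. Whether the shutdown should still run when the process group teardown fails is a design choice this sketch makes on its own.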

Why are the changes needed?

Improve the test implementation for Huawei Ascend NPUs.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

TODO.

codecov[bot] commented 1 month ago

Codecov Report

Attention: Patch coverage is 66.66667% with 4 lines in your changes missing coverage. Please review.

Project coverage is 80.44%. Comparing base (adc8bba) to head (aafb893). Report is 26 commits behind head on master.

| Files | Patch % | Lines |
|---|---|---|
| dlrover/trainer/torch/node_check/ascend_npu.py | 66.66% | 4 Missing :warning: |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master    #1224      +/-   ##
==========================================
+ Coverage   80.41%   80.44%   +0.02%
==========================================
  Files         217      217
  Lines       19463    19511      +48
==========================================
+ Hits        15652    15695      +43
- Misses       3811     3816       +5
```
