intelligent-machine-learning / dlrover

DLRover: An Automatic Distributed Deep Learning System

[WIP] Refactor diagnosis manager #1302

Status: Closed (closed by samplise 1 week ago)

samplise commented 1 month ago

What changes were proposed in this pull request?

Refactor diagnosis manager.

Why are the changes needed?

Complete the diagnosis manager implementation.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests.

codecov[bot] commented 2 weeks ago

Codecov Report

Attention: Patch coverage is 87.41935% with 117 lines in your changes missing coverage. Please review.

Project coverage is 80.57%. Comparing base (186783e) to head (43dce77).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...lrover/python/diagnosis/common/diagnosis_action.py | 53.24% | 36 Missing :warning: |
| dlrover/python/elastic_agent/torch/training.py | 46.87% | 17 Missing :warning: |
| dlrover/python/master/node/dist_job_manager.py | 79.72% | 15 Missing :warning: |
| dlrover/python/master/node/worker.py | 63.15% | 14 Missing :warning: |
| dlrover/python/master/node/training_node.py | 81.57% | 7 Missing :warning: |
| ...rover/python/master/diagnosis/diagnosis_manager.py | 85.18% | 4 Missing :warning: |
| dlrover/python/master/node/local_job_manager.py | 84.61% | 4 Missing :warning: |
| dlrover/python/master/node/ps.py | 84.61% | 4 Missing :warning: |
| ...rover/python/elastic_agent/config/launch_config.py | 93.87% | 3 Missing :warning: |
| .../python/elastic_agent/diagnosis/diagnosis_agent.py | 92.30% | 3 Missing :warning: |

... and 5 more
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master    #1302      +/-   ##
==========================================
+ Coverage   80.52%   80.57%   +0.05%
==========================================
  Files         222      226       +4
  Lines       20707    21266     +559
==========================================
+ Hits        16674    17136     +462
- Misses       4033     4130      +97
```
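As a sanity check on the report's figures, the sketch below recomputes the project coverage from the hit and line counts in the diff. It assumes the percentages are truncated to two decimals rather than rounded, which is an inference from the head value (rounding would give 80.58%), not documented behaviour. The helper names `coverage_pct` and `truncate` are hypothetical. The patch size of 930 lines is likewise inferred from the 117 missing lines and the 87.41935% patch coverage; the report does not state it.

```python
import math


def coverage_pct(hits: int, lines: int) -> float:
    """Return covered lines as a percentage of all tracked lines."""
    return 100.0 * hits / lines


def truncate(value: float, digits: int = 2) -> float:
    """Cut off (not round) a value after `digits` decimals.

    Assumption: the report truncates its percentages; this is inferred
    from the numbers in the report, not from Codecov documentation.
    """
    factor = 10 ** digits
    return math.floor(value * factor) / factor


# Hits and Lines taken from the coverage diff (base vs. head).
base = coverage_pct(16674, 20707)
head = coverage_pct(17136, 21266)

# Misses are simply Lines minus Hits.
base_misses = 20707 - 16674
head_misses = 21266 - 17136

# Inferred patch size: 117 missing lines at 87.41935% patch coverage.
patch_lines = round(117 / (1 - 0.8741935))

print(truncate(base), truncate(head), base_misses, head_misses, patch_lines)
```

Under these assumptions the recomputed values match the report: 80.52% and 80.57% coverage, 4033 and 4130 misses, and an implied patch of 930 lines, of which 813 are covered.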
