intelligent-machine-learning / dlrover

DLRover: An Automatic Distributed Deep Learning System

Add try except for getting dead node. #1198

Closed BalaBalaYi closed 4 months ago

BalaBalaYi commented 4 months ago

What changes were proposed in this pull request?

  1. Add a try-except to handle unexpected errors when getting dead nodes.
  2. Compare timestamps instead of datetime objects, since datetime comparisons may have time zone issues.
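
The two changes above can be illustrated with a minimal sketch. It is not the code from this PR: the `Node` class, the `heartbeat_time` field, and the `get_dead_nodes` function are hypothetical names chosen for illustration, and the sketch assumes heartbeats are stored as POSIX timestamps in seconds.

```python
import logging
import time
from dataclasses import dataclass
from typing import Dict, List, Optional

logger = logging.getLogger(__name__)


@dataclass
class Node:
    # Hypothetical node record for illustration; not DLRover's actual class.
    node_id: int
    heartbeat_time: float  # last heartbeat as a POSIX timestamp (seconds)


def get_dead_nodes(
    nodes: Dict[int, Node],
    timeout_secs: float,
    now: Optional[float] = None,
) -> List[Node]:
    """Return nodes whose last heartbeat is older than timeout_secs."""
    dead: List[Node] = []
    try:
        now = time.time() if now is None else now
        for node in nodes.values():
            # Plain float timestamps carry no time zone, so this comparison
            # cannot mix naive and timezone-aware datetime objects.
            # A heartbeat_time of 0 is treated as "no heartbeat seen yet".
            if node.heartbeat_time > 0 and now - node.heartbeat_time > timeout_secs:
                dead.append(node)
    except Exception:
        # Log and continue so that one malformed record does not crash
        # the caller; nodes collected before the error are still returned.
        logger.exception("Unexpected error while getting dead nodes")
    return dead
```

In Python, ordering a naive `datetime` against a timezone-aware one raises `TypeError`, which is one way a datetime-based check can fail; comparing float timestamps sidesteps that case.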

Why are the changes needed?

They make the retrieval of dead node events more robust.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests.