microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0
34.93k stars, 4.06k forks

Monitor was always enabled, causing performance degradation #5633

Closed: deepcharm closed this 3 months ago

deepcharm commented 3 months ago

The Boolean expression that decides whether the monitor is enabled was incorrect. Instead of checking the enabled field, it tested the Comet configuration object itself, which is always truthy, so the expression always evaluated to True.
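The difference can be illustrated with a minimal Python sketch. CometConfig, MonitorConfig and the two helper functions below are hypothetical stand-ins, not DeepSpeed's actual classes; the point is only that testing a config object's truthiness is not the same as testing its enabled field.

```python
from dataclasses import dataclass, field


@dataclass
class CometConfig:
    # Hypothetical stand-in for the Comet section of the monitor config.
    enabled: bool = False


@dataclass
class MonitorConfig:
    # Hypothetical container holding the per-backend monitor configs.
    comet: CometConfig = field(default_factory=CometConfig)


def monitor_enabled_buggy(cfg: MonitorConfig) -> bool:
    # Tests the config object itself. A dataclass instance that defines
    # neither __bool__ nor __len__ is always truthy, so this is always True.
    return bool(cfg.comet)


def monitor_enabled_fixed(cfg: MonitorConfig) -> bool:
    # Tests the enabled field, which is what the expression intended.
    return cfg.comet.enabled


cfg = MonitorConfig()  # Comet left disabled
print(monitor_enabled_buggy(cfg))  # True
print(monitor_enabled_fixed(cfg))  # False
```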

This caused a performance degradation (we observed a ~10% drop), because it erroneously invoked the event-logging flow, including the expensive computation of loss.mean().item(), which forces a device-to-host synchronization on every call.
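A minimal sketch of why the check matters, assuming a hypothetical Monitor class and log_loss helper (not DeepSpeed's actual API): when the guard tests the real enabled flag, the costly loss reduction is skipped entirely for a disabled monitor. Here fake_loss_value is a stand-in for loss.mean().item() that simply counts how often it runs.

```python
from typing import Callable, List, Tuple


class Monitor:
    # Hypothetical minimal monitor; a real backend (Comet, TensorBoard, ...)
    # would forward these events to its own logging API.
    def __init__(self, enabled: bool) -> None:
        self.enabled = enabled
        self.events: List[Tuple[str, float, int]] = []

    def write_events(self, events: List[Tuple[str, float, int]]) -> None:
        self.events.extend(events)


def log_loss(monitor: Monitor, loss_value_fn: Callable[[], float], step: int) -> None:
    # Return early so the expensive reduction (in PyTorch, loss.mean().item(),
    # which waits for the device result to be copied to the host) only runs
    # when the monitor is actually enabled.
    if not monitor.enabled:
        return
    monitor.write_events([("train_loss", loss_value_fn(), step)])


calls = {"n": 0}


def fake_loss_value() -> float:
    # Stand-in for loss.mean().item(); counts how often it is evaluated.
    calls["n"] += 1
    return 0.25


off = Monitor(enabled=False)
on = Monitor(enabled=True)
for step in range(3):
    log_loss(off, fake_loss_value, step)
    log_loss(on, fake_loss_value, step)

print(calls["n"])      # 3: only the enabled monitor triggered the computation
print(len(on.events))  # 3
```

Passing a callable instead of an already-computed value keeps the cost lazy: the caller never pays for the reduction unless an enabled monitor asks for it.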

nelyahu commented 3 months ago

An HTTP connection issue caused the cpu-torch-latest CI check to fail.

alexkuzmik commented 3 months ago

@deepcharm Hi! It's Alex from Comet. Thanks for noticing and fixing the bug; we appreciate it.