microsoft / DeepSpeedExamples

Example models using DeepSpeed
Apache License 2.0
6.02k stars 1.02k forks

Enable overlap_comm for better performance #846

Closed li-plus closed 2 weeks ago

li-plus commented 8 months ago

Enable overlap_comm so that backward computation overlaps with gradient all-reduce. This yields a 1.05x end-to-end speedup in SFT training with my settings. See also https://github.com/microsoft/DeepSpeed/pull/4887.
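For readers unfamiliar with the option, here is a minimal sketch of a DeepSpeed config with `overlap_comm` turned on in the `zero_optimization` section. The helper name `build_ds_config`, the ZeRO stage, the batch size, and the bf16 setting are illustrative assumptions, not the values changed in this PR.

```python
import json


def build_ds_config(zero_stage: int = 2, overlap_comm: bool = True) -> dict:
    """Build a minimal DeepSpeed config dict (hypothetical helper, assumed values)."""
    return {
        "train_micro_batch_size_per_gpu": 4,  # assumed value for illustration
        "gradient_accumulation_steps": 1,     # assumed value for illustration
        "bf16": {"enabled": True},            # assumed precision setting
        "zero_optimization": {
            "stage": zero_stage,
            # Overlap gradient reduction with backward computation.
            "overlap_comm": overlap_comm,
        },
    }


config = build_ds_config()
print(json.dumps(config, indent=2))
```

A dict like this would typically be passed as the `config` argument of `deepspeed.initialize`, or written to a JSON file. Whether the speedup reported above carries over depends on the model, hardware, and interconnect.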