microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0
33.63k stars 3.95k forks

stage_1_and_2: optimize clip calculation to use clamp #5632

Closed nelyahu closed 3 weeks ago

nelyahu commented 3 weeks ago

Use clamp instead of an "if" for the clip calculation: the "if" causes host/device synchronization and introduces a bubble, while clamp happens on the device.
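To illustrate the idea, here is a minimal sketch in PyTorch, not the actual DeepSpeed code. The function names `combined_scale_with_if` and `combined_scale_with_clamp` and their parameters are hypothetical; the sketch assumes `total_norm` is a tensor that may live on an accelerator, while `loss_scale` and `clip_grad` are plain Python floats.

```python
import torch


def combined_scale_with_if(total_norm, loss_scale, clip_grad):
    """Branching version (hypothetical helper, for comparison only)."""
    combined_scale = loss_scale
    if clip_grad > 0.0:  # clip_grad is a Python float, so this check stays on the host
        clip = ((total_norm / loss_scale) + 1e-6) / clip_grad
        # Evaluating a tensor in an `if` converts it to a Python bool, so the
        # host must wait for the device to finish computing `clip`.
        if clip > 1:
            combined_scale = clip * loss_scale
    return combined_scale


def combined_scale_with_clamp(total_norm, loss_scale, clip_grad):
    """Clamp version (hypothetical helper): no tensor value is read on the host."""
    combined_scale = loss_scale
    if clip_grad > 0.0:
        clip = ((total_norm / loss_scale) + 1e-6) / clip_grad
        # clamp(min=1.0) keeps `clip` when it exceeds 1 and otherwise yields 1,
        # which reproduces the branch's result as a device-side operation.
        clip = torch.clamp(clip, min=1.0)
        combined_scale = clip * loss_scale
    return combined_scale


if __name__ == "__main__":
    for norm in (1.0, 8.0):
        n = torch.tensor(norm)
        print(float(combined_scale_with_if(n, 2.0, 1.0)),
              float(combined_scale_with_clamp(n, 2.0, 1.0)))
```

Both helpers should produce the same scale; the difference is that the clamp version returns a tensor without ever inspecting its value on the host, so the device work can stay queued without a stall.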