DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
stage_1_and_2: optimize clip calculation to use clamp #5632
Closed
nelyahu closed 3 weeks ago
Use clamp instead of an "if" for the clip calculation. The "if" evaluates a device tensor on the host, which causes host/device synchronization and introduces a bubble, whereas clamp happens entirely on the device.
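The difference can be illustrated with a minimal sketch. This is not the PR's diff: the function names `combined_scale_with_if` and `combined_scale_with_clamp`, their parameters, and the `1e-6` epsilon are assumptions chosen for illustration, and the example runs on CPU, where no synchronization cost is visible. On an accelerator, the `if` has to read the tensor value back to the host, while `torch.clamp` stays on the device.

```python
import torch


def combined_scale_with_if(total_norm, loss_scale, clip_grad):
    """Hypothetical 'before' version: branch on a tensor value."""
    clip = ((total_norm / loss_scale) + 1e-6) / clip_grad
    combined_scale = loss_scale
    # Evaluating a tensor in a Python `if` converts it to a bool on the host.
    # For a device tensor this blocks until the device has produced the value.
    if clip > 1:
        combined_scale = clip * loss_scale
    return combined_scale


def combined_scale_with_clamp(total_norm, loss_scale, clip_grad):
    """Hypothetical 'after' version: no host-side branch."""
    clip = ((total_norm / loss_scale) + 1e-6) / clip_grad
    # clamp(min=1.0) yields the same result as the branch:
    # clip stays as-is when it exceeds 1, otherwise it becomes 1.
    clip = torch.clamp(clip, min=1.0)
    return clip * loss_scale


if __name__ == "__main__":
    norm = torch.tensor(8.0)  # assumed gradient norm, as a tensor
    print(float(combined_scale_with_if(norm, 2.0, 1.0)))
    print(float(combined_scale_with_clamp(norm, 2.0, 1.0)))
```

Both functions return the same scale. The clamp version always returns a tensor, so any later division of gradients by that scale can also be queued on the device without waiting for the host.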