Open chengmengli06 opened 1 week ago
@chengmengli06, this is an incorrect reading of the code. Gradient clipping is handled in the respective optimizer implementations, such as:
I found it and verified that it does work in zero_2 mode. Thanks!
@tjruwase Another question: how can I log the pre-clip and post-clip gradient norms to TensorBoard? Is there an interface for getting both norms?
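To make the two quantities in the question concrete, here is a minimal sketch of global-norm clipping in plain Python. It is not DeepSpeed's implementation or API: the function name `clip_by_global_norm`, the `eps` term, and the nested-list gradient layout are assumptions made purely for illustration.

```python
import math


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale all gradients so that their combined L2 norm is at most max_norm.

    grads is a list of flat lists of floats (hypothetical stand-ins for
    parameter gradients). Returns the clipped gradients together with the
    norm measured before and after clipping.
    """
    # Global L2 norm over every gradient element, before any scaling.
    pre_clip_norm = math.sqrt(sum(g * g for tensor in grads for g in tensor))

    # Scale factor: 1.0 when already under the threshold, otherwise < 1.0.
    clip_coef = min(1.0, max_norm / (pre_clip_norm + eps))

    clipped = [[g * clip_coef for g in tensor] for tensor in grads]

    # Recompute the norm after scaling; this is the post-clip value.
    post_clip_norm = math.sqrt(sum(g * g for tensor in clipped for g in tensor))
    return clipped, pre_clip_norm, post_clip_norm


grads = [[3.0, 0.0], [0.0, 4.0]]
clipped, pre_norm, post_norm = clip_by_global_norm(grads, max_norm=1.0)
```

Once both values are available as floats, each can be written as a scalar with a TensorBoard writer, for example `torch.utils.tensorboard.SummaryWriter.add_scalar`. How to obtain them from a running DeepSpeed engine is the open question here and is not answered by this sketch.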
Setting "gradient_clipping" in DeepSpeed does not work; see the source code of deepspeed.runtime.engine.DeepSpeedEngine, line 2101.
Thus gradient clipping does nothing at all!
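For context, the setting under discussion is the top-level "gradient_clipping" key of the DeepSpeed config. A minimal sketch of such a config with ZeRO stage 2 might look like the following; the batch size and the clipping threshold are illustrative values, not taken from this report.

```python
import json

# Hypothetical minimal DeepSpeed config; only the keys relevant to this
# thread are shown, and the numeric values are placeholders.
ds_config = {
    "train_batch_size": 8,              # illustrative value
    "gradient_clipping": 1.0,           # max global gradient norm
    "zero_optimization": {"stage": 2},  # the zero_2 mode mentioned above
}

# Serialized form, as it would appear in a JSON config file.
config_json = json.dumps(ds_config, indent=2)
```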