jingyanwangms closed this 3 years ago
We found that setting the `contiguous_gradients` flag to `false` improves performance for all of the models we tested. The DeepSpeed team has confirmed that this flag is important for large models but should not make a difference for models that already fit in memory with base PyTorch.
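For reference, `contiguous_gradients` lives under the `zero_optimization` section of a DeepSpeed JSON config. The sketch below builds such a config as a Python dict with the flag turned off. It is a minimal illustration, not the config used in this issue: the ZeRO stage, batch size, and fp16 setting are assumed placeholder values, and `make_ds_config` is a hypothetical helper.

```python
import json


def make_ds_config(contiguous_gradients: bool, zero_stage: int = 2) -> dict:
    """Build a minimal DeepSpeed config dict (hypothetical helper).

    Only `contiguous_gradients` reflects the discussion above; the other
    values are illustrative placeholders.
    """
    return {
        "train_batch_size": 32,  # assumed value, not from the issue
        "fp16": {"enabled": True},  # assumed value, not from the issue
        "zero_optimization": {
            "stage": zero_stage,  # assumed ZeRO stage
            # When true, gradients are copied into a contiguous buffer as they
            # are produced, which avoids memory fragmentation during backward.
            # This mostly pays off for large models; here it is turned off.
            "contiguous_gradients": contiguous_gradients,
        },
    }


config = make_ds_config(contiguous_gradients=False)
# Serialize to the JSON form DeepSpeed reads from a config file.
print(json.dumps(config, indent=2))
```

Writing the printed JSON to a file and pointing the training script's DeepSpeed config at it is one way to A/B the flag. Only `contiguous_gradients` needs to change between runs.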