Closed sguo35 closed 4 years ago
@sguo35 Is there any reason why we don't use amp.initialize for the model with NCCL?
Amp has issues and doesn't really have strong support for manual partial backward passes like the ones we are doing. I had trouble getting it to work, but maybe I’m missing something.
Adds manually implemented mixed precision for NCCL models. Some larger models may have gradient error as high as 1e-4 due to inaccuracy in reductions (most reductions are not yet performed in FP32). Speedup is up to 3-4x on large models that can fully saturate the GPU. The NCCL code follows the method outlined at https://on-demand.gputechconf.com/gtc-taiwan/2018/pdf/5-1_Internal%20Speaker_Michael%20Carilli_PDF%20For%20Sharing.pdf
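For anyone reading along, here is a small stdlib-only sketch of why the reduction precision matters and what the usual manual mixed precision recipe protects against. It is not the code in this PR. It assumes the linked slides describe the common recipe (FP32 accumulation, loss scaling, FP32 master weights), and it simulates FP16 on the CPU with Python's struct module. The helper names (to_fp16, to_fp32, reduce_sum) and the constants are made up for illustration.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def reduce_sum(values, accumulate_in_fp16: bool) -> float:
    """Sum FP16 inputs, rounding the running total after every add."""
    round_acc = to_fp16 if accumulate_in_fp16 else to_fp32
    total = 0.0
    for v in values:
        total = round_acc(total + to_fp16(v))
    return total


# 1. Reductions: an FP16 accumulator stalls once the total dwarfs each addend,
#    while an FP32 accumulator over the same FP16 inputs stays accurate.
values = [0.001] * 4096  # hypothetical gradient contributions
exact = sum(to_fp16(v) for v in values)  # double-precision reference
fp16_err = abs(reduce_sum(values, accumulate_in_fp16=True) - exact)
fp32_err = abs(reduce_sum(values, accumulate_in_fp16=False) - exact)

# 2. Loss scaling: a tiny gradient flushes to zero in FP16 unless the loss
#    (and therefore the gradient) is scaled up before the backward pass.
tiny_grad = 1e-8
loss_scale = 1024.0  # illustrative value, not taken from the PR
grad_unscaled = to_fp16(tiny_grad)
grad_scaled = to_fp16(tiny_grad * loss_scale) / loss_scale  # unscale afterwards

# 3. FP32 master weights: a small update vanishes when applied to an FP16
#    weight, but survives when applied to an FP32 copy of the same weight.
update = 1e-4
fp16_weight = to_fp16(to_fp16(1.0) - to_fp16(update))
fp32_master = to_fp32(1.0 - update)

print("reduction error  fp16:", fp16_err, " fp32:", fp32_err)
print("tiny gradient    unscaled:", grad_unscaled, " scaled:", grad_scaled)
print("weight after update  fp16:", fp16_weight, " fp32 master:", fp32_master)
```

The first part is the effect behind the gradient error mentioned above: when a reduction accumulates in FP16, small addends get rounded away once the running total is large enough.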
I only tested on a p3.8xlarge, so multi-node is untested; hopefully it works there too.