adizhol closed this issue 2 years ago.
Hello, it seems that DistOptimizerHook averages the gradients across all processes/GPUs, but shouldn't PyTorch's DDP (DistributedDataParallel) already handle this?
Thanks. Adi
That is because our work is based on an early version of MMCV, from a time when DDP was rarely used.
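A minimal, framework-free sketch of why the two mechanisms overlap, assuming the model is wrapped in PyTorch's DistributedDataParallel, which averages gradients across processes during backward(). The helper allreduce_mean is hypothetical: it simulates an all-reduce sum followed by division by the world size on plain Python lists, not the actual torch.distributed calls. Once DDP has synchronized the gradients, every rank holds the same values, so a second averaging step in an optimizer hook leaves them unchanged (up to floating-point rounding) and only adds communication.

```python
from typing import List

Grad = List[float]


def allreduce_mean(per_rank_grads: List[Grad]) -> List[Grad]:
    """Hypothetical helper: simulate all-reduce (sum) then divide by world size.

    Every simulated rank ends up holding the same element-wise mean.
    """
    world_size = len(per_rank_grads)
    # Element-wise sum across ranks, as an all-reduce with SUM would produce.
    summed = [sum(vals) for vals in zip(*per_rank_grads)]
    mean = [s / world_size for s in summed]
    # Each rank receives its own copy of the averaged gradient.
    return [list(mean) for _ in range(world_size)]


# Local gradients computed independently on two simulated ranks (GPUs).
local = [[1.0, 4.0], [3.0, 0.0]]

# Step 1: the averaging DDP performs during backward().
after_ddp = allreduce_mean(local)

# Step 2: a DistOptimizerHook-style averaging applied on top of it.
after_hook = allreduce_mean(after_ddp)

print(after_ddp)                # [[2.0, 2.0], [2.0, 2.0]]
print(after_hook == after_ddp)  # True: the second average is redundant
```

Without a gradient-synchronizing wrapper, the hook's averaging is the only synchronization step, which is why a codebase written before DDP was common would perform it explicitly.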