A fork of Caffe with OpenMPI-based multi-GPU (mainly data-parallel) support for action recognition and more. For more documentation, please see the original README.
Hi, thanks for sharing such good codes.
I am curious about the relation between the two versions of batch normalization layers in your implementation.
I consider that the BN version is similar to the BatchNorm+Scale layers in the original Caffe branch, but more compact, since it combines those two layers into a single one. In the multi-GPU scenario, both keep moving averages of the mean and variance in separate threads without synchronization. At test time, the mean and variance accumulated by the first GPU are used for inference.
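For illustration, here is a minimal sketch (not the actual layer code; the struct, field names, and momentum value are hypothetical) of how an unsynchronized BN layer could maintain its per-GPU moving statistics:

```cpp
// Sketch of unsynchronized BN statistics: each GPU/worker thread keeps its
// own running mean and variance; nothing is exchanged between GPUs.
#include <vector>

struct BNStats {
  std::vector<float> moving_mean;
  std::vector<float> moving_var;
  float momentum = 0.9f;  // hypothetical smoothing factor

  // Called once per forward pass with the batch statistics of this GPU only.
  void Update(const std::vector<float>& batch_mean,
              const std::vector<float>& batch_var) {
    for (size_t c = 0; c < moving_mean.size(); ++c) {
      moving_mean[c] = momentum * moving_mean[c] + (1.f - momentum) * batch_mean[c];
      moving_var[c]  = momentum * moving_var[c]  + (1.f - momentum) * batch_var[c];
    }
  }
};
// At test time, only the statistics accumulated on the first GPU would be
// loaded for inference, as described above.
```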
On the other hand, SyncBN calculates and synchronizes the mean and variance across all GPUs via an MPI all-reduce. Its communication overhead is much larger than that of BN, but it provides a much larger effective batch size for estimating the mean and variance accurately. This might benefit tasks that can only fit a small batch size on a single GPU.
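And here is a minimal sketch of the SyncBN idea, assuming MPI is already initialized and each rank holds per-channel sums over its local mini-batch; the function name and buffer layout are made up for illustration, not taken from the repository:

```cpp
// Sketch of SyncBN-style statistics pooling: every rank contributes its local
// per-channel sums, and after the all-reduce every rank derives the same
// global mean and variance over the combined (much larger) batch.
#include <mpi.h>
#include <vector>

void SyncBatchStats(std::vector<float>& sum,     // per-channel sum of x (local)
                    std::vector<float>& sq_sum,  // per-channel sum of x^2 (local)
                    int local_count,             // samples on this rank
                    std::vector<float>& mean,
                    std::vector<float>& var) {
  int count = static_cast<int>(sum.size());
  int global_count = 0;
  MPI_Allreduce(&local_count, &global_count, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
  MPI_Allreduce(MPI_IN_PLACE, sum.data(), count, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);
  MPI_Allreduce(MPI_IN_PLACE, sq_sum.data(), count, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

  // Global mean and variance, identical on every rank.
  for (size_t c = 0; c < sum.size(); ++c) {
    mean[c] = sum[c] / global_count;
    var[c]  = sq_sum[c] / global_count - mean[c] * mean[c];
  }
}
```

The three all-reduce calls are where the extra communication cost over plain BN comes from, which matches the trade-off described above.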
Am I right?