A fork of Caffe with OpenMPI-based multi-GPU (mainly data-parallel) support for action recognition and more. For more documentation, please see the original README.
Hi, thanks for sharing such good codes.
I am curious about the relation between the two versions of batch normalization layers in your implementation.
I consider that the BN version is similar to the BatchNorm+Scale layers in the original Caffe branch, but more compact, since it combines those two layers into a single one. In the multi-GPU scenario, both keep moving averages of the mean and variance in separate threads without synchronization. At test time, the mean and variance accumulated by the first GPU are used for inference.
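For illustration, here is a minimal sketch (not the actual layer code; the struct, field names, and momentum value are hypothetical) of how an unsynchronized BN layer could maintain its per-GPU moving statistics:

```cpp
// Sketch of unsynchronized BN statistics: each GPU/worker thread keeps its
// own running mean and variance; nothing is exchanged between GPUs.
#include <vector>

struct BNStats {
  std::vector<float> moving_mean;
  std::vector<float> moving_var;
  float momentum = 0.9f;  // hypothetical smoothing factor

  // Called once per forward pass with the batch statistics of this GPU only.
  void Update(const std::vector<float>& batch_mean,
              const std::vector<float>& batch_var) {
    for (size_t c = 0; c < moving_mean.size(); ++c) {
      moving_mean[c] = momentum * moving_mean[c] + (1.f - momentum) * batch_mean[c];
      moving_var[c]  = momentum * moving_var[c]  + (1.f - momentum) * batch_var[c];
    }
  }
};
// At test time, only the statistics accumulated on the first GPU would be
// loaded for inference, as described above.
```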
On the other hand, SyncBN calculates and synchronizes the mean and variance across all GPUs via an MPI all-reduce. Its communication overhead is much larger than that of BN, but it provides a much larger effective batch size for estimating the mean and variance accurately. This might benefit tasks that can only fit a small batch size on a single GPU.
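And here is a minimal sketch of the SyncBN idea, assuming MPI is already initialized and each rank holds per-channel sums over its local mini-batch; the function name and buffer layout are made up for illustration, not taken from the repository:

```cpp
// Sketch of SyncBN-style statistics pooling: every rank contributes its local
// per-channel sums, and after the all-reduce every rank derives the same
// global mean and variance over the combined (much larger) batch.
#include <mpi.h>
#include <vector>

void SyncBatchStats(std::vector<float>& sum,     // per-channel sum of x (local)
                    std::vector<float>& sq_sum,  // per-channel sum of x^2 (local)
                    int local_count,             // samples on this rank
                    std::vector<float>& mean,
                    std::vector<float>& var) {
  int count = static_cast<int>(sum.size());
  int global_count = 0;
  MPI_Allreduce(&local_count, &global_count, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
  MPI_Allreduce(MPI_IN_PLACE, sum.data(), count, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);
  MPI_Allreduce(MPI_IN_PLACE, sq_sum.data(), count, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

  // Global mean and variance, identical on every rank.
  for (size_t c = 0; c < sum.size(); ++c) {
    mean[c] = sum[c] / global_count;
    var[c]  = sq_sum[c] / global_count - mean[c] * mean[c];
  }
}
```

The three all-reduce calls are where the extra communication cost over plain BN comes from, which matches the trade-off described above.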
Am I right?