Closed: qpjaada closed this issue 3 years ago
Hi George, as I noted in the SWG chat today, Mask R-CNN uses a frozen norm, i.e. just a channel-wise multiply-by-a-constant and an add-a-constant at the end of each convolution. The computation is therefore independent of batch size and shouldn't be affected any differently at small batch sizes.
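To make the "multiply-by-a-constant and add-a-constant" point concrete, here is a minimal pure-Python sketch (the function name and eps default are my own, not from the repo) showing how frozen BN statistics fold into a single per-channel scale and shift, with no dependence on the batch:

```python
import math

def fold_frozen_bn(weight, bias, running_mean, running_var, eps=1e-5):
    """Fold frozen batch-norm statistics into per-channel constants.

    y = (x - mean) / sqrt(var + eps) * weight + bias
      = x * scale + shift, where scale and shift are fixed constants.
    """
    scale = [w / math.sqrt(v + eps) for w, v in zip(weight, running_var)]
    shift = [b - m * s for b, m, s in zip(bias, running_mean, scale)]
    return scale, shift

# One channel with weight=2, bias=1, running_mean=0.5, running_var=0.25:
scale, shift = fold_frozen_bn([2.0], [1.0], [0.5], [0.25], eps=0.0)
x = 3.0
y = x * scale[0] + shift[0]
# Identical to applying BN directly: (3.0 - 0.5) / 0.5 * 2 + 1 = 11.0
print(y)
```

Since scale and shift are constants, the per-sample cost is the same whether the batch holds 1 image or 64.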
Confirming that only the BN layers are frozen; the remaining layers of the ResNet backbone are trained. (Sorry, I saw your question on the WG chat after you had already dropped off.)
Thanks a lot for this, Ritika. I was not aware of this detail. Do you happen to have a pointer to the reference code where the constants for the channel-wise multiply and add are specified/used?
Best -George
running_mean and running_var are defined via register_buffer here: https://github.com/mlcommons/training/blob/master/object_detection/pytorch/maskrcnn_benchmark/layers/batch_norm.py#L19
When a tensor is registered as a buffer rather than a parameter, the optimizer does not update its value, so it is not trained.
Some explanation from the PyTorch forums can be found here: https://discuss.pytorch.org/t/what-is-the-difference-between-register-buffer-and-register-parameter-of-nn-module/32723/2
running_mean and running_var are loaded from the pre-trained backbone when the checkpoint (and the model's state dictionary) is loaded.
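The buffer-vs-parameter distinction can be sketched in a few lines. This is a simplified stand-in in the spirit of the linked batch_norm.py, not a copy of it; it shows that buffers never appear in parameters() (so the optimizer has nothing to train) yet still live in the state_dict (so they load from the checkpoint):

```python
import torch
import torch.nn as nn

class FrozenBatchNorm2d(nn.Module):
    """Sketch of a frozen BN layer: all four tensors are registered as
    buffers, so none are returned by parameters() and none are trained."""
    def __init__(self, num_channels):
        super().__init__()
        self.register_buffer("weight", torch.ones(num_channels))
        self.register_buffer("bias", torch.zeros(num_channels))
        self.register_buffer("running_mean", torch.zeros(num_channels))
        self.register_buffer("running_var", torch.ones(num_channels))

    def forward(self, x):
        # Fold the frozen statistics into a per-channel scale and shift.
        scale = self.weight * (self.running_var + 1e-5).rsqrt()
        shift = self.bias - self.running_mean * scale
        return x * scale.reshape(1, -1, 1, 1) + shift.reshape(1, -1, 1, 1)

bn = FrozenBatchNorm2d(4)
print(len(list(bn.parameters())))      # 0 -> nothing for the optimizer
print(sorted(bn.state_dict().keys()))  # buffers still load from a checkpoint
```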
Thanks a lot for this info @nv-rborkar. Very useful!
Closing this issue because, as Ritika points out, the BN layers of the ResNet backbone are frozen, so this is really a non-issue.
We would like to know if it's possible to use group normalization instead of batch normalization for the ResNet backbone of Mask R-CNN.
The motivation is that for hardware designed to be more efficient at small batch sizes, batch normalization is not a natural choice. Allowing other types of normalization would therefore be much appreciated.