[Mask R-CNN] - Clarify whether group-normalization can be used in place of batch-normalization for resnet backbone

mlcommons / training_policies

Issues related to MLPerf™ training policies, including rules and suggested changes

https://mlcommons.org/en/groups/training

Apache License 2.0

Issue #460

Closed by qpjaada 3 years ago

qpjaada commented 3 years ago

We would like to know if it's possible to use group-normalization instead of batch-normalization for the resnet backbone of Mask R-CNN.

The motivation is that batch-normalization is not a natural choice for hardware designed to be more efficient with small batch sizes. Hence, allowing other types of normalization would be really appreciated.

nv-rborkar commented 3 years ago

Hi George, as I noted in the SWG chat today, MaskRCNN uses a frozen norm, i.e. just a channel-wise multiply-by-a-constant and an add-a-constant at the end of each convolution. So the computation is independent of batch size and should not be affected any differently by small batch sizes.

Confirming that only the BN layer is frozen and the remaining layers of the resnet backbone are trained. (Sorry, I saw your question on the WG chat after you had already dropped off.)
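
To make the "multiply-by-a-constant and add-a-constant" description concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the reference implementation: the function name `frozen_norm`, the nested-list layout, and the example constants are all hypothetical, and no epsilon term is added to the variance. It folds the four fixed per-channel values into one scale and one shift, then checks that a sample's output is the same whether it is processed alone or alongside another sample.

```python
import math


def frozen_norm(x, weight, bias, running_mean, running_var):
    """Apply a frozen norm to x, a nested list shaped [N][C][H][W]."""
    # Fold the fixed per-channel values into one scale and one shift.
    # Assumption: no epsilon is added to the variance in this sketch.
    scale = [w / math.sqrt(v) for w, v in zip(weight, running_var)]
    shift = [b - m * s for b, m, s in zip(bias, running_mean, scale)]
    # Each element depends only on its own value and its channel index,
    # never on the other samples in the batch.
    return [
        [[[px * scale[c] + shift[c] for px in row] for row in plane]
         for c, plane in enumerate(sample)]
        for sample in x
    ]


# Hypothetical constants for a 2-channel layer.
weight = [1.0, 2.0]
bias = [0.0, 1.0]
running_mean = [0.5, -1.0]
running_var = [4.0, 0.25]

# Two samples, each with C=2, H=1, W=2.
a = [[[1.0, 2.0]], [[3.0, 4.0]]]
b = [[[10.0, -3.0]], [[0.0, 7.0]]]

alone = frozen_norm([a], weight, bias, running_mean, running_var)[0]
batched = frozen_norm([a, b], weight, bias, running_mean, running_var)[0]
print(alone)             # [[[0.25, 0.75]], [[17.0, 21.0]]]
print(alone == batched)  # True
```

Because no statistic is computed over the batch, the result for sample `a` does not change when `b` is added, which is the batch-size independence described above.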

qpjaada commented 3 years ago

Thanks a lot for this Ritika. I was not quite aware of this detail. Do you happen to have a pointer to the reference code where the constants for channel-wise multiply and add are specified/used?

Best -George

nv-rborkar commented 3 years ago

running_mean and running_var are defined with register_buffer here: https://github.com/mlcommons/training/blob/master/object_detection/pytorch/maskrcnn_benchmark/layers/batch_norm.py#L19

When a tensor is registered with register_buffer, it is not returned by the module's parameters(), so the optimizer does not update its value and it is not trained.

Some explanation from the PyTorch forums can be found here: https://discuss.pytorch.org/t/what-is-the-difference-between-register-buffer-and-register-parameter-of-nn-module/32723/2

running_mean & running_var are loaded from the pre-trained backbone when the checkpoint (& model's state dictionary) is loaded.
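
The buffer behaviour described above can be checked with a small PyTorch sketch. This is a hedged illustration rather than the reference code: the class `FrozenNormSketch`, the tiny 1x1-convolution model, and the "checkpoint" values are hypothetical, and registering weight and bias as buffers too is an assumption of this sketch. It loads the buffers from a state dictionary, runs one SGD step, and confirms that only the convolution is exposed to the optimizer while the buffers stay unchanged.

```python
import torch
from torch import nn


class FrozenNormSketch(nn.Module):
    """Hypothetical stand-in for a frozen norm layer (not the reference class)."""

    def __init__(self, num_channels):
        super().__init__()
        # Buffers are saved in state_dict but are not returned by parameters().
        self.register_buffer("weight", torch.ones(num_channels))
        self.register_buffer("bias", torch.zeros(num_channels))
        self.register_buffer("running_mean", torch.zeros(num_channels))
        self.register_buffer("running_var", torch.ones(num_channels))

    def forward(self, x):
        # Channel-wise multiply by a constant, then add a constant.
        scale = self.weight * self.running_var.rsqrt()
        shift = self.bias - self.running_mean * scale
        return x * scale.reshape(1, -1, 1, 1) + shift.reshape(1, -1, 1, 1)


model = nn.Sequential(nn.Conv2d(3, 4, kernel_size=1), FrozenNormSketch(4))

# Made-up values standing in for a pre-trained checkpoint.
model[1].load_state_dict({
    "weight": torch.full((4,), 2.0),
    "bias": torch.full((4,), 0.5),
    "running_mean": torch.full((4,), 0.1),
    "running_var": torch.full((4,), 4.0),
})

# Only the convolution's tensors are visible to the optimizer.
trainable = [name for name, _ in model.named_parameters()]
before = {k: v.clone() for k, v in model[1].state_dict().items()}

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss = model(torch.randn(2, 3, 5, 5)).pow(2).mean()
loss.backward()
optimizer.step()

after = model[1].state_dict()
buffers_unchanged = all(torch.equal(before[k], after[k]) for k in before)
print(trainable)          # ['0.weight', '0.bias']
print(buffers_unchanged)  # True
```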

qpjaada commented 3 years ago

Thanks a lot for this info @nv-rborkar. Very useful!

qpjaada commented 3 years ago

Closing this issue because, as Ritika points out, the BN layer for the resnet backbone is frozen, so this is really a non-issue.