
Implement weight standardization option for convolution layers #16105

Closed · englert-m closed this issue 2 years ago

englert-m commented 2 years ago

System information.

TensorFlow version (you are using): 2.8.0
Are you willing to contribute it (Yes/No): Yes

Describe the feature and the current behavior/state.

Implement an option to apply weight standardization in convolution layers, following "Micro-Batch Training with Batch-Channel Normalization and Weight Standardization" by Qiao, Wang, Liu, Shen, and Yuille. Ideally this would also support a learnable gain, as indicated at the bottom of page 18 of "High-Performance Large-Scale Image Recognition Without Normalization" by Brock, De, Smith, and Simonyan.
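Concretely, weight standardization re-parameterizes the kernel so that the fan-in weights of each filter have zero mean and unit variance before the convolution is applied, with the optional learnable gain rescaling each standardized filter. A minimal TensorFlow sketch (the helper name and the eps default here are illustrative, not part of any existing API):

```python
import tensorflow as tf

def standardize_kernel(kernel, gain=None, eps=1e-5):
    """Standardize a conv kernel to zero mean and unit variance per filter.

    kernel: shape (h, w, in_channels, out_channels); statistics are taken
    over all fan-in dimensions, i.e. everything except the output channels.
    gain: optional tensor of shape (out_channels,), one trainable scalar
    per filter, as in Brock et al.
    """
    mean, var = tf.nn.moments(kernel, axes=[0, 1, 2], keepdims=True)
    kernel = (kernel - mean) * tf.math.rsqrt(var + eps)
    if gain is not None:
        kernel = kernel * gain
    return kernel
```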

This should be useful generally, but, in particular, would also be required for a potential implementation of NFNets as requested in #15229.

Will this change the current api? How?

Apart from adding new arguments to the convolution layers, no.

Who will benefit from this feature?

Anyone experimenting with approaches to image classification, and anyone who wants to implement NFNets.

Contributing

I would be willing to write this and contribute a PR if there is interest, but it would be my first, so some extra time might have to be spent in review. I propose adding this functionality to the private Conv base class and then exposing it through classes like Conv2D etc. This would add three arguments to the constructor (see the sketch below):

- weight_standardization: boolean indicating whether to standardize the kernel.
- eps: small number to avoid division by zero when dividing by the square root of the variance.
- use_gain: boolean indicating whether trainable gain variables should be used.

But I am open to suggestions of alternative approaches.
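To make the proposal concrete, here is a rough sketch of how the standardization could hook into the forward pass, using the convolution_op method that Conv2D exposes in recent Keras versions; the eps and use_gain arguments mirror the proposal above and are not an existing Keras API:

```python
import tensorflow as tf

class StandardizedConv2D(tf.keras.layers.Conv2D):
    """Conv2D variant that standardizes its kernel on every forward pass."""

    def __init__(self, *args, eps=1e-5, use_gain=True, **kwargs):
        super().__init__(*args, **kwargs)
        self.eps = eps
        self.use_gain = use_gain

    def build(self, input_shape):
        super().build(input_shape)
        if self.use_gain:
            # One trainable gain per output filter, as in Brock et al.
            self.gain = self.add_weight(
                name="gain", shape=(self.filters,), initializer="ones")

    def convolution_op(self, inputs, kernel):
        # Standardize over the fan-in dimensions, then optionally rescale.
        mean, var = tf.nn.moments(kernel, axes=[0, 1, 2], keepdims=True)
        kernel = (kernel - mean) * tf.math.rsqrt(var + self.eps)
        if self.use_gain:
            kernel = kernel * self.gain
        return super().convolution_op(inputs, kernel)
```

Because the standardization happens inside the forward pass, gradients flow through it, matching the behavior described in the papers cited above.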

fchollet commented 2 years ago

Question: does this involve a projection after every batch (which could be implemented via a regular Conv2D layer and a weight constraint), or does this use a mechanism similar to batch norm? If the latter, you can simply write your own layer to do it.

Note that we will not add it to the API -- you can simply maintain the layer as part of your own code.
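For reference, the constraint-based variant mentioned above (a projection applied after every update, so gradients do not flow through the standardization itself) could be sketched like this; the class name and eps default are illustrative:

```python
import tensorflow as tf

class StandardizeWeights(tf.keras.constraints.Constraint):
    """Projects a kernel back to zero mean / unit variance after each update."""

    def __init__(self, eps=1e-5):
        self.eps = eps

    def __call__(self, w):
        # Reduce over every axis except the output-channel axis.
        axes = list(range(w.shape.rank - 1))
        mean, var = tf.nn.moments(w, axes=axes, keepdims=True)
        return (w - mean) * tf.math.rsqrt(var + self.eps)

    def get_config(self):
        return {"eps": self.eps}

# Usage: tf.keras.layers.Conv2D(64, 3, kernel_constraint=StandardizeWeights())
```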