
Implement weight standardization option for convolution layers #16105

Closed · englert-m closed this issue 2 years ago

englert-m commented 2 years ago

System information.

TensorFlow version (you are using): 2.8.0
Are you willing to contribute it (Yes/No): Yes

Describe the feature and the current behavior/state.

Implement an option to apply weight standardization in convolution layers, following "Micro-Batch Training with Batch-Channel Normalization and Weight Standardization" by Qiao, Wang, Liu, Shen, and Yuille. Ideally this would also support a learnable gain, as indicated at the bottom of page 18 of "High-Performance Large-Scale Image Recognition Without Normalization" by Brock, De, Smith, and Simonyan.
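Concretely, weight standardization re-parameterizes the kernel so that the fan-in weights of each filter have zero mean and unit variance before the convolution is applied, with the optional learnable gain rescaling each standardized filter. A minimal TensorFlow sketch (the helper name and the eps default here are illustrative, not part of any existing API):

```python
import tensorflow as tf

def standardize_kernel(kernel, gain=None, eps=1e-5):
    """Standardize a conv kernel to zero mean and unit variance per filter.

    kernel: shape (h, w, in_channels, out_channels); statistics are taken
    over all fan-in dimensions, i.e. everything except the output channels.
    gain: optional tensor of shape (out_channels,), one trainable scalar
    per filter, as in Brock et al.
    """
    mean, var = tf.nn.moments(kernel, axes=[0, 1, 2], keepdims=True)
    kernel = (kernel - mean) * tf.math.rsqrt(var + eps)
    if gain is not None:
        kernel = kernel * gain
    return kernel
```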

This should be useful generally, but, in particular, would also be required for a potential implementation of NFNets as requested in #15229.

Will this change the current api? How?

Apart from adding new arguments to the convolution layers, no.

Who will benefit from this feature?

Anyone experimenting with approaches to image classification, and anyone who wants to implement NFNets.

Contributing

I would be willing to write this and contribute a PR if there is interest, but it would be my first, so some extra time might have to be spent in review. I propose adding this functionality to the private Conv base class and then exposing it through classes like Conv2D etc. This would add three arguments to the constructor (see the sketch below):

- weight_standardization: boolean indicating whether to standardize the kernel.
- eps: small number to avoid division by zero when dividing by the square root of the variance.
- use_gain: boolean indicating whether trainable gain variables should be used.

But I am open to suggestions of alternative approaches.
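To make the proposal concrete, here is a rough sketch of how the standardization could hook into the forward pass, using the convolution_op method that Conv2D exposes in recent Keras versions; the eps and use_gain arguments mirror the proposal above and are not an existing Keras API:

```python
import tensorflow as tf

class StandardizedConv2D(tf.keras.layers.Conv2D):
    """Conv2D variant that standardizes its kernel on every forward pass."""

    def __init__(self, *args, eps=1e-5, use_gain=True, **kwargs):
        super().__init__(*args, **kwargs)
        self.eps = eps
        self.use_gain = use_gain

    def build(self, input_shape):
        super().build(input_shape)
        if self.use_gain:
            # One trainable gain per output filter, as in Brock et al.
            self.gain = self.add_weight(
                name="gain", shape=(self.filters,), initializer="ones")

    def convolution_op(self, inputs, kernel):
        # Standardize over the fan-in dimensions, then optionally rescale.
        mean, var = tf.nn.moments(kernel, axes=[0, 1, 2], keepdims=True)
        kernel = (kernel - mean) * tf.math.rsqrt(var + self.eps)
        if self.use_gain:
            kernel = kernel * self.gain
        return super().convolution_op(inputs, kernel)
```

Because the standardization happens inside the forward pass, gradients flow through it, matching the behavior described in the papers cited above.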

fchollet commented 2 years ago

Question: does this involve a projection after every batch (which could be implemented via a regular Conv2D layer and a weight constraint), or does this use a mechanism similar to batch norm? If the latter, you can simply write your own layer to do it.

Note that we will not add it to the API -- you can simply maintain the layer as part of your own code.
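For reference, the constraint-based variant mentioned above (a projection applied after every update, so gradients do not flow through the standardization itself) could be sketched like this; the class name and eps default are illustrative:

```python
import tensorflow as tf

class StandardizeWeights(tf.keras.constraints.Constraint):
    """Projects a kernel back to zero mean / unit variance after each update."""

    def __init__(self, eps=1e-5):
        self.eps = eps

    def __call__(self, w):
        # Reduce over every axis except the output-channel axis.
        axes = list(range(w.shape.rank - 1))
        mean, var = tf.nn.moments(w, axes=axes, keepdims=True)
        return (w - mean) * tf.math.rsqrt(var + self.eps)

    def get_config(self):
        return {"eps": self.eps}

# Usage: tf.keras.layers.Conv2D(64, 3, kernel_constraint=StandardizeWeights())
```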