sIncerass / powernorm

[ICML 2020] code for "PowerNorm: Rethinking Batch Normalization in Transformers" https://arxiv.org/abs/2003.07845
GNU General Public License v3.0
119 stars 17 forks

Feature request: improved documentation #6

Open Guitaricet opened 3 years ago

Guitaricet commented 3 years ago

Hi! Thank you for releasing the paper code!

I had some issues understanding the implementation, which are solved by now. However, I expect that many of the people who decide to use PowerNorm in their projects will face them too. Fixing them would increase the impact of this research and make life just a bit easier for some people.

What I propose to do:

  • [ ] Indicate the location of the PowerNorm code in the readme
  • [ ] Improve docstrings

Fairseq is a pretty big repository, and finding the module you are looking for is like finding a needle in a haystack. Explicitly stating in the readme where the module is located is an easy solution. Another, even better, solution would be to create a new project that contains only the PowerNorm implementation and the corresponding tests.

Currently, the docstring for MaskPowerNorm

    """
    An implementation of masked batch normalization, used for testing the numerical
    stability.
    """

does not indicate that this is exactly the PowerNorm described in the paper. It is confusing, because it gives the impression that this module is only used for testing. After reading the docstring, I spent some extra time searching for a different implementation and verifying that this one is indeed the one used in the experiments.

Initialization parameters are not documented at all. While some of them (num_features, affine, ...) behave exactly like in nn.BatchNorm1d, others are specific to PowerNorm (alpha_fwd, alpha_bwd, warmup_iters, ...). It is not clear what they do without going back to the paper and reading the source code.
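
For instance, a docstring along these lines would already answer most of the questions. The parameter descriptions below are only my reading of the paper, so please correct anything I got wrong:

    # A possible docstring for MaskPowerNorm (parameter descriptions are my reading of the paper):
    import torch.nn as nn

    class MaskPowerNorm(nn.Module):
        """PowerNorm (https://arxiv.org/abs/2003.07845) with support for a padding mask.

        This is the module used for the Transformer experiments in the paper.

        Args:
            num_features: number of channels C, same meaning as in nn.BatchNorm1d.
            affine: if True, learn a per-channel scale and shift, as in nn.BatchNorm1d.
            alpha_fwd: moving-average factor for the running quadratic mean that
                normalizes activations in the forward pass.
            alpha_bwd: moving-average factor for the running statistics used in the
                approximate backward pass.
            warmup_iters: number of iterations during which per-batch statistics are
                used before switching to the running ones.
        """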

PowerFunction is not documented at all 😞

Could you please add these clarifications to the docstrings? It should not take more than an hour, and it will definitely save time for many people wanting to use PowerNorm in their projects.

sIncerass commented 3 years ago

Sure, I will add the clarifications later, thanks for the suggestions.

sholevs66 commented 2 years ago

+1, great remarks - it is impossible to understand what is going on. Did you happen to understand if MaskPowerNorm is indeed the PowerNorm implementation?

lumliolum commented 3 months ago

Did you happen to understand if MaskPowerNorm is indeed the PowerNorm implementation?

If you navigate to fairseq/modules/norm_select.py, you can see that MaskPowerNorm is indeed used inside the function NormSelect:

https://github.com/sIncerass/powernorm/blob/9ea6226a3203d5d6fcee07a5c6dec38ec6bc5e9f/fairseq/modules/norm_select.py#L12-L19
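
So if you just want to try the layer outside of the full fairseq training pipeline, something along these lines should work. The import path and the (T, B, C) input layout are my assumptions, based on how fairseq transformer layers usually pass activations, so double-check them against the module file:

    import torch

    # Assumed import path -- check where mask_powernorm.py actually lives under fairseq/modules.
    from fairseq.modules.norms.mask_powernorm import MaskPowerNorm

    # num_features behaves like in nn.BatchNorm1d; the PowerNorm-specific arguments
    # (alpha_fwd, alpha_bwd, warmup_iters) are left at their defaults here.
    pn = MaskPowerNorm(num_features=512)

    # fairseq transformer layers pass activations as (T, B, C); I assume MaskPowerNorm
    # expects the same layout.
    x = torch.randn(35, 8, 512)
    y = pn(x)
    print(y.shape)  # expected: torch.Size([35, 8, 512])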