Open Guitaricet opened 3 years ago
Sure, I will add the clarifications later, thanks for the suggestions.
Hi! Thank you for releasing the paper code!
I had some issues understanding the implementation, which are resolved by now. However, I expect that many of the people who decide to use PowerNorm in their projects will face them too. Fixing them would increase the impact of this research and make the lives of some people just a bit easier.
What I propose to do:
- [ ] Indicate the location of the PowerNorm code in the readme
- [ ] Improve docstrings
Fairseq is a pretty big repository, and finding the module you are looking for is like finding a needle in a haystack. Explicitly stating in the readme where the module is located is an easy solution. Another, even better, solution would be to create a new project that contains only the PowerNorm implementation and the corresponding tests.
Currently, the docstring for `MaskPowerNorm`,

`""" An implementation of masked batch normalization, used for testing the numerical stability. """`

does not indicate that this is exactly the PowerNorm described in the paper. This is confusing, because it gives the impression that the module is only used for testing. After reading the docstring, I spent some extra time searching for a different implementation and verifying that this one is exactly the one used in the experiments.
Initialization parameters are not documented at all. While some of them (`num_features`, `affine`, ...) behave exactly like in `nn.BatchNorm1d`, the others are specific to PowerNorm (`alpha_fwd`, `alpha_bwd`, `warmup_iters`, ...). It is not clear what they do without going back to the paper and reading the source code.
`PowerFunction` is not documented at all.

Could you please add these clarifications to the docstrings? It should not take more than an hour, and it will definitely save time for many people wanting to use PowerNorm in their projects.
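To make the docstring gap concrete, here is a minimal sketch of a check that lists `__init__` parameters never mentioned in a class docstring. `StandInNorm` and `undocumented_params` are hypothetical names used for illustration only. The stand-in reuses the docstring quoted above and the parameter names mentioned in this issue, and the real `MaskPowerNorm` signature may differ.

```python
import inspect


class StandInNorm:
    """An implementation of masked batch normalization, used for testing the numerical stability."""

    # Hypothetical signature: the parameter names are the ones mentioned in
    # this issue; the real MaskPowerNorm signature and defaults may differ.
    def __init__(self, num_features, affine, alpha_fwd, alpha_bwd, warmup_iters):
        self.num_features = num_features


def undocumented_params(cls):
    """Return the __init__ parameter names that never appear in the class docstring."""
    doc = inspect.getdoc(cls) or ""
    params = inspect.signature(cls.__init__).parameters
    return [name for name in params if name != "self" and name not in doc]


missing = undocumented_params(StandInNorm)
print(missing)
```

The same helper could be pointed at the real class after importing it from the repository. The substring match is deliberately crude, so it only flags names that are completely absent from the docstring.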
+1 great remarks - it is impossible to understand what is going on
Did you happen to understand if `MaskPowerNorm` is indeed the power norm implementation?
> Did you happen to understand if `MaskPowerNorm` is indeed the power norm implementation?

If you navigate to `fairseq/modules/norm_select.py`, you can see that `MaskPowerNorm` is indeed used inside the function `NormSelect`.