sIncerass / powernorm

[ICML 2020] code for "PowerNorm: Rethinking Batch Normalization in Transformers" https://arxiv.org/abs/2003.07845
GNU General Public License v3.0

Broken affine parameter and redundant code #4

Closed grassking100 closed 3 years ago

grassking100 commented 3 years ago

Hello: It seems the `affine` parameter of `MaskPowerNorm` at line 109 of mask_powernorm.py is not working properly: `weight` and `bias` are applied regardless of the boolean value of `affine`. The code at lines 86 and 88 of mask_powernorm.py also appears to be redundant. Finally, I think lines 142-154 of mask_powernorm.py could be replaced with the following to reduce redundancy:

        # construct the mask_input, size to be (BxL) x C: L is the real length here
        if pad_mask is None:
            mask_input = input.clone()
        else:
            # Transpose the bn_mask (B x T -> T x B)
            bn_mask = ~pad_mask
            bn_mask = bn_mask.transpose(0, 1)
            # pad_size = (~bn_mask).sum()  # is pad_size redundant?
            mask_input = input[bn_mask, :]

https://github.com/sIncerass/powernorm/blob/bffa7dd7fcf8932a32875b64eb40df0f566ac301/fairseq/modules/norms/mask_powernorm.py#L109-L112

https://github.com/sIncerass/powernorm/blob/bffa7dd7fcf8932a32875b64eb40df0f566ac301/fairseq/modules/norms/mask_powernorm.py#L86

https://github.com/sIncerass/powernorm/blob/bffa7dd7fcf8932a32875b64eb40df0f566ac301/fairseq/modules/norms/mask_powernorm.py#L88

https://github.com/sIncerass/powernorm/blob/bffa7dd7fcf8932a32875b64eb40df0f566ac301/fairseq/modules/norms/mask_powernorm.py#L142-L154
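For the `affine` issue, the fix would be to guard the weight/bias application on the flag. Below is a minimal sketch of that guard using a simplified NumPy stand-in (the class and its details here are illustrative only, not the repository's actual `MaskPowerNorm` implementation):

```python
import numpy as np

class MaskedNorm:
    """Toy masked normalization layer illustrating the affine guard.

    Simplified stand-in for MaskPowerNorm: weight and bias are only
    created, and only applied, when affine=True.
    """

    def __init__(self, num_features, eps=1e-5, affine=True):
        self.eps = eps
        self.affine = affine
        if affine:
            self.weight = np.ones(num_features)
            self.bias = np.zeros(num_features)

    def __call__(self, x, pad_mask=None):
        # x: (T, B, C); pad_mask: (B, T), True at padding positions
        T, B, C = x.shape
        flat = x.reshape(-1, C)  # (T*B) x C
        if pad_mask is None:
            mask_input = flat
        else:
            # Transpose the mask (B x T -> T x B), then flatten to select
            # only the non-padded rows, as in the snippet above
            bn_mask = (~pad_mask).transpose(1, 0).reshape(-1)
            mask_input = flat[bn_mask]
        # Statistics are computed from non-padded positions only
        mean = mask_input.mean(axis=0)
        var = mask_input.var(axis=0)
        out = (x - mean) / np.sqrt(var + self.eps)
        if self.affine:  # the guard missing in the reported bug
            out = out * self.weight + self.bias
        return out
```

With `affine=False` the layer now produces purely normalized output; with `affine=True` the learned `weight` and `bias` are applied afterwards.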

Thanks, grassking100

sIncerass commented 3 years ago

Thanks for the suggestion! Yes, I will incorporate the cleaned-up version you proposed.