Hi, thanks for your interest, and sorry for the late reply. I would suggest tuning the `alpha_bkw` parameter in https://github.com/amirgholami/powernorm/blob/2f23ae75c4f29904175bfd2c6b8248399ff99440/fairseq/modules/norms/mask_powernorm.py#L103. The larger it is, the less variance it introduces in the later training phase.
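A minimal sketch of what that tuning might look like when swapping `BatchNorm1d` for `MaskPowerNorm`, assuming the constructor exposes `alpha_fwd`, `alpha_bkw`, and `warmup_iters` keyword arguments as the reply and file above suggest; please verify the exact signature against your checkout of `mask_powernorm.py`:

```python
# Hypothetical sketch: constructing MaskPowerNorm with a larger alpha_bkw.
# Argument names (alpha_fwd, alpha_bkw, warmup_iters) are assumptions based
# on the reply above; check them against mask_powernorm.py in your checkout.
import torch.nn as nn
from fairseq.modules.norms.mask_powernorm import MaskPowerNorm

def make_norm(num_features: int, use_powernorm: bool = True) -> nn.Module:
    if use_powernorm:
        # An alpha_bkw closer to 1.0 gives a slower-moving backward EMA,
        # which (per the reply) introduces less variance late in training.
        return MaskPowerNorm(num_features,
                             alpha_fwd=0.9,
                             alpha_bkw=0.99,
                             warmup_iters=10000)
    return nn.BatchNorm1d(num_features)
```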
Hi, thank you for your reply. However, the link you sent does not work for me; I see "Page not found".
...
Hi,
Thank you for your code! I have a question. I am trying to apply power normalization (PN) to Tacotron2. However, after I replaced batch norm (BN) with PN, an overflow occurred after several thousand training steps. When I checked, the `MaskPowerNorm` class's `ema_gz` parameter kept getting smaller during training and eventually became NaN. Do you have any suggestions or a solution?
Thanks,
Heejo
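For reference, a minimal monitoring sketch of the kind of check described above, assuming the module registers a buffer named `ema_gz` (the buffer name comes from the question; the helper itself is hypothetical):

```python
# Hypothetical helper to watch PowerNorm's ema_gz buffer during training and
# catch the shrink-to-NaN behaviour early. Relies only on PyTorch's standard
# named_buffers() API; the buffer name "ema_gz" is taken from the issue text.
import torch

def check_powernorm_buffers(model: torch.nn.Module, step: int) -> None:
    for name, buf in model.named_buffers():
        if name.endswith("ema_gz"):
            if torch.isnan(buf).any() or torch.isinf(buf).any():
                print(f"step {step}: {name} contains NaN/Inf")
            else:
                print(f"step {step}: {name} "
                      f"min={buf.min().item():.3e} max={buf.max().item():.3e}")
```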