Why is the grad norm so high?

jishengpeng / WavTokenizer

SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling

MIT License

801 stars, 44 forks

Why is the grad norm so high? #38

Open necrophagists opened 2 months ago

necrophagists commented 2 months ago

[Screenshot attached: 微信截图_20240923214125]

jishengpeng commented 2 months ago

We did not place much emphasis on gradient norm behavior in WavTokenizer; we plan to implement some engineering optimizations to the loss function in WavTokenizer2. Thank you for pointing this out.