jaywalnut310 / vits

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
https://jaywalnut310.github.io/vits-demo/index.html
MIT License
6.48k stars 1.21k forks

Question about the TextEncoder #168

Open JohnHerry opened 11 months ago

JohnHerry commented 11 months ago

I am reading the TextEncoder code in models.py, line 168:

`x = self.emb(x) * math.sqrt(self.hidden_channels)  # [b, t, h]`

My question is: why is the embedding multiplied by `math.sqrt(self.hidden_channels)` here? Is it some kind of normalization? What is the benefit?
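To make the question concrete, here is a minimal sketch of what the factor does numerically. It assumes the embedding weights are drawn from a normal distribution with standard deviation `hidden_channels ** -0.5`; that initialization is an assumption to verify against models.py, and `hidden_channels=192`, `n_vocab=200`, and the helper name `embedding_std_before_and_after` are illustrative values, not taken from the repository. It uses only the standard library instead of PyTorch so it runs anywhere.

```python
import math
import random
import statistics


def embedding_std_before_and_after(hidden_channels=192, n_vocab=200, seed=0):
    """Return (raw_std, scaled_std) for a toy embedding table.

    Assumption: weights ~ Normal(0, hidden_channels ** -0.5).
    """
    rng = random.Random(seed)
    init_std = hidden_channels ** -0.5

    # Flattened toy embedding table: n_vocab rows of hidden_channels values.
    raw = [rng.gauss(0.0, init_std) for _ in range(n_vocab * hidden_channels)]

    # The factor asked about in this issue.
    scale = math.sqrt(hidden_channels)
    scaled = [value * scale for value in raw]

    return statistics.pstdev(raw), statistics.pstdev(scaled)


raw_std, scaled_std = embedding_std_before_and_after()
print(f"std before scaling: {raw_std:.4f}")
print(f"std after scaling:  {scaled_std:.4f}")
```

Under that assumption the scaled embeddings come out with roughly unit standard deviation, so the multiplication would undo the small initialization scale before the encoder layers see the values. Whether that is the authors' intent is exactly what this issue asks.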