Question about the TextEncoder

jaywalnut310 / vits

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

MIT License

Open JohnHerry opened 11 months ago

JohnHerry commented 11 months ago

I am reading the TextEncoder code, at line 168 of models.py:

`x = self.emb(x) * math.sqrt(self.hidden_channels)  # [b, t, h]`

My question: why is there a factor of `math.sqrt(self.hidden_channels)` here? Is it some kind of normalization? What is the benefit?
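To make the question concrete, here is a minimal sketch of what the factor does numerically. It assumes the embedding weights are drawn from a normal distribution with standard deviation `hidden_channels ** -0.5`, which is how I believe `TextEncoder` initializes `self.emb`; that is an assumption to check against models.py, not something confirmed in this thread. The helper `embedding_std` and the toy vocabulary size are hypothetical, and the sketch uses only the standard library rather than the real `nn.Embedding`.

```python
import math
import random
import statistics


def embedding_std(hidden_channels, n_vocab=200, scale=True, seed=0):
    """Sample a toy embedding table and return the std of its entries.

    Assumption: weights ~ N(0, hidden_channels ** -0.5), mirroring what the
    TextEncoder init appears to do. `scale` toggles the sqrt(h) factor.
    """
    rng = random.Random(seed)
    init_std = hidden_channels ** -0.5          # assumed initialization std
    factor = math.sqrt(hidden_channels) if scale else 1.0
    values = [
        rng.gauss(0.0, init_std) * factor
        for _ in range(n_vocab * hidden_channels)
    ]
    return statistics.pstdev(values)


if __name__ == "__main__":
    for h in (64, 192, 512):
        raw = embedding_std(h, scale=False)     # shrinks as h grows
        scaled = embedding_std(h, scale=True)   # stays close to 1
        print(f"h={h:4d}  unscaled std={raw:.3f}  scaled std={scaled:.3f}")
```

Under that assumption, the unscaled entries get smaller as `hidden_channels` grows, while the scaled ones stay near unit standard deviation regardless of width. The original Transformer applies the same `sqrt(d_model)` multiplication to its embeddings, so this may simply follow that convention; whether that was the author's intent here is not stated.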