jaywalnut310 / vits

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
https://jaywalnut310.github.io/vits-demo/index.html
MIT License

Purpose of sum(-1) in sqrt of spectrogram calculation #154

Open m1rakram opened 1 year ago

m1rakram commented 1 year ago

Hi, first of all, thank you for this repo, which is a great help. I have a question:

spec = torch.sqrt(spec.pow(2).sum(-1) + 1e-6)

This line appears in the spectrogram calculation in mel_processing.py. The problem is that sum(-1) discards the frame (time) dimension, so the output of the spectrogram function has shape [8, 513] (8 is the batch size, 513 the number of frequency bins). Why is sum(-1) used here?

Thanks in advance
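
For reference, here is a minimal sketch of the shapes involved. It is not the repo's code: the batch, `n_fft`, `hop`, and `win` values are assumptions chosen to reproduce the 513 bins mentioned above. It illustrates one plausible reading, namely that `sum(-1)` is meant for a tensor whose last axis holds the (real, imag) pair, as `torch.stft` returned by default in older PyTorch versions. On a complex-valued STFT the last axis is the frame axis, so the same call would collapse the frames.

```python
import torch

# Hypothetical settings (assumptions, not taken from the issue)
n_fft, hop, win = 1024, 256, 1024
y = torch.randn(8, 8192)            # batch of 8 random waveforms
window = torch.hann_window(win)

# Complex STFT: shape [batch, n_fft // 2 + 1, n_frames]
spec_c = torch.stft(y, n_fft, hop_length=hop, win_length=win,
                    window=window, center=False, return_complex=True)

# Real-valued layout: a trailing axis of size 2 holds (real, imag)
spec_ri = torch.view_as_real(spec_c)  # [batch, 513, n_frames, 2]

# Here sum(-1) adds real^2 + imag^2, giving the magnitude per bin and frame
mag_from_ri = torch.sqrt(spec_ri.pow(2).sum(-1) + 1e-6)

# Equivalent magnitude computed directly from the complex tensor
mag_from_c = torch.sqrt(spec_c.abs().pow(2) + 1e-6)

# On the complex tensor, sum(-1) sums over frames instead
collapsed = spec_c.abs().pow(2).sum(-1)

print(spec_ri.shape, mag_from_ri.shape, collapsed.shape)
```

If the spectrogram function returns [8, 513], it may be worth checking whether the STFT output in your setup is complex-valued, so that the final axis is frames rather than (real, imag).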