jaywalnut310 / vits

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
https://jaywalnut310.github.io/vits-demo/index.html
MIT License

Silence when generating audio #132

Closed LanglyAdrian closed 1 year ago

LanglyAdrian commented 1 year ago

I downloaded the VCTK corpus, downsampled it to 22050 Hz, and started training with the default parameters (I only changed the batch size to 32). 130k steps have already passed and I wanted to hear the result, but when I generate audio (G_1300000.pth) I get silence. Is this expected? Has anyone else experienced this?

LanglyAdrian commented 1 year ago

The error was that I downsampled the wavs with librosa.load('test.wav', sr=22050), which changed the bit depth of the files from 16 to 32. Downsampling them correctly solved the problem.
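A quick way to check whether a dataset has the 16-to-32 change described above is to read the WAV header directly. The following is a minimal sketch using only the standard library; the helper names `wav_format` and `is_vits_ready` are hypothetical, and the assumption that training expects 16-bit PCM at 22050 Hz comes from this thread and the default configs, not from a documented requirement.

```python
import struct


def wav_format(path):
    """Return (format_tag, channels, sample_rate, bits_per_sample) from a WAV header.

    format_tag 1 means integer PCM, 3 means IEEE float.
    """
    with open(path, "rb") as f:
        riff, _, wave_id = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError("no fmt chunk found")
            chunk_id, size = struct.unpack("<4sI", header)
            if chunk_id == b"fmt ":
                # The first 16 bytes of the fmt chunk hold the basic format fields.
                tag, channels, rate, _, _, bits = struct.unpack("<HHIIHH", f.read(16))
                return tag, channels, rate, bits
            # Skip any other chunk; chunks are padded to an even length.
            f.seek(size + (size & 1), 1)


def is_vits_ready(path, expected_rate=22050):
    """Assumed check: integer PCM, 16 bits per sample, expected sample rate."""
    tag, _channels, rate, bits = wav_format(path)
    return tag == 1 and bits == 16 and rate == expected_rate
```

Running `is_vits_ready` over every file in the training list before starting a long run would flag 32-bit float files early. Files written with the extensible WAV header (format tag 0xFFFE) would also be reported as not ready by this sketch, even if they contain 16-bit samples.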

jmaxzh commented 1 year ago

Can you be more specific? I encountered the same problem.

CONGLUONG12 commented 1 year ago

@LanglyAdrian Can you tell me how you fixed the problem?

LanglyAdrian commented 1 year ago

ffmpeg or sox
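The exact command is not given in this thread, so the following is only a sketch of what an ffmpeg-based conversion could look like when driven from Python. The helper names `build_ffmpeg_cmd` and `convert` are hypothetical, and forcing mono output with `-ac 1` is an assumption; `-ar` sets the sample rate and `-c:a pcm_s16le` selects 16-bit PCM output.

```python
import subprocess


def build_ffmpeg_cmd(src, dst, rate=22050):
    """Build an ffmpeg command that resamples src and writes 16-bit PCM to dst."""
    return [
        "ffmpeg",
        "-y",                  # overwrite dst if it already exists
        "-i", src,
        "-ar", str(rate),      # target sample rate
        "-ac", "1",            # mono output (assumption, not stated in the thread)
        "-c:a", "pcm_s16le",   # 16-bit little-endian PCM
        dst,
    ]


def convert(src, dst, rate=22050):
    """Run the conversion; requires ffmpeg to be installed and on PATH."""
    subprocess.run(build_ffmpeg_cmd(src, dst, rate), check=True)
```

A roughly equivalent sox invocation would be `sox in.wav -r 22050 -b 16 -c 1 out.wav`. Keep the originals and write the converted files to a separate directory, so the conversion can be repeated if the settings turn out to be wrong.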

CONGLUONG12 commented 1 year ago

@LanglyAdrian Thank you. You are correct, the problem is in the downsampling step. I had downsampled with torchaudio, which normalizes by default, so the wav files ended up normalized twice. I then used the pydub library to downsample instead and it worked fine.
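The arithmetic below illustrates why normalizing twice produces near silence. It assumes the training data loader divides samples by a `max_wav_value` of 32768.0, as the default configs appear to do. The sample values and the `normalize` helper are made up for illustration.

```python
MAX_WAV_VALUE = 32768.0  # assumed divisor applied when loading training audio


def normalize(samples):
    """Scale samples by the 16-bit full-scale value."""
    return [s / MAX_WAV_VALUE for s in samples]


# Hypothetical 16-bit PCM samples, as stored in a correctly converted file.
pcm16 = [0, 16384, -32768, 8192]

# Normalized once: values land in [-1, 1], which is what the model should see.
once = normalize(pcm16)

# Normalized twice: this happens if the file on disk already holds floats
# in [-1, 1] and the loader divides by MAX_WAV_VALUE again.
twice = normalize(once)

peak_once = max(abs(x) for x in once)
peak_twice = max(abs(x) for x in twice)
print(peak_once, peak_twice)
```

Under this assumption the peak amplitude after the second normalization is 1/32768 of full scale, so the model effectively trains on silence and later generates silence.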