tuanh123789 / AdaSpeech

An implementation of Microsoft's "AdaSpeech: Adaptive Text to Speech for Custom Voice"
96 stars, 27 forks

Need Help with source Model training #16

Open offside609 opened 1 year ago

offside609 commented 1 year ago

Hi Folks,

I am at the first step of AdaSpeech training as described in the paper: source model training. I used the LibriTTS dataset, reduced to half to speed up the experiment. It has 1140 speakers for training. There was a small mismatch between the preprocessing parameters in the AdaSpeech paper and the default values provided in the code; we went with the values from the code. We trained the model for 300k steps on Colab. I am attaching a screenshot of my loss curves from TensorBoard.

[Image: Screenshot 2023-07-27 at 4 08 57 AM]

[Image: WhatsApp Image 2023-07-27 at 04 10 09]

Please ignore the multiple colors in the graphs. While training on Colab I had to resume training several times, which produced separate log files. The more fluctuating line is the training loss, and the smoother line is the validation loss. I am also attaching the output I generated with inference.py for speaker ID 107 on an out-of-sample test sentence at 160k, 170k and 210k steps. Since I cannot attach .wav/.mp3 files here (or maybe I just don't know how), I am sharing a Drive link where they are hosted. The reference audio for speaker 107 will give you an idea of how the speaker sounds. https://drive.google.com/drive/folders/19Og2t4h2quygmrJ87xEMPoTQ7yTz9Q_e?usp=sharing
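In case it helps anyone reading the curves: since the resumed runs ended up in separate log files, here is a minimal sketch of how the scalars could be stitched into one curve and smoothed before comparing train and validation loss. It assumes the loss values have already been exported from each log as `(step, value)` pairs; `stitch_runs`, `ema_smooth` and the sample numbers are hypothetical and not part of this repo. The smoothing is a plain debiased exponential moving average, similar in spirit to a dashboard smoothing slider, not a copy of any tool's implementation.

```python
from typing import Dict, Iterable, List, Tuple

Point = Tuple[int, float]  # (global training step, scalar loss value)


def stitch_runs(runs: Iterable[List[Point]]) -> List[Point]:
    """Merge scalar curves from several resumed runs into one curve.

    Runs must be passed in chronological order. If two runs report the
    same step (a resume that replayed some steps), the later run wins.
    """
    merged: Dict[int, float] = {}
    for run in runs:
        for step, value in run:
            merged[step] = value  # later runs overwrite earlier ones
    return sorted(merged.items())


def ema_smooth(points: List[Point], weight: float = 0.6) -> List[Point]:
    """Debiased exponential moving average over the values, in step order."""
    if not 0.0 <= weight < 1.0:
        raise ValueError("weight must be in [0, 1)")
    smoothed: List[Point] = []
    running = 0.0
    for i, (step, value) in enumerate(points, start=1):
        running = running * weight + (1.0 - weight) * value
        # Divide out the bias from starting the average at zero.
        smoothed.append((step, running / (1.0 - weight ** i)))
    return smoothed


if __name__ == "__main__":
    # Made-up numbers purely for illustration, not my actual losses.
    run_a = [(0, 4.0), (100, 3.0), (200, 2.5)]
    run_b = [(200, 2.0), (300, 1.0)]  # resumed from the step-200 checkpoint
    for step, value in ema_smooth(stitch_runs([run_a, run_b]), weight=0.5):
        print(step, round(value, 3))
```

With both curves stitched and smoothed the same way, it is easier to see whether validation loss has flattened while train loss keeps falling.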

My output is a little metallic and grainy, has slight reverberation, and the pitch needs to improve. I want to understand along which dimensions it needs to improve. Also, what can I do better in training to get there?