microsoft / SpeechT5

Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
MIT License

What is the time taken to converge for the hidden unit tokenizer? #72

Open Kodhandarama opened 4 months ago

Kodhandarama commented 4 months ago

I am currently training the hidden unit tokenizer to predict speech units from text token ids. The accuracy keeps increasing, but I cannot tell whether the model will eventually converge: after 3 days of training it is at 31.2% accuracy. Since this is essentially a FastSpeech model, I expected it to converge much faster. Could you share your training times, loss curves, and similar details? Any information would be helpful!
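
One rough way to judge whether an accuracy curve is still climbing or has flattened is to compare the average of the most recent evaluations with the average of the evaluations just before them. The sketch below is only illustrative: the function name `is_plateauing`, the window size, and the `min_gain` threshold are assumptions, and the sample values are made up rather than taken from this training run or from the SpeechT5 code.

```python
from statistics import mean


def is_plateauing(acc_history, window=5, min_gain=0.1):
    """Return True if accuracy gained less than `min_gain` percentage points
    between the previous `window` evaluations and the latest `window`."""
    if len(acc_history) < 2 * window:
        return False  # too few evaluations to judge either way
    recent = mean(acc_history[-window:])
    previous = mean(acc_history[-2 * window:-window])
    return (recent - previous) < min_gain


# Hypothetical accuracy logs (percent), one value per evaluation.
still_improving = [20.0, 22.0, 24.0, 26.0, 28.0, 29.0, 30.0, 30.5, 31.0, 31.2]
flat = [31.0, 31.1, 31.0, 31.2, 31.1, 31.1, 31.2, 31.1, 31.2, 31.1]

print(is_plateauing(still_improving))  # False
print(is_plateauing(flat))             # True
```

A check like this only says whether recent progress has slowed; it cannot say what accuracy the model would reach if trained longer, so reference training times and loss curves from the authors would still be needed.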