What is the time taken to converge for the hidden unit tokenizer?

I am training the hidden unit tokenizer to predict speech units from text token IDs. The model's accuracy keeps increasing, but I cannot tell whether it will eventually converge: after 3 days of training it is at 31.2% accuracy. Since this is essentially a FastSpeech model, I expected it to converge much faster. Could you share your training times, loss curves, and so on? Any information would be helpful!

microsoft/SpeechT5, issue #72