keithito / tacotron

A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)
MIT License
2.94k stars 965 forks

Trained model doesn't say the end of the sentence at test time. #245

Open NimaSedghiye opened 5 years ago

NimaSedghiye commented 5 years ago

Hi. I trained a model on a Persian dataset, but when I test it, some sentences come out as a beep. In other cases the end of the sentence is pronounced weakly or not at all.

ghost commented 5 years ago

How big is your dataset and how many steps did you train for?

I had this happen as well on an English dataset. It mostly happened when a sentence contained two or more words that were not in the dataset, or when I typed in two or more sentences at one time. One method that made it less likely was copying the existing audio files and slightly changing the speed and/or pitch to enlarge the dataset. The dataset then contains duplicated data with small changes to speed and pitch, and because the spoken content is unchanged, you can reuse the existing transcriptions. If your dataset is small, that is a quick way to make it bigger. Of course, a dataset with entirely unique audio will probably yield better results than one padded with duplicated audio.
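To make the duplication idea concrete, here is a minimal sketch in plain Python. It is not code from this repository: `change_speed` and `augment_metadata` are hypothetical helper names, and the `(clip_id, transcript)` row layout is only an assumption about how your metadata is stored. The resampling used here changes speed and pitch together (like playing a tape faster or slower), which is just one of several ways to perturb audio; a dedicated audio tool would likely sound cleaner.

```python
import math


def change_speed(samples, factor):
    """Resample by linear interpolation so the clip plays `factor` times faster.

    Played back at the original sample rate, speed and pitch shift together.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if len(samples) < 2:
        return list(samples)
    out_len = max(1, int(round(len(samples) / factor)))
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        # Position of output sample i inside the original signal.
        pos = min(i * factor, last)
        lo = int(math.floor(pos))
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def augment_metadata(rows, factors):
    """Return the original rows plus one copy per speed factor.

    `rows` is a list of (clip_id, transcript) pairs (assumed layout).
    Each copy gets a new id and reuses the original transcript unchanged.
    """
    out = list(rows)
    for clip_id, text in rows:
        for f in factors:
            out.append((f"{clip_id}_speed{f:.2f}", text))
    return out


if __name__ == "__main__":
    # One second of a 440 Hz tone at 16 kHz stands in for a real recording.
    tone = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
    for f in (0.95, 1.05):
        print(f, len(change_speed(tone, f)))
    print(augment_metadata([("clip_0001", "hello world")], [0.95, 1.05]))
```

Keeping the factors close to 1.0 (a few percent either way) is probably wise, so the copies still sound like the same speaker.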

Once the dataset is bigger, you should notice that the loss value decreases more slowly and the output gets better after more training. It's possible, though, that this project won't work as well with Persian as it does with English. Some people mentioned needing to modify the project to get better results for Chinese.

acrosson commented 5 years ago

@NimaSedghiye I've got similar results. What solved this for you? Was it as simple as increasing the training dataset, or training for longer?

mohsen-goodarzi commented 4 years ago

I'm having exactly the same problem (the final word of a sentence is not pronounced) in Persian. I've used a 14-hour dataset.