Closed mmbejani closed 4 years ago
Hard to tell. You have to provide more information.
I downloaded the dataset from the following link: http://fa.persianspeechcorpus.com/ I use the DeepSpeech2 implementation that you developed. The optimizer is Adam with a learning rate of 1e-5, and spectrograms are extracted as features. The loss decreases from 7000 to 330, but when I test the network on a training sample, I get 'aaaaaaaaaaaaaaaaa'! I can also share my code.
Another note: the loss does not decrease below 330. The alphabet set has 33 members. Do you think this is related to the value of the loss function (33 vs. 330)?
The dataset is very small (1.1 hours)! Could that be the reason for this phenomenon? If so, I have another question: why is the model underfitting on the training dataset, unable to predict even the training samples?
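An output like 'aaaa…' usually means the acoustic model has collapsed to emitting the same symbol at every frame, and the raw per-frame argmax is being printed without the CTC best-path collapse step. A minimal sketch of greedy CTC decoding (the alphabet, blank index, and frame outputs here are invented for illustration, not taken from the repo):

```python
BLANK = 0        # assumption: CTC blank is index 0
ALPHABET = "-ab"  # '-' stands for the blank symbol

def greedy_ctc_decode(frame_argmax):
    """Standard CTC best-path rule: collapse repeats, then drop blanks."""
    out = []
    prev = None
    for idx in frame_argmax:
        if idx != prev and idx != BLANK:
            out.append(ALPHABET[idx])
        prev = idx
    return "".join(out)

# A collapsed model predicts the same class at every frame:
frames = [1] * 10  # every frame argmax-es to 'a'
print("".join(ALPHABET[i] for i in frames))  # raw argmax: 'aaaaaaaaaa'
print(greedy_ctc_decode(frames))             # after collapse: 'a'
```

Either way the transcript carries no information, which points to the model rather than to the decoder: with so little data the network settles into a trivial constant prediction that still lowers the CTC loss.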
Hey @mmbejani, real datasets are huge. However, if you want to transfer knowledge from a pretrained model, consider freezing the model (or part of it). I'm closing the issue because it's not related to the implementation itself. Good luck!
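For reference, freezing part of a model in PyTorch comes down to disabling gradients on those parameters and giving the optimizer only the trainable ones. A hedged sketch (the two-layer model and layer sizes below are made up, they do not reflect the actual DeepSpeech2 architecture in this repo):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model: a conv front-end plus a recurrent stack.
model = nn.Sequential(
    nn.Conv1d(161, 32, kernel_size=11),  # front-end to freeze
    nn.GRU(32, 256, batch_first=True),   # stack to fine-tune
)

# Freeze the front-end only:
for p in model[0].parameters():
    p.requires_grad = False

# Pass only the still-trainable parameters to the optimizer:
opt = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-5)
```

With only ~1.1 hours of speech, fine-tuning just the upper layers of a pretrained model is far more likely to converge than training from scratch.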
I am using your code to train on a specific language. After some epochs, I test the network's output on a sample from the training dataset. The output is a repeated constant character, for example 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'.
What is the problem?