rolczynski / Automatic-Speech-Recognition

🎧 Automatic Speech Recognition: DeepSpeech & Seq2Seq (TensorFlow)
GNU Affero General Public License v3.0
223 stars 64 forks

Repeated Constant Output #15

Closed mmbejani closed 4 years ago

mmbejani commented 4 years ago

I am using your code to train a model for a specific language. After a few epochs, I tested the network on a sample from the training dataset. The output is a single repeated character, for example 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'.

What is the problem?
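One way to narrow this down is to inspect the per-frame predictions before decoding. The sketch below is not taken from this repository. It assumes the network emits a CTC-style matrix of per-frame probabilities (one row per time step, one column per character plus a blank) and that the blank is the last column. The helper names `frame_argmax`, `greedy_ctc_decode`, and `dominant_symbol_ratio` are hypothetical.

```python
from collections import Counter
from itertools import groupby


def frame_argmax(probs):
    """Return the index of the most likely symbol for every time step."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


def greedy_ctc_decode(probs, alphabet, blank_index=None):
    """Greedy CTC decoding: merge repeated frames, then drop blanks.

    Assumption: the blank symbol is the last column unless told otherwise.
    """
    if blank_index is None:
        blank_index = len(probs[0]) - 1
    best_path = frame_argmax(probs)
    # groupby collapses consecutive duplicates into a single entry.
    merged = [index for index, _ in groupby(best_path)]
    return "".join(alphabet[i] for i in merged if i != blank_index)


def dominant_symbol_ratio(probs):
    """Fraction of frames won by the single most frequent symbol."""
    best_path = frame_argmax(probs)
    _, count = Counter(best_path).most_common(1)[0]
    return count / len(best_path)


# Toy example: alphabet "ab" plus a blank in column 2.
alphabet = "ab"
healthy = [
    [0.90, 0.05, 0.05],  # a
    [0.90, 0.05, 0.05],  # a again (merged with the previous frame)
    [0.10, 0.10, 0.80],  # blank
    [0.10, 0.80, 0.10],  # b
]
collapsed = [[0.90, 0.05, 0.05]] * 4  # every frame predicts "a"

print(greedy_ctc_decode(healthy, alphabet))
print(dominant_symbol_ratio(healthy))
print(dominant_symbol_ratio(collapsed))
```

If a single symbol wins nearly every frame, the network has probably collapsed to a constant prediction. A long run of one character in the final text may also mean that the decoding step is not merging repeated frames, which is worth checking separately.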

rolczynski commented 4 years ago

Hard to tell. You have to provide more information.

mmbejani commented 4 years ago

I downloaded the dataset from the following link: http://fa.persianspeechcorpus.com/ I use the DeepSpeech2 model that you developed. The optimizer is Adam with a learning rate of 1e-5. Spectrograms are extracted as the input features. The loss decreases from 7000 to 330, but when I test the network on a training sample, I get 'aaaaaaaaaaaaaaaaa'! I can also share my code.

mmbejani commented 4 years ago

Another note: the loss does not decrease below 330. The alphabet has 33 characters. Do you think the plateau is related to the alphabet size (33 and 330)?

mmbejani commented 4 years ago

The code link: https://drive.google.com/file/d/1Kkb_S1pd1w6BEXckLj_7CMLcjt7JgZiI/view?usp=sharing

mmbejani commented 4 years ago

The dataset is very small (1.1 hours)! Could that be the reason for this behavior? If so, I have another question: why does the model underfit the training dataset, so that it can't even predict the training samples?

rolczynski commented 4 years ago

Hey @mmbejani, real datasets are huge. However, if you want to transfer knowledge from a pretrained model, please consider freezing the model (or part of it). I'm closing the issue because it's not related to the implementation itself. Good luck!
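For reference, freezing part of a model in Keras can be sketched as follows. This is a minimal illustration, not the repository's code. The small `Sequential` model is a hypothetical stand-in for a pretrained acoustic model, and the layer names, the 160 input features, and the 34 output units (33 characters plus an assumed blank) are chosen only for the example.

```python
import tensorflow as tf

# Hypothetical stand-in for a pretrained acoustic model. The layer names
# and sizes are illustrative, not the repository's architecture.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 160)),  # (time steps, features per frame)
    tf.keras.layers.Dense(64, activation="relu", name="feature_1"),
    tf.keras.layers.Dense(64, activation="relu", name="feature_2"),
    tf.keras.layers.Dense(34, activation="softmax", name="output"),
])

# Freeze every layer except the last one, so that only the output layer
# is updated while fine-tuning on the small dataset.
for layer in model.layers[:-1]:
    layer.trainable = False

# Note: if the model was already compiled, compile it again (with your
# usual optimizer and loss) so that the change takes effect.

frozen = [layer.name for layer in model.layers if not layer.trainable]
trainable = [layer.name for layer in model.layers if layer.trainable]
print("frozen:", frozen)
print("trainable:", trainable)
```

How many layers to freeze is a judgment call. With about an hour of audio, keeping most of the pretrained weights fixed and adapting only the last layer or two is a reasonable starting point.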