mozilla / DeepSpeech

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Mozilla Public License 2.0

WER shoots up when retraining the pretrained model for an additional epoch on Libri #2185

Closed bopjesvla closed 4 years ago

bopjesvla commented 5 years ago

This bug was discussed here with @lissyx: https://discourse.mozilla.org/t/wer-shoots-up-when-retraining-the-pretrained-model-for-an-additional-epoch-on-libri/41062

# Start each run from a fresh, timestamped copy of the pretrained 0.4.1 checkpoint
CHECK=~/scratch/checkpoint-`date +%s`
cp -r deepspeech-0.4.1-checkpoint/ $CHECK
echo $CHECK
for i in {1..1}
do
    # --epoch -1: train one additional epoch on top of the loaded checkpoint
    python3 ../DeepSpeech/DeepSpeech.py --n_hidden 2048 --epoch -1 \
            --train_files libri/librivox-train-clean-100.csv \
            --dev_files libri/librivox-dev-clean.csv \
            --test_files libri/librivox-test-clean.csv \
            --checkpoint_dir $CHECK \
            --train_batch_size 24 \
            --dev_batch_size 24 \
            --test_batch_size 48 \
            --validation_step 1 \
            --checkpoint_step 1 \
            --learning_rate 0.0001 \
            --dropout_rate 0.15 \
            --lm_alpha 0.75 \
            --lm_beta 1.85 \
            --export_dir $CHECK/export \
            --alphabet_config_path ~/asr/models/alphabet.txt \
            --lm_binary_path ~/asr/models/lm.binary \
            --lm_trie_path ~/asr/models/trie \
            --beam_width 1024 | tee training-$i.out
done

The relevant output:

/home/bderuiter/scratch/checkpoint-1559468775
100% (595 of 595) |######################| Elapsed Time: 0:04:32 Time:  0:04:32
100% (56 of 56) |########################| Elapsed Time: 0:00:19 Time:  0:00:19
100% (54 of 54) |########################| Elapsed Time: 0:01:55 Time:  0:01:55
100% (54 of 54) |########################| Elapsed Time: 0:05:11 Time:  0:05:11
Preprocessing ['libri/librivox-train-clean-100.csv']
Preprocessing done
Preprocessing ['libri/librivox-dev-clean.csv']
Preprocessing done
I STARTING Optimization
I Training epoch 378...
I Training of Epoch 378 - loss: 150.271789
I Validating epoch 378...
I Validation of Epoch 378 - loss: 108.798860
I FINISHED Optimization - training time: 0:04:52
Preprocessing ['libri/librivox-test-clean.csv']
Preprocessing done
Computing acoustic model predictions...
Decoding predictions...
Test - WER: 0.699878, CER: 48.738426, loss: 148.851822

If --epoch is set to 0, the WER is 0.08, which is about what's expected. Why would the WER shoot up to 0.7 after training for just one more epoch?

I understand that the pretrained model was trained on more than just LibriSpeech, but the difference is incredibly large. The reason I’m asking is that I’m seeing similar increases in WER when I continue to train the model on another dataset. The WER of the non-finetuned pretrained model on this dataset is 0.11, but when I train the model on the dataset, the WER immediately jumps up to 0.4 after one epoch.
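
For reference, WER here is the standard metric: word-level edit distance between reference and hypothesis, divided by the number of reference words, so an empty hypothesis scores 1.0 per sample. A minimal sketch of the metric (not DeepSpeech's own implementation):

def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level Levenshtein distance / len(reference)."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / max(len(r), 1)

# e.g. wer("ay me", "") == 1.0, matching the per-sample lines in the logs below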


On the forums, Lissyx posted the following reply:

This is something we also spotted. Try to decrease learning rate.

This suggestion was followed, but to no avail. Setting the learning rate to 0.00001 results in a WER between 0.97 and 0.99 after one epoch. The same is true for a value of 0.000001. To verify nothing else changed, I reran the script with the original learning rate, 0.0001, resulting in a WER of 0.61.
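
Each rerun started from a fresh copy of the pretrained checkpoint, as in the bash script above. A sweep like that can also be scripted directly; a minimal Python sketch, assuming the same paths and flags as above (only the flags that change are shown):

import shutil
import subprocess

PRETRAINED = "deepspeech-0.4.1-checkpoint"  # same pretrained checkpoint as above

for lr in ("0.0001", "0.00001", "0.000001"):
    ckpt = f"checkpoint-lr-{lr}"
    # fresh copy so earlier runs cannot contaminate the weights
    shutil.copytree(PRETRAINED, ckpt)
    subprocess.run(
        ["python3", "../DeepSpeech/DeepSpeech.py",
         "--n_hidden", "2048",
         "--epoch", "-1",           # one additional epoch past the checkpoint
         "--checkpoint_dir", ckpt,
         "--learning_rate", lr],    # remaining flags as in the bash script above
        check=True)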

The relevant output for lr = 0.00001:

Test - WER: 0.991358, CER: 104.168596, loss: 418.055206
--------------------------------------------------------------------------------
WER: 1.000000, CER: 5.000000, loss: 17.521641
 - src: "ay me"
 - res: ""
--------------------------------------------------------------------------------
WER: 1.000000, CER: 6.000000, loss: 23.195108
 - src: "venice"
 - res: ""
--------------------------------------------------------------------------------
WER: 1.000000, CER: 7.000000, loss: 24.702639
 - src: "a story"
 - res: ""
--------------------------------------------------------------------------------
WER: 1.000000, CER: 7.000000, loss: 24.753246
 - src: "oh emil"
 - res: ""
--------------------------------------------------------------------------------
WER: 1.000000, CER: 9.000000, loss: 25.343424
 - src: "indeed ah"
 - res: ""
--------------------------------------------------------------------------------
WER: 1.000000, CER: 9.000000, loss: 31.011301
 - src: "verse two"
 - res: ""
--------------------------------------------------------------------------------
WER: 1.000000, CER: 9.000000, loss: 34.202015
 - src: "direction"
 - res: ""
--------------------------------------------------------------------------------
WER: 1.000000, CER: 11.000000, loss: 37.468643
 - src: "again again"
 - res: ""
--------------------------------------------------------------------------------
WER: 1.000000, CER: 12.000000, loss: 38.504765
 - src: "marie sighed"
 - res: ""
--------------------------------------------------------------------------------
WER: 1.000000, CER: 13.000000, loss: 39.427250
 - src: "hedge a fence"
 - res: ""
--------------------------------------------------------------------------------
I Exporting the model...

The relevant output for lr = 0.000001:

Test - WER: 0.992650, CER: 99.543596, loss: 303.750732
--------------------------------------------------------------------------------
WER: 1.000000, CER: 6.000000, loss: 28.675510
 - src: "a story"
 - res: "i "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 8.000000, loss: 30.219297
 - src: "verse two"
 - res: "i "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 6.000000, loss: 31.987692
 - src: "oh emil"
 - res: "i "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 4.000000, loss: 33.258804
 - src: "ay me"
 - res: "i "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 8.000000, loss: 34.096275
 - src: "direction"
 - res: "i "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 5.000000, loss: 34.566467
 - src: "venice"
 - res: "i "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 7.000000, loss: 37.244366
 - src: "indeed ah"
 - res: "he "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 9.000000, loss: 40.380829
 - src: "poor alice"
 - res: "i "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 11.000000, loss: 44.358227
 - src: "what was that"
 - res: "he "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 12.000000, loss: 44.720150
 - src: "hans stirs not"
 - res: "he "
--------------------------------------------------------------------------------
I Exporting the model...
I Models exported at /home/bderuiter/scratch/checkpoint-1559664930/export

Training for two epochs with the original, higher learning rate also results in a WER of 0.98.
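
The empty and near-empty transcripts suggest the acoustic model may have collapsed toward emitting only the CTC blank label. A generic greedy CTC decoding sketch (not DeepSpeech's decoder) shows why blank-dominated output produces empty strings:

import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz '"
BLANK = len(ALPHABET)  # CTC blank placed at the last index, by convention here

def greedy_ctc_decode(logits: np.ndarray) -> str:
    """Best-path CTC decoding: collapse repeats, then drop blanks."""
    best = logits.argmax(axis=1)  # most likely symbol per time frame
    collapsed = [k for i, k in enumerate(best) if i == 0 or k != best[i - 1]]
    return "".join(ALPHABET[k] for k in collapsed if k != BLANK)

# If the blank wins the argmax in every frame, the transcript is empty,
# mirroring the res: "" lines above.
frames = np.zeros((50, len(ALPHABET) + 1))
frames[:, BLANK] = 1.0
assert greedy_ctc_decode(frames) == ""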

reuben commented 5 years ago

Can you reproduce this on 0.5.0?

rhamnett commented 5 years ago

Tried this myself on a stock 0.5.1 with only the librivox CLEAN sets. Can't reproduce

gr8nishan commented 5 years ago

@rhamnett are you able to see the WER decrease when training on 0.5.0? I am also facing an increase in WER when training on a custom dataset in 0.4.1

rhamnett commented 5 years ago

Might be wrong, but it seems 0.5.1 was trained with some extra dataset missing; this will be corrected in 0.6.0

lissyx commented 4 years ago

@bopjesvla Is that still an issue with 0.6?

bopjesvla commented 4 years ago

I won't be able to check this in the near future. Since it couldn't be reproduced by @rhamnett in 0.5.1, I'll close the issue.

lock[bot] commented 4 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.