hirofumi0810 / neural_sp

End-to-end ASR/LM implementation with PyTorch
Apache License 2.0

How can I reproduce mocha WER in Librispeech? #67

Closed Park-Jong-Min closed 4 years ago

Park-Jong-Min commented 4 years ago

I want to reproduce the Librispeech MoChA WER from the paper "CTC-synchronous Training for Monotonic Attention Model". When I use the conf file "lstm_mocha.yaml" from the Librispeech example, the WER only drops from about 700% at the start of training to about 100% at epoch 35, without an LM. I only changed the conf in run.sh and trained with 1 GPU. Is there anything else besides the conf in run.sh that needs to be modified to reach the reported WER?

hirofumi0810 commented 4 years ago

What does the learning curve look like? You should have the transition of the WER during training.

Park-Jong-Min commented 4 years ago

My learning curve looks weird. [image: learning curve plot]

hirofumi0810 commented 4 years ago

How was the accuracy?

Park-Jong-Min commented 4 years ago

This is my accuracy plot. [image: accuracy plot]

hirofumi0810 commented 4 years ago

Here are the plots of the model used in the paper. [images: loss, accuracy, and edit distance curves]

Park-Jong-Min commented 4 years ago

Thank you, I'll try to reproduce those plots.

Park-Jong-Min commented 4 years ago

I tried to reproduce the plots. Loss and accuracy look similar, but the WER is still too high. I used beam width 1, which means greedy decoding.

My decoding result looks like this: "as prose to me and a later streets of stern with throngs of well rescue families in way either in the world's and a little and a little and a little and a little [… "and a little" repeated …] ......"

If I use beam width 10 and length penalty 2.0, it takes about seven hours to evaluate a single epoch. Is there a good way to fix degenerate outputs like the sentence above in MoChA, other than using beam search with a length penalty? I also wonder how long it takes for lstm_mocha to converge without an LM.
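For reference, the length penalty discussed here is typically applied as an additive per-token bonus when ranking beam hypotheses, which counteracts the model's bias toward shorter outputs. A toy sketch of that scoring rule (the function name and values are hypothetical, not neural_sp's API):

```python
def hyp_score(token_log_probs, length_penalty=0.0):
    """Score a hypothesis as its total log-probability plus a
    per-token length bonus, as commonly done in beam search."""
    return sum(token_log_probs) + length_penalty * len(token_log_probs)

# Two toy hypotheses: a short one and a longer one with lower total log-prob.
short_hyp = [-1.0, -1.0]              # total log-prob: -2.0
long_hyp = [-0.5, -1.0, -1.0, -1.0]   # total log-prob: -3.5

plain_winner = max([short_hyp, long_hyp], key=hyp_score)
penalized_winner = max(
    [short_hyp, long_hyp],
    key=lambda h: hyp_score(h, length_penalty=2.0),
)
```

Without a penalty the short hypothesis wins; with a positive penalty the ranking can flip in favor of the longer one, which is why the penalty value interacts with output length during decoding.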

hirofumi0810 commented 4 years ago

I have never seen such behavior. One simple debugging step is to train the model with a bidirectional encoder; the optimization should be easier.
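In plain PyTorch terms, the suggested debugging step amounts to flipping the `bidirectional` flag on the encoder LSTM (note this also doubles the encoder output size, since forward and backward states are concatenated). A minimal sketch with assumed dimensions, not the actual neural_sp encoder code:

```python
import torch
import torch.nn as nn

idim, hdim = 80, 320                    # assumed feature / hidden sizes
x = torch.randn(4, 100, idim)           # [batch, time, features]

# Unidirectional encoder (streamable, harder to optimize).
uni = nn.LSTM(idim, hdim, num_layers=2, batch_first=True, bidirectional=False)
# Bidirectional encoder (offline, usually easier to optimize).
bi = nn.LSTM(idim, hdim, num_layers=2, batch_first=True, bidirectional=True)

uni_out, _ = uni(x)   # [4, 100, 320]
bi_out, _ = bi(x)     # [4, 100, 640]: forward and backward outputs concatenated
```

Any attention layer on top of the encoder therefore has to account for the doubled output dimension when switching between the two for debugging.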

Park-Jong-Min commented 4 years ago

[image: edit distance plot] After I set the quantity loss weight to 0.2, it shows a similar edit distance!

hirofumi0810 commented 4 years ago

@Park-Jong-Min Great. Did you use 0.01 before? Actually, this value was tuned with BLSTM/LC-BLSTM encoders. I will try 0.2 with the unidirectional encoder. Thank you for your report!

Park-Jong-Min commented 4 years ago

Yes, I used 0.01 before.