Thanks for your interest in the project. Yes, I noticed myself that I had forgotten to upload the configs. I will do that very soon and write back here.
Sorry for the delay. I uploaded the configs now (here). 11.5% is not too far off, though.
Just out of curiosity, I made a diff between your config and base2.conv2l.specaug.curric3.eos.config, and these are some of the differences:

- epoch_wise_filter: you have this, while I don't. In general, things like curriculum learning, pretraining, and similar tricks can help to some degree, but they will also hurt performance if you do them too much. Always try to do as little of that as possible, but still enough that it works well. Unfortunately this is something which needs to be tuned a bit when you go to a new task.
- use_learning_rate_control_always = False: this disables the learning rate scheduling during the pretraining phase.
- newbob_learning_rate_decay = 0.8 (not 0.9): but this is because I also train for fewer epochs. If you want to train longer, 0.9 is fine. Longer training can probably also improve WER (unless you get too much overfitting). A rough sketch of how these options fit into a config follows below.
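For orientation, here is a minimal sketch of how these options can look in a RETURNN config. It is not taken from either of the configs discussed here: the dataset dict, epoch ranges, and filter values are illustrative placeholders, and which epoch_wise_filter keys are available depends on the dataset class you use.

```python
# Schematic RETURNN config excerpt (all values are illustrative placeholders).

# Curriculum learning via the dataset's epoch_wise_filter: restrict the first
# epochs to shorter sequences, then train on everything.
train = {
    # ... dataset class, file paths, audio/target options as in your setup ...
    "epoch_wise_filter": {
        (1, 2): {"max_mean_len": 50},    # epochs 1-2: only short sequences
        (3, 4): {"max_mean_len": 150},   # epochs 3-4: allow longer sequences
        # no entry from epoch 5 on -> no filtering
    },
}

# Learning rate scheduling (Newbob-style control).
learning_rate = 0.0008
learning_rate_control = "newbob_multi_epoch"
use_learning_rate_control_always = False  # no LR scheduling during the pretraining phase
newbob_learning_rate_decay = 0.8          # 0.8 for shorter trainings; 0.9 if you train longer
```

Leaving epoch_wise_filter out entirely corresponds to training without the data-side curriculum, i.e. the lighter-touch setting suggested above.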
Thanks for the updates! And I really appreciate your helpful advice.
Hi, thanks for the great framework and experiments.
I'm reproducing the TEDLIUMv2 LSTM results in Table 4 from the paper "A comparison of Transformer and LSTM encoder decoder models for ASR".
I'm using the data preprocessing pipeline from here. My configuration file is here. (It is almost the same as the LibriSpeech configuration; I rewrote it for TEDLIUMv2.)
I expected a test WER of about 10.8%, but I can only get 11.5%–12.0% from several runs. Am I missing something? Could you kindly share the training configs for TEDLIUMv2?
Thanks.