Here is my loss curve for epoch 1.
This is very normal. Just keep training and watch how the loss changes epoch by epoch, not iteration by iteration.
If the final model doesn't work well, you may need to try a different "k", which affects the learning rate.
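For reference, the factor "k" scales a Noam-style warmup schedule like the one in the original Transformer paper. Below is a minimal sketch of that schedule, assuming that form; the parameter values are illustrative, not copied from the repo:

```python
import matplotlib.pyplot as plt

def noam_lr(step, k=1.0, d_model=512, warmup_steps=4000):
    """Noam-style schedule scaled by k (assumed form:
    lr = k * d_model^-0.5 * min(step^-0.5, step * warmup^-1.5))."""
    return k * d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

# Compare a few values of k: if k is too small, the peak learning rate stays
# tiny and the loss can sit on a plateau for a long time.
steps = range(1, 20000)
for k in (0.2, 1.0, 5.0):
    plt.plot(steps, [noam_lr(s, k=k) for s in steps], label=f"k={k}")
plt.xlabel("step"); plt.ylabel("learning rate"); plt.legend(); plt.show()
```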
Thx.
@kaituoxu
Hi kaituo, I used a small dataset (about 128 LibriSpeech wavs) to test whether the model can converge on it. The figure shows that the loss stays unchanged after converging to 3.0 in the early iterations (similar to the picture I posted before). After about 150 epochs, the loss starts falling again (but the cv loss begins to increase, which means overfitting).
This loss curve is so weird! I have tried a lot of different hyperparameters (such as k, warmup_steps, n_layers_enc, d_model, etc.); they all converge to 3.0 and stay there. This has been bothering me for a long time, and I don't know what went wrong :(
Besides, my input is the same as yours (except that the fbank features are extracted with librosa, not Kaldi). My labels contain the 26 lowercase letters plus space_tok, unknown_tok, start_tok and end_tok. I use batch_size to generate each batch instead of batch_frames.
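A minimal sketch of this kind of setup (librosa log-mel fbank features plus a character vocabulary); the parameter values and token names below are illustrative, not my exact config:

```python
import librosa
import numpy as np

# Illustrative log-mel fbank extraction with librosa (n_mels, n_fft and
# hop_length are example values, not necessarily the ones used above).
def logmel_fbank(wav_path, sr=16000, n_mels=80):
    y, sr = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=400, hop_length=160, n_mels=n_mels)
    return np.log(mel + 1e-10).T          # (frames, n_mels)

# Character vocabulary: 26 lowercase letters plus the four special tokens.
specials = ["<unk>", "<space>", "<sos>", "<eos>"]
vocab = specials + [chr(c) for c in range(ord("a"), ord("z") + 1)]
char2id = {c: i for i, c in enumerate(vocab)}

def encode(text):
    ids = [char2id.get("<space>" if ch == " " else ch, char2id["<unk>"])
           for ch in text.lower()]
    return [char2id["<sos>"]] + ids + [char2id["<eos>"]]
```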
Another question: for a small dataset (128 wavs), it takes only a few iterations to converge to 3.0 but about 300 epochs before the loss decreases again. Is that normal?
@stephen-song, if you want to overfit 128 wavs, first of all turn off all regularization, such as L2 weight decay, dropout, and label smoothing, then train your model again. Besides, try different values of "k"; it is very, very important for the model to converge.
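For example, a quick sanity check of this kind, with a toy model standing in for the Speech-Transformer: disable weight decay, dropout and label smoothing, and confirm the training loss on one small batch can be driven close to zero.

```python
import torch
import torch.nn as nn

# Toy sanity check: with regularization off, one small batch should be almost
# memorized (training loss -> ~0). The model here is a stand-in, not the
# Speech-Transformer from this repo.
torch.manual_seed(0)
x = torch.randn(8, 40)                  # 8 "utterances", 40-dim features
y = torch.randint(0, 30, (8,))          # 30-class character targets

model = nn.Sequential(nn.Linear(40, 128), nn.ReLU(),
                      nn.Dropout(p=0.0),               # dropout disabled
                      nn.Linear(128, 30))
criterion = nn.CrossEntropyLoss(label_smoothing=0.0)   # no label smoothing (PyTorch >= 1.10)
optim = torch.optim.Adam(model.parameters(), lr=1e-3,
                         weight_decay=0.0)              # no L2 penalty

for step in range(500):
    optim.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optim.step()
print(loss.item())   # should be close to 0 if nothing blocks memorization
```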
Hi @stephen-song, how about the result?
Hi kaituo, the model can finally overfit the 128 LibriSpeech wavs; k and batch_size (or batch_frames) are truly important to make it work, just as you mentioned. Fine-tuning those hyperparameters on the whole LibriSpeech dataset is not worth the time for me, so I now use AISHELL instead and focus on modifying the model (for example, adding the 2D-Attention mentioned in paper [1]).
Okay, thanks for your response :)
@xingchensong
I have a similar problem. Can you share the values you used for the parameters? A snapshot of run.sh would be great.
Hi kaituo, I'm trying to train this network on LibriSpeech. The loss curve of epoch 1 shows that the model tends to saturate after the first few steps (there are approximately 4k iterations per epoch, and my loss dropped from 4 to 3 after 100 iterations and then stayed the same). I have not made any changes to the model; the only change is that I use my own dataloader (for loading the LibriSpeech corpus). So I wonder whether you saw the same loss-decline trend when training on the AISHELL corpus?
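A rough sketch of the kind of custom dataloader I mean (batching by batch_size and padding variable-length fbank features); the details here are illustrative, not my exact code:

```python
import torch
from torch.utils.data import Dataset, DataLoader
from torch.nn.utils.rnn import pad_sequence

class LibriSpeechFbank(Dataset):
    """Illustrative dataset: each item is a (fbank, token_ids) pair that was
    precomputed elsewhere (e.g. log-mel fbank features from librosa)."""
    def __init__(self, items):
        # sort by number of frames so each batch has similar lengths
        self.items = sorted(items, key=lambda it: it[0].shape[0])

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        feats, tokens = self.items[idx]
        return (torch.as_tensor(feats, dtype=torch.float32),
                torch.as_tensor(tokens, dtype=torch.long))

def collate(batch):
    feats, tokens = zip(*batch)
    feat_lens = torch.tensor([f.shape[0] for f in feats])
    return (pad_sequence(feats, batch_first=True),      # (B, T_max, n_mels)
            feat_lens,
            pad_sequence(tokens, batch_first=True, padding_value=0))

# loader = DataLoader(LibriSpeechFbank(items), batch_size=16, collate_fn=collate)
```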