heshenghuan opened this issue 3 years ago
Learning rate is too big?
It seems like a random seed problem. If the default random seed is used, the training loss increases even with a smaller learning rate. Choosing another random seed works around the problem.
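For context, here is a minimal sketch of what trying a different seed involves; the `set_seed` helper and the example value are illustrative and not the repo's code (train.py may expose its own seed flag, which would be the simpler route if available):

```python
# Minimal sketch (not mt-dnn's actual code): seed every RNG source that affects
# initialization and data shuffling, so a non-default seed can be tested when
# the default one leads to a diverging training loss.
import random

import numpy as np
import torch


def set_seed(seed: int) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)


set_seed(2021)  # hypothetical value; any seed other than the default can be tried
```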
Another issue:
https://github.com/namisan/mt-dnn/blob/471f717a25ab744e710591274c3ec098f5f4d0ad/train.py#L398
This line seems to overwrite the training params when a pre-trained mt-dnn model is loaded.
@heshenghuan, I reran the script with different random seeds and did not hit the bug you mentioned. I'm wondering which pretrained model was used in your experiments.
Yes, the config in mt-dnn should be the same as the pretrained config. If I remember correctly, I removed the other, unrelated args.
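To make the "unrelated args" point concrete, here is a minimal sketch of taking only architecture-related keys from a pretrained config while keeping the current run's training hyperparameters; the key names and the assumption that the checkpoint stores a `config` dict beside the weights are illustrative, not mt-dnn's actual format:

```python
# Minimal sketch (assumptions: the checkpoint bundles a 'config' dict next to the
# model weights; the key names below are illustrative). The idea is to copy only
# architecture-related settings from the pretrained config and preserve the current
# run's training params (learning rate, epochs, batch size, ...).
import torch

ARCH_KEYS = {"hidden_size", "num_hidden_layers", "num_attention_heads", "vocab_size"}


def merge_pretrained_config(opt: dict, checkpoint_path: str) -> dict:
    state = torch.load(checkpoint_path, map_location="cpu")
    pretrained_cfg = state.get("config", {})
    # Overwrite only keys that describe the model architecture; everything else
    # stays as the user passed it on the command line.
    for key in ARCH_KEYS & pretrained_cfg.keys():
        opt[key] = pretrained_cfg[key]
    return opt
```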
The init_checkpoint was trained with scripts/run_mt_dnn.sh, and only the random seed was changed. The training loss: