hfxunlp closed this issue 7 years ago
How does SGD perform? I would always try SGD first. Also, try lr = 0.01 or 0.001
Adam works with a small learning rate like 0.0001, and performs a little better than SGD (BLEU: 13).
Hmm, ok. I haven't tried this, but apparently Google had better success training with Adam for a few epochs and then switching to SGD (https://arxiv.org/pdf/1609.08144v2.pdf, bottom of page 14).
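The Adam-then-SGD schedule described above can be sketched on a toy 1-D quadratic loss. This is a hedged illustration, not the GNMT implementation: the function names, epoch counts, and learning rates are assumptions, and Adam is written out by hand so the example stays self-contained.

```python
# Toy sketch of "train with Adam for a few epochs, then switch to SGD".
# All hyperparameters here are illustrative assumptions, not GNMT's values.

def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

def train(epochs=50, switch_epoch=5, adam_lr=0.001, sgd_lr=0.1):
    w = 0.0
    m = v = 0.0                      # Adam first/second moment estimates
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    t = 0
    for epoch in range(epochs):
        for _ in range(100):         # 100 toy "batches" per epoch
            g = grad(w)
            if epoch < switch_epoch:
                # Warm-up phase: standard Adam update
                t += 1
                m = beta1 * m + (1 - beta1) * g
                v = beta2 * v + (1 - beta2) * g * g
                m_hat = m / (1 - beta1 ** t)
                v_hat = v / (1 - beta2 ** t)
                w -= adam_lr * m_hat / (v_hat ** 0.5 + eps)
            else:
                # After the switch epoch: plain SGD
                w -= sgd_lr * g
    return w
```

The same two-phase idea applies to the Lua training script discussed here, but there the switch would have to be done by restarting training from a checkpoint with a different `-optim` setting.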
Thank you for your help. I also found that all the alphabetic characters in my training set are lowercase, but not in the test set; this mismatch may be contributing to the poor BLEU score.
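One way to remove the case mismatch mentioned above is to lowercase the test set before translating and scoring, so it matches the lowercased training data. A minimal sketch (the file paths are placeholders, not from the original setup):

```python
# Lowercase a text corpus line by line so that train/test casing matches.
# src_path / dst_path are hypothetical file names for illustration.

def lowercase_file(src_path, dst_path):
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(line.lower())
```

Note that BLEU is usually computed case-sensitively, so the references used for scoring would need the same treatment.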
I tried to use Adam with learning rate 0.1 as advised, but it does not seem to train properly. Our Chinese-English corpus is about 300M, but I get only a 15 BLEU score with this code. I wonder whether I did something wrong with the command:
th train.lua -data_file data/base-train.hdf5 -val_data_file data/base-val.hdf5 -savefile base-model -num_layers 1 -rnn_size 1000 -word_vec_size 620 -reverse_src 1 -dropout 0.2 -lr_decay 0.5 -attn 1 -optim adam -learning_rate 0.1 -start_decay_at 1 -max_batch_l 80 -gpuid 1 | tee tlog.txt