harvardnlp / seq2seq-attn

Sequence-to-sequence model with LSTM encoder/decoders and attention
http://nlp.seas.harvard.edu/code
MIT License

Ask for help #80

Closed · hfxunlp closed this issue 7 years ago

hfxunlp commented 7 years ago

I tried using Adam with learning rate 0.1 as advised, but the model does not seem to train properly. Our Chinese-English corpus has about 300M, yet I only get a BLEU score of 15 with this code. I wonder whether I did something wrong with the command:

th train.lua -data_file data/base-train.hdf5 -val_data_file data/base-val.hdf5 -savefile base-model -num_layers 1 -rnn_size 1000 -word_vec_size 620 -reverse_src 1 -dropout 0.2 -lr_decay 0.5 -attn 1 -optim adam -learning_rate 0.1 -start_decay_at 1 -max_batch_l 80 -gpuid 1 | tee tlog.txt

yoonkim commented 7 years ago

How does SGD perform? I would always try SGD first. Also, try lr = 0.01 or 0.001
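
For concreteness, a hypothetical SGD run reusing the same model settings as the command above might look like the following (not a command from this thread; the learning rate of 1 and the savefile name are placeholders to be adjusted):

th train.lua -data_file data/base-train.hdf5 -val_data_file data/base-val.hdf5 -savefile base-model-sgd -num_layers 1 -rnn_size 1000 -word_vec_size 620 -reverse_src 1 -dropout 0.2 -lr_decay 0.5 -attn 1 -optim sgd -learning_rate 1 -start_decay_at 1 -max_batch_l 80 -gpuid 1 | tee tlog-sgd.txt

For the Adam comparison, the same command with -optim adam and -learning_rate 0.01 or 0.001 would cover the suggested values.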

hfxunlp commented 7 years ago

Adam does work with a small learning rate like 0.0001, and it performs a little better than SGD (which got BLEU 13).

yoonkim commented 7 years ago

Hmm ok. I haven't tried this, but apparently Google had better success by training with Adam for a few epochs and then switching to SGD. (https://arxiv.org/pdf/1609.08144v2.pdf bottom of page 14)
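
A rough sketch of that two-stage schedule with this codebase, assuming the -train_from checkpoint option listed in the training options; the epoch count, learning rates, and checkpoint path are placeholders, not values from the paper or this thread:

# stage 1: a few epochs with Adam
th train.lua -data_file data/base-train.hdf5 -val_data_file data/base-val.hdf5 -savefile base-adam -optim adam -learning_rate 0.001 -epochs 3 -gpuid 1

# stage 2: continue from whichever checkpoint stage 1 produced, switching to plain SGD
th train.lua -data_file data/base-train.hdf5 -val_data_file data/base-val.hdf5 -savefile base-sgd -train_from <adam-checkpoint.t7> -optim sgd -learning_rate 1 -gpuid 1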

hfxunlp commented 7 years ago

Thank you for your help. I also found that all alphabetic characters in my training set are lowercased, but not in the test set; this may explain the poor BLEU score.
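
If that is the cause, lowercasing the English side of the test data (or, alternatively, rebuilding the training data with its original casing) should make the two consistent. A minimal sketch using standard tools; the file names are placeholders, and the Chinese side has no case to normalize:

for f in test-src.en test-ref.en; do tr '[:upper:]' '[:lower:]' < $f > ${f%.en}.lc.en; done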