harvardnlp / seq2seq-attn

Sequence-to-sequence model with LSTM encoder/decoders and attention
http://nlp.seas.harvard.edu/code
MIT License

BLEU score 16.44 #29

Closed. i55code closed this issue 8 years ago.

i55code commented 8 years ago

Hi yoonkim,

Could you upload an example of the pre-word-vec-enc and -dec files, and of the sharded input? I think it would be helpful for everyone.
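For reference, here is a minimal sketch of how a pre-trained embedding file might be prepared, assuming `-pre_word_vecs_enc` takes an hdf5 matrix aligned with the vocabulary written by preprocess.py. The dataset name ("word_vecs") and the file names are assumptions, so check the repo README for the exact layout train.lua expects.

```python
# Hypothetical sketch: build an hdf5 embedding file for -pre_word_vecs_enc.
# Assumes the vocabulary file has "word idx" lines (as written by preprocess.py)
# and that train.lua reads a |V| x dim matrix; the dataset name below is a guess.
import numpy as np
import h5py

def load_text_vectors(path, dim):
    """Read GloVe-style text vectors: a word followed by `dim` floats per line."""
    vecs = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) == dim + 1:
                vecs[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vecs

def write_pre_word_vecs(dict_path, vec_path, out_path, dim=300):
    pretrained = load_text_vectors(vec_path, dim)
    vocab = {}
    with open(dict_path, encoding="utf-8") as f:
        for line in f:
            word, idx = line.split()
            vocab[int(idx)] = word
    # Random init for words without a pre-trained vector.
    mat = np.random.uniform(-0.1, 0.1, (len(vocab), dim)).astype(np.float32)
    for idx, word in vocab.items():
        if word in pretrained:
            mat[idx - 1] = pretrained[word]   # dict indices are 1-based
    with h5py.File(out_path, "w") as f:
        f.create_dataset("word_vecs", data=mat)  # dataset name: assumption

# Hypothetical file names:
write_pre_word_vecs("demo.src.dict", "glove.6B.300d.txt", "pre_word_vecs_enc.hdf5")
```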

Also, I have run the pretrained model on the validation set provided with your code and got 16.44. May I ask which test set you used? Is it the same as in the paper "Effective Approaches to Attention-based Neural Machine Translation"? And which model in Table 1 of that paper does the pre-trained model replicate? Not the ensemble, right?

Is it "Base+reverse+dropout+global attention" or "Base+reverse+dropout+global attention+feed input"? The latter is 18.1 in the original paper.

One last question: have you run your code on English-French translation, for example as in Bengio's paper, or on any other language pairs?

Thank you, and enjoy the 4th of July holiday!

Cheers, Zhong

yoonkim commented 8 years ago

I've uploaded the test sets. The pretrained model should get ~19.5 on the en->de test set. The pretrained models correspond to base+global attention+feed input (i.e. we do not reverse) from Luong et al.
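For context, the numbers quoted in this thread are corpus-level BLEU, usually computed with Moses' multi-bleu.perl on tokenized output. Below is a minimal Python sanity check using NLTK's corpus_bleu; it can differ slightly from multi-bleu.perl depending on tokenization and smoothing, and the file names are placeholders.

```python
# Sanity-check corpus BLEU on tokenized, line-aligned files using NLTK.
# Scores can differ slightly from multi-bleu.perl (tokenization, smoothing).
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

def file_bleu(ref_path, hyp_path):
    with open(ref_path, encoding="utf-8") as f:
        refs = [[line.split()] for line in f]   # one reference per sentence
    with open(hyp_path, encoding="utf-8") as f:
        hyps = [line.split() for line in f]
    assert len(refs) == len(hyps), "reference/hypothesis files must be line-aligned"
    return 100 * corpus_bleu(refs, hyps,
                             smoothing_function=SmoothingFunction().method3)

# Placeholder file names:
print("BLEU = %.2f" % file_bleu("test.de.tok", "pred.de.tok"))
```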

Hmm, the instructions for the pre-word-vec-enc and shard input should be reasonably clear -- are there any specific questions you had?

Thanks!

i55code commented 8 years ago

Hi Yoon Kim:),

Thank you so much! This is interesting: you reproduced the paper with a higher number. The original paper reports 18.1 with reverse, but yours is higher :)

Is it because reversing is no longer useful when we are using attention? Or is it because your code is better, haha :)

Yes, the instructions for everything are clear.

I just want to check whether you have used your code to reproduce other benchmarks, such as Bengio's paper. I am training on an English-French dataset now, but the BLEU score is low, and I wonder why. If you have trained on other language pairs, please let me know.

Thank you!

Cheers, Zhong

yoonkim commented 8 years ago

Yeah, we trained on slightly higher-quality data that removed a lot of noisy sentences, whereas I think Luong trained on the full dataset. See this paper for preprocessing details: http://www.aclweb.org/anthology/P15-1001 (we obtained the preprocessed datasets directly from its authors).
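The exact filtering behind that cleaned dataset isn't spelled out in this thread; the sketch below only illustrates the kind of generic heuristics (empty lines, length caps, length-ratio limits) typically applied to noisy parallel data, with arbitrary thresholds and placeholder file names.

```python
# Generic sketch of noisy sentence-pair filtering for parallel data.
# NOT the exact procedure used for the dataset above; thresholds are arbitrary.
def clean_parallel(src_path, tgt_path, out_prefix, max_len=50, max_ratio=1.5):
    kept = 0
    with open(src_path, encoding="utf-8") as fs, \
         open(tgt_path, encoding="utf-8") as ft, \
         open(out_prefix + ".src", "w", encoding="utf-8") as out_s, \
         open(out_prefix + ".tgt", "w", encoding="utf-8") as out_t:
        for s, t in zip(fs, ft):
            n_src, n_tgt = len(s.split()), len(t.split())
            if n_src == 0 or n_tgt == 0:            # drop empty lines
                continue
            if n_src > max_len or n_tgt > max_len:  # drop overlong sentences
                continue
            if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:  # length ratio
                continue
            out_s.write(s)
            out_t.write(t)
            kept += 1
    return kept

print(clean_parallel("train.en", "train.de", "train.clean"))  # placeholder paths
```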

I haven't run English->French, but you should get results similar to this paper (after unk replacement), which basically runs the same model on English->French data:

https://arxiv.org/pdf/1601.00372v2.pdf

Hope this helps!

ChenyuLInx commented 8 years ago

Hi Yoon Kim, I am trying to replicate the result and make some changes to the model. I trained the model with layer_size=4, rnn_size=1000, batch_size=64 (I tried 128 but did not have enough memory), srcvocabsize=50000, and targetvocabsize=50000. For the training data, I only used europarl-v7.de-en, which contains about 2M sentence pairs. It took 7 days to train the model on one GPU, but the BLEU score is only 7.14. My questions are: how did you set up the training parameters, how long did training take, and could you please share the training data you used with me? My email is chenyu.li@duke.edu.
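As a rough sense of scale for this configuration, here is a back-of-the-envelope parameter count. The embedding size (500) is an assumption, and attention and input-feeding terms are ignored, so this is only an estimate of why such a model is heavy and why batch size 128 may not fit in GPU memory.

```python
# Back-of-the-envelope parameter count for the setup above (4 layers,
# rnn_size=1000, 50k source and target vocabularies). A word_vec size of 500
# is an assumption; attention and input-feeding terms are omitted.
def lstm_params(input_size, hidden_size):
    # 4 gates, each with input weights, recurrent weights, and a bias
    return 4 * hidden_size * (input_size + hidden_size + 1)

def stack_params(word_vec, rnn, layers):
    return lstm_params(word_vec, rnn) + (layers - 1) * lstm_params(rnn, rnn)

vocab, word_vec, rnn, layers = 50_000, 500, 1_000, 4
embeddings = 2 * vocab * word_vec              # encoder + decoder lookup tables
encoder = stack_params(word_vec, rnn, layers)
decoder = stack_params(word_vec, rnn, layers)
generator = rnn * vocab + vocab                # output projection + bias
total = embeddings + encoder + decoder + generator
print("approx. parameters: %.0fM" % (total / 1e6))   # roughly 160M
```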

Thank you, Chenyu Li