tensorflow / nmt

TensorFlow Neural Machine Translation Tutorial
Apache License 2.0

GNMT on tensorflow/nmt vs. GNMT on google/seq2seq #131

Open ghost opened 6 years ago

ghost commented 6 years ago

Hi, I am trying to reproduce the training results I generated using google/seq2seq on tensorflow/nmt.

I noticed that the standard hyperparams provided here lead to a much higher BLEU score (21.45 vs. 15.9) after training for the same number of steps (80K). Is this because of algorithmic changes such as normed_bahdanau and gnmt_v2, because of the optimized NMT implementation, or for some other reason?

One more thing: tensorflow/nmt uses SGD instead of Adam, which was the default in google/seq2seq. Moreover, the learning rate used is surprisingly high (1.0). Maybe the optimizer change affected the training curve? Adam seems more commonly used these days, so why was SGD chosen in this case?

I would appreciate any explanations or comments that would help me understand the algorithmic and implementation differences between these two training setups.

lmthang commented 6 years ago

Hi jongsae,

I think gnmt_v2 is the main factor, followed by SGD. normed_bahdanau is better than plain bahdanau attention; scaled_luong is good too, but somehow I couldn't get it to work with gnmt_v2. As for the optimizer, my personal observation is that Adam makes things easier to train, but if you can manage to train with SGD at a large learning rate, you will get better results! In fact, in all my NMT papers from 2014-2016, I used a pretty universal set of hyperparameters: SGD, learning rate 1.0, uniform init 0.1, grad norm 5, dropout 0.2 :)
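For reference, the "universal" recipe above can be written out as a tensorflow/nmt-style hyperparameter set. This is a hedged sketch: the flag names follow the repo's standard hparams files, but verify them against your checkout before training.

```python
# Sketch of lmthang's "universal" hyperparameters as tensorflow/nmt-style
# flags. Flag names are assumed from the repo's standard_hparams files;
# double-check them against your version of the code.
universal_hparams = {
    "optimizer": "sgd",          # SGD rather than Adam
    "learning_rate": 1.0,        # unusually high; workable with SGD + clipping
    "init_op": "uniform",        # uniform weight initialization
    "init_weight": 0.1,          # weights drawn from U(-0.1, 0.1)
    "max_gradient_norm": 5.0,    # gradient clipping keeps lr=1.0 stable
    "dropout": 0.2,
    "attention": "normed_bahdanau",
    "attention_architecture": "gnmt_v2",
}

def as_cli_flags(hparams):
    """Render the dict as command-line flags for nmt.nmt."""
    return " ".join("--%s=%s" % (k, v) for k, v in sorted(hparams.items()))

print(as_cli_flags(universal_hparams))
```

Gradient clipping (grad norm 5) is what makes the high 1.0 learning rate viable with SGD; without it, training at that rate would typically diverge.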

Hope that helps!

ghost commented 6 years ago

Hello @lmthang,

I am developing an NMT system using the google/seq2seq implementation. I would like your suggestion on whether I should switch to tensorflow/nmt, or whether it doesn't matter.

Both implementations are products of Google teams, and I was wondering why both of them exist.

lmthang commented 6 years ago

Hi @ssokhey,

google/seq2seq was developed to be general-purpose, making use of Estimator and various customizations/add-ons.

tensorflow/nmt was initially developed from a teaching perspective, avoiding high-level APIs like Estimator that abstract away many details. Over the course of development, we also managed to replicate Google's NMT system with very good performance (outperforming google/seq2seq too; see https://github.com/tensorflow/nmt#wmt-english-german--full-comparison).

I'd recommend using tensorflow/nmt as it is still being regularly maintained and can be used with newer versions of TF.

ghost commented 6 years ago

Thanks a lot! @lmthang

frajos100 commented 6 years ago

Hi, we have set up TensorFlow NMT for training French to English, but the translation quality is not good at all even after 196K steps. I had raised an issue at https://github.com/tensorflow/nmt/issues/328. Do we need a different version of TensorFlow? I had installed the TensorFlow nightly build; is that the reason for the bad translation quality? Also, how do we set up NMT as a multilingual model with zero-shot translation?

ghost commented 6 years ago

Hey @frajos100

Can you share the parameter settings you're using, and what your dataset size is?

frajos100 commented 6 years ago

The parameter settings are the same as wmt16_gnmt_4_layer.json under nmt/standard_hparams/ in https://github.com/tensorflow/nmt. The dataset I used is preprocessed data, tokenized with the provided shell script wmt16_en_de.sh modified for the French version. The modified download links are:

http://www.statmt.org/europarl/v7/fr-en.tgz
http://www.statmt.org/wmt13/training-parallel-commoncrawl.tgz
http://data.statmt.org/wmt17/translation-task/training-parallel-nc-v12.tgz
http://data.statmt.org/wmt17/translation-task/dev.tgz

I am really new to NMT and need your help. Also, how do we set up NMT as a multilingual model with zero-shot translation?
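On the multilingual/zero-shot question: this is not a built-in switch in tensorflow/nmt. In Google's multilingual NMT approach (Johnson et al., 2017) it is a data-preprocessing convention: prepend an artificial target-language token to every source sentence, mix all language pairs into one training corpus, and train a single model. A hedged sketch, with illustrative token names:

```python
# Minimal sketch of the multilingual / zero-shot data convention.
# The '<2xx>' token format is illustrative, not mandated by tensorflow/nmt;
# any reserved token that survives your tokenizer and vocabulary works.

def tag_source(sentence, target_lang):
    """Prepend a target-language token, e.g. '<2en>', to the source side."""
    return "<2%s> %s" % (target_lang, sentence)

# Mix fr->en and en->fr examples into one shared training corpus.
corpus = [
    (tag_source("le chat dort", "en"), "the cat sleeps"),
    (tag_source("the dog runs", "fr"), "le chien court"),
]

print(corpus[0][0])  # -> "<2en> le chat dort"
```

At inference time, zero-shot translation is requested the same way: feed a source sentence with a `<2xx>` token for a language pair the model never saw paired during training. The tagged token must also be added to the source vocabulary file so it is not mapped to the unknown token.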

frajos100 commented 6 years ago

Hi @ssokhey, any directions on how we could improve the French-to-English training?