Hi jongsae,
I think gnmt_v2 is the main factor, followed by SGD. normed_bahdanau is better than bahdanau attention; scaled_luong is good, but somehow I couldn't get it to work with gnmt_v2. As for the optimizer, what I've observed personally is that Adam makes things easier to train, but if you can manage to train with SGD with a large learning rate, you will get a better result! In fact, in all my NMT papers in 2014-2016, I used a pretty universal set of hyperparameters: SGD, learning rate 1.0, uniform init 0.1, grad norm 5, dropout 0.2 :)
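In TF 1.x terms, that recipe roughly looks like the sketch below (the tiny model here is just a stand-in, not the actual GNMT graph):

```python
import tensorflow as tf

# Sketch of the recipe above: SGD, learning rate 1.0, uniform init [-0.1, 0.1],
# gradient-norm clipping at 5, dropout 0.2. The toy model is only illustrative.
init = tf.random_uniform_initializer(-0.1, 0.1)
x = tf.placeholder(tf.float32, [None, 16])
y = tf.placeholder(tf.float32, [None, 1])

with tf.variable_scope("toy_model", initializer=init):
    h = tf.layers.dense(x, 32, activation=tf.nn.tanh)
    h = tf.nn.dropout(h, keep_prob=1.0 - 0.2)  # dropout 0.2
    pred = tf.layers.dense(h, 1)

loss = tf.reduce_mean(tf.squared_difference(pred, y))
params = tf.trainable_variables()
grads = tf.gradients(loss, params)
clipped, _ = tf.clip_by_global_norm(grads, clip_norm=5.0)  # grad norm 5
train_op = tf.train.GradientDescentOptimizer(1.0).apply_gradients(zip(clipped, params))
```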
Hope that helps!
Hello @lmthang,
I am developing an NMT system using the google/seq2seq implementation. I would like your suggestion on whether I should switch to tensorflow/nmt, or whether it doesn't matter.
Both implementations come from Google teams, so I was wondering why both of them exist.
Hi @ssokhey,
The google/seq2seq implementation was developed to be general purpose, with Estimator usage and various customizations/add-ons.
The tensorflow/nmt implementation was initially developed from a teaching perspective, avoiding high-level APIs like Estimator that abstract away many details. Over the course of development, we also managed to replicate Google's NMT system with very good performance (outperforming google/seq2seq too; see https://github.com/tensorflow/nmt#wmt-english-german--full-comparison).
I'd recommend using tensorflow/nmt as it is still being regularly maintained and can be used with newer versions of TF.
Thanks a lot! @lmthang
Hi, we have set up TensorFlow NMT for training from French to English, but the translation quality is not at all good even after 196K steps. I had raised an issue at https://github.com/tensorflow/nmt/issues/328. Do we need a different version of TensorFlow? I had installed the TensorFlow nightly version; is that the reason for the bad translation quality? Also, how do we set up NMT as a multilingual model with zero-shot translation?
Hey @frajos100
Can you share the parameter settings you're using, and also the dataset size?
The parameter settings are the same as wmt16_gnmt_4_layer.json at nmt/standard_hparams/wmt16_gnmt_4_layer.json in https://github.com/tensorflow/nmt. The dataset I used is the preprocessed data produced after tokenization by the provided shell script wmt16_en_de.sh, modified for the French version. The modified download links are as follows:
http://www.statmt.org/europarl/v7/fr-en.tgz
http://www.statmt.org/wmt13/training-parallel-commoncrawl.tgz
http://data.statmt.org/wmt17/translation-task/training-parallel-nc-v12.tgz
http://data.statmt.org/wmt17/translation-task/dev.tgz
I am really new to NMT and hence need your help. Also, how do we set up NMT as a multilingual model with zero-shot translation?
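From what I've read, the usual way to get a single model to be multilingual and do zero-shot translation (as in Google's multilingual NMT work) is to mix the parallel corpora and prepend a target-language token to every source sentence; a rough preprocessing sketch, with hypothetical file paths:

```python
# Rough sketch: prepend a target-language token (e.g. "<2en>", "<2fr>") to each
# source sentence and append the pair to combined multilingual training files.
# All file paths here are hypothetical.
def tag_and_append(src_in, tgt_in, lang_token, src_out, tgt_out):
    with open(src_in, encoding="utf-8") as s_in, \
         open(tgt_in, encoding="utf-8") as t_in, \
         open(src_out, "a", encoding="utf-8") as s_out, \
         open(tgt_out, "a", encoding="utf-8") as t_out:
        for src_line, tgt_line in zip(s_in, t_in):
            s_out.write(lang_token + " " + src_line)
            t_out.write(tgt_line)

# Combine fr->en and en->fr corpora into one training set; at inference time the
# desired output language is selected by prepending the corresponding token.
tag_and_append("train.fr", "train.en", "<2en>", "train.multi.src", "train.multi.tgt")
tag_and_append("train.en", "train.fr", "<2fr>", "train.multi.src", "train.multi.tgt")
```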
Hi ssokhey, any directions on how we could improve the French-to-English training?
Hi, I am trying to reproduce on tensorflow/nmt the training results I generated using google/seq2seq.
I noticed that the standard hyperparameters provided here lead to a much higher BLEU score (15.9 vs. 21.45 BLEU) when training for the same number of steps (80K). Is that because of algorithmic changes such as normed_bahdanau and gnmt_v2, or because of the optimized NMT implementation? Or for some other reason?
One more thing: it uses SGD instead of Adam, which was the default in google/seq2seq. Moreover, the learning rate used is surprisingly high (1.0). Maybe the optimizer changes affected the training curve? I think Adam is more commonly used these days, so why was SGD selected in this case?
I would appreciate any explanations or comments that would help me understand the algorithmic and implementation differences between these two training setups.
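For reference, my understanding of how the attention variants mentioned in this thread map onto the TF 1.x contrib API (encoder/decoder omitted; shapes are illustrative only):

```python
import tensorflow as tf

# `memory` stands in for the encoder outputs: [batch, src_len, depth].
memory = tf.placeholder(tf.float32, [None, None, 512])
source_lengths = tf.placeholder(tf.int32, [None])

# attention=bahdanau vs. attention=normed_bahdanau (weight-normalized scoring)
bahdanau = tf.contrib.seq2seq.BahdanauAttention(
    num_units=512, memory=memory,
    memory_sequence_length=source_lengths, normalize=False)
normed_bahdanau = tf.contrib.seq2seq.BahdanauAttention(
    num_units=512, memory=memory,
    memory_sequence_length=source_lengths, normalize=True)

# attention=luong vs. attention=scaled_luong (scaled dot-product scoring)
scaled_luong = tf.contrib.seq2seq.LuongAttention(
    num_units=512, memory=memory,
    memory_sequence_length=source_lengths, scale=True)
```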