Description
Hi guys, has anyone tried the Universal Transformer on machine translation tasks? My experiments with the default settings do not surpass the Transformer on the ZH-EN MT task.
by "default settings", you mean "--hparams_set=universal_transformer_base"? Have you tried "--hparams_set=universal_transformer_fc_base"?
Thanks for your reply, @MostafaDehghani. I did use "--hparams_set=universal_transformer_base" as the default setting, and I haven't tried "universal_transformer_fc_base"; I will try that. BTW, would you please share the settings or hyperparameters used in your paper? Then I could reproduce the Universal Transformer's 28.9 BLEU on the EN-DE translation task, as reported in your remarkable paper.
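For context, the default ZH-EN run discussed here looks roughly like the following; this is a sketch with placeholder directories, since the exact command is not shown in the thread:

t2t-trainer \
  --data_dir=$DATA_DIR \
  --problem=translate_enzh_wmt32k \
  --model=universal_transformer \
  --hparams_set=universal_transformer_base \
  --output_dir=$OUTDIR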
No problem :) For EN-DE, we used "universal_transformer_fc_base" and trained the model in a multi-GPU setup (8 P100 GPUs, for 500k steps, I believe). You should make sure that the capacity (number of trainable parameters) of the Universal Transformer is similar to that of the counterpart Transformer model.
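One quick way to compare capacities, assuming each run's output is saved to a log file: t2t logs the total size of the trainable variables at startup, so it can be grepped out of both logs (the exact log wording may vary across versions):

# Compare the reported parameter counts of the two runs.
grep -i "total size" transformer_base.log universal_transformer.log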
Thanks, that should be helpful! I will try that on EN-DE and report my results later.
My settings:
DATA_DIR=/home/phil/t2t_data_big_en_hi
OUTDIR=/home/phil/big_en_hi/trained_model
t2t-trainer \
  --data_dir=$DATA_DIR \
  --t2t_usr_dir=./big_en_hi/trainer \
  --problem=big_en_hi \
  --model=universal_transformer \
  --hparams_set=universal_transformer_base \
  --output_dir=$OUTDIR \
  --worker_gpu=2 \
  --train_steps=10000000
And the error:
INFO:tensorflow:Cannot use 'Identity_74' as input to 'Identity_17' because they are in different while loops.
Identity_74 while context: universal_transformer/parallel_1_5/universal_transformer/universal_transformer/body/encoder/universal_transformer_basic/foldl/while/while_context
Identity_17 while context: universal_transformer/parallel_0_5/universal_transformer/universal_transformer/body/encoder/universal_transformer_basic/foldl/while/while_context
Any help?
Isn't this the same issue raised in #1006?
The error exists even in the latest versions:
tensor2tensor==1.8.0
tensorboard==1.10.0
tensorflow==1.10.0
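Until this is fixed, one possible workaround (an assumption based on the error naming the per-GPU parallel_0/parallel_1 while contexts, not a confirmed fix): run on a single GPU so the recurrent foldl loop is not replicated across devices, e.g.

t2t-trainer \
  --data_dir=$DATA_DIR \
  --t2t_usr_dir=./big_en_hi/trainer \
  --problem=big_en_hi \
  --model=universal_transformer \
  --hparams_set=universal_transformer_base \
  --output_dir=$OUTDIR \
  --worker_gpu=1 \
  --train_steps=10000000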
For your information, I have almost reproduced the EN-DE translation results. In the paper, they achieved 28.9 BLEU with the Universal Transformer. Here are my results so far: my Transformer base baseline reached 28.19 BLEU, and the Universal Transformer has reached 28.63 BLEU (it has not converged yet). Thanks @MostafaDehghani, great work!
For your information, the BLEU has now reached 28.9.
Hi, is this SOTA if you set aside the systems that use extra preprocessing or a deliberation network?
@zherowolf great! Just make sure that you are not looking at the "approximate BLEU" in t2t :) Check out #436
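For reference, the in-training "approx_bleu" is computed on tokenized subword output, so it is not comparable to published numbers; t2t ships a t2t-bleu script for scoring a decoded, detokenized file (the file names below are placeholders):

t2t-bleu \
  --translation=newstest2014.de.decoded \
  --reference=newstest2014.de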
I preprocessed my training and test data with the Moses scripts (including segmentation), and computed the BLEU score of each checkpoint's output with mteval-v13a.perl after detokenizing. BTW, I used "training-parallel-nc-v11.tgz" from WMT16, which may differ from what you used.
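A sketch of that evaluation pipeline; the Moses checkout path, language code, and SGML file names are assumptions, since the exact commands are not shown:

MOSES=~/mosesdecoder
# Detokenize the system output before scoring.
$MOSES/scripts/tokenizer/detokenizer.perl -l de < output.tok.de > output.detok.de
# mteval-v13a expects SGML-wrapped source, reference, and test files.
$MOSES/scripts/ems/support/wrap-xml.perl de newstest-src.en.sgm my-system < output.detok.de > output.sgm
$MOSES/scripts/generic/mteval-v13a.pl -s newstest-src.en.sgm -r newstest-ref.de.sgm -t output.sgm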
@MostafaDehghani Hi, I can't find "universal_transformer_fc_base" in the latest code. Was it replaced?
@Bournet, Yep! Since people were mostly interested in trying the MT experiments, I changed the default transition function from "sepconv" to "fc" in a PR I sent two days ago: https://github.com/tensorflow/tensor2tensor/pull/1036/commits/e4968979f904a7bcdf3ffe0591781f0efe2dae98
So right now, "universal_transformer_base" (which is equal to "universal_transformer_fc_base" in the old code) is the hparams_set you need to use to reproduce the MT results in the paper :)
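To make the reproduction concrete: a sketch of the full pipeline with the stock EN-DE problem, using the 8-GPU, 500k-step setup mentioned earlier in the thread (directories are placeholders, and exact flag names may differ across t2t versions):

t2t-datagen \
  --data_dir=$DATA_DIR \
  --tmp_dir=$TMP_DIR \
  --problem=translate_ende_wmt32k

t2t-trainer \
  --data_dir=$DATA_DIR \
  --problem=translate_ende_wmt32k \
  --model=universal_transformer \
  --hparams_set=universal_transformer_base \
  --output_dir=$OUTDIR \
  --worker_gpu=8 \
  --train_steps=500000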
@MostafaDehghani Ok, thank you for the reply :)
@zherowolf, can you share your logs and configs? I can't reproduce the paper's result; the model is not converging:
INFO:tensorflow:loss = 5.653845, step = 2500 (147.504 sec)
INFO:tensorflow:global_step/sec: 0.692351
INFO:tensorflow:loss = 5.602401, step = 2600 (144.438 sec)
INFO:tensorflow:global_step/sec: 0.692506
INFO:tensorflow:loss = 5.603458, step = 2700 (144.400 sec)
INFO:tensorflow:global_step/sec: 0.695547
INFO:tensorflow:loss = 5.5827146, step = 2800 (143.772 sec)
INFO:tensorflow:global_step/sec: 0.692561
INFO:tensorflow:loss = 5.7178345, step = 2900 (144.391 sec)
INFO:tensorflow:global_step/sec: 0.691745
INFO:tensorflow:loss = 5.53726, step = 3000 (144.562 sec)
INFO:tensorflow:global_step/sec: 0.693078
INFO:tensorflow:loss = 5.4643216, step = 3100 (144.284 sec)
INFO:tensorflow:global_step/sec: 0.691953
INFO:tensorflow:loss = 5.4527507, step = 3200 (144.519 sec)
INFO:tensorflow:global_step/sec: 0.690533
INFO:tensorflow:loss = 5.5876875, step = 3300 (144.816 sec)
INFO:tensorflow:global_step/sec: 0.692915
INFO:tensorflow:loss = 5.5414114, step = 3400 (144.318 sec)
@zherowolf hi, could you please show us your t2t-trainer settings? Thank you in advance! The following are my settings (no convergence):
nohup t2t-trainer \
  --data_dir=$DATA_DIR \
  --problem=translate_ende_wmt32k \
  --model=universal_transformer \
  --hparams_set=universal_transformer_base \
  --hparams='batch_size=5120' \
  --train_steps=7000000 \
  --random_seed=33 \
  --worker_gpu=8 \
  --output_dir=$TRAIN_DIR \
  --eval_steps=10000 &
Any help?
@zherowolf, --train_steps=7000000? How many steps did your model take to reach 28.9? From your figure above, 69 epochs would be about 550,000 steps; is that correct? Thanks.
Sorry for the late reply. My experiment settings are as follows:
[data preprocess]
1. I used "training-parallel-commoncrawl.tgz", "training-parallel-europarl-v7.tgz", and "training-parallel-nc-v12.tgz", which are available on the WMT website.
2. I preprocessed my data with https://github.com/pytorch/fairseq/blob/master/examples/translation/prepare-wmt14en2de.sh (thanks to @myleott).
[setup]
I did not use translate_ende_wmt32k; I defined my own problem and hparams for EN-DE, but I don't think there's much difference.
For the Transformer base model: [hparams screenshot]
For the Universal Transformer base: [hparams screenshot]
[training]
I trained both models on 4 V100 GPUs. For the Transformer base, I trained 361,600 steps in about 60 hours; for the Universal Transformer, 262,000 steps in about 90 hours. The final results on newstest2014 for each checkpoint are shown below:
[plot: BLEU per checkpoint on newstest2014]
I also trained a Transformer big model (the green line in the plot), which does better with its larger parameter count.
I hope this helps. @robotzheng @li10141110
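For anyone who wants to score checkpoints the same way, a sketch of per-checkpoint decoding and scoring on newstest2014 (the checkpoint numbers and file names are placeholders, not the poster's actual setup):

for ckpt in 200000 230000 262000; do
  # Decode the test set with a specific checkpoint.
  t2t-decoder \
    --data_dir=$DATA_DIR \
    --problem=translate_ende_wmt32k \
    --model=universal_transformer \
    --hparams_set=universal_transformer_base \
    --output_dir=$OUTDIR \
    --checkpoint_path=$OUTDIR/model.ckpt-$ckpt \
    --decode_from_file=newstest2014.en \
    --decode_to_file=hyp.$ckpt.de
  # Score it with real (non-approximate) BLEU.
  t2t-bleu --translation=hyp.$ckpt.de --reference=newstest2014.de
done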
@zherowolf I trained both universal_transformer_big and transformer_big on a single P40 GPU (PROBLEM=translate_enzh_wmt32k), but the process keeps stopping by itself early in training, which is weird. Have you run into this?
@MostafaDehghani could you take a look at issue #1006? Several people are hitting the same problem: with the universal_transformer_base hparams, after about 100k steps the model still does not converge and the loss stays around 4-5. Any solution to this?
INFO:tensorflow:Saving dict for global step 109000: global_step = 109000, loss = 4.23917, metrics-translate_enzh_wmt32k/targets/accuracy = 0.32611865, metrics-translate_enzh_wmt32k/targets/accuracy_per_sequence = 0.0, metrics-translate_enzh_wmt32k/targets/accuracy_top5 = 0.5148043, metrics-translate_enzh_wmt32k/targets/approx_bleu_score = 0.028782098, metrics-translate_enzh_wmt32k/targets/neg_log_perplexity = -4.2202907, metrics-translate_enzh_wmt32k/targets/rouge_2_fscore = 0.08715047, metrics-translate_enzh_wmt32k/targets/rouge_L_fscore = 0.33720428
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 109000: /home/exuekun/AI_Challenger_2018_base/Baselines/english_chinese_machine_translation_baseline/train/universal_train/model.ckpt-109000
Hi, can I add you on WeChat? I have the same question.
@zherowolf Thank you so much. Could you please share your Universal Transformer t2t-trainer settings for the ZH-EN MT task? In my experiments with the default settings on ZH-EN, the loss fluctuated between 2 and 3 for 400,000 steps.
@zherowolf Will you share your code?
"universal_transformer_big" does not work, I have to use "universal_transformer_base" with "hparams="hidden_size=2048,filter_size=8196". "universal_transformer_big" requires more GPU RAM than "transformer_big", thus smaller batch size (in my case 2048 vs 3600).