tensorflow / tensor2tensor

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Apache License 2.0

Reproduce universal transformer in EN-DE translation task #1061

Open Bournet opened 6 years ago

Bournet commented 6 years ago

Description

I want to reproduce the EN-DE translation task with the Universal Transformer.

My training setting is:

python t2t-trainer \
  --data_dir=$DATA_DIR \
  --worker_gpu=8 \
  --problem=translate_ende_wmt32k \
  --model=universal_transformer \
  --train_steps=500000 \
  --hparams_set=universal_transformer_base \
  --hparams='batch_size=2048' \
  --output_dir=$TRAIN_DIR

My decoder setting is:

python t2t-decoder \
  --data_dir=$DATA_DIR \
  --problem=translate_ende_wmt32k \
  --model=universal_transformer \
  --hparams_set=universal_transformer_base \
  --output_dir=$TRAIN_DIR \
  --decode_hparams="beam_size=4,alpha=0.6,batch_size=32" \
  --decode_from_file=$DECODE_FILE \
  --decode_to_file=$DECODE_OFILE

The decoding result looks wrong when using an intermediate checkpoint. Has anyone tried the EN-DE translation task, and can you share your settings? Here is the decoder log.

INFO:tensorflow:Inference results INPUT: A report filed to the European Commission in 2011 described intersex people as different from transsexual or transgender people, as their status is not gender related but instead relates to their biological makeup, which is neither exclusively male nor exclusively female, but is typical of both at once or not clearly defined as either. INFO:tensorflow:Inference results OUTPUT: Ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte ich möchte

Environment information

$ pip freeze | grep tensor
tensor2tensor==1.9.0
tensorboard==1.9.0
tensorflow-gpu==1.9.0
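One thing worth ruling out is that the decoder is silently picking up a different checkpoint than intended. Below is a minimal sketch of decoding from one explicit checkpoint, assuming the --checkpoint_path override is available on t2t-decoder in this version (check t2t-decoder --help); the step number 250000 is only an illustrative placeholder:

# Decode from a specific checkpoint instead of the latest one in $TRAIN_DIR.
CKPT=$TRAIN_DIR/model.ckpt-250000
python t2t-decoder \
  --data_dir=$DATA_DIR \
  --problem=translate_ende_wmt32k \
  --model=universal_transformer \
  --hparams_set=universal_transformer_base \
  --output_dir=$TRAIN_DIR \
  --checkpoint_path=$CKPT \
  --decode_hparams="beam_size=4,alpha=0.6,batch_size=32" \
  --decode_from_file=$DECODE_FILE \
  --decode_to_file=$DECODE_OFILE

If several different checkpoints produce the same repetitive output, the problem is more likely in training (non-convergence) than in decoding.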

robotzheng commented 6 years ago

INFO:tensorflow:Saving checkpoints for 0 into /home/zzt/tensor2tensor/t2t_train/translate_ende_wmt32k/universal_transformer-universal_transformer_base/model.ckpt.
INFO:tensorflow:loss = 24.992943, step = 0
INFO:tensorflow:global_step/sec: 0.604108
INFO:tensorflow:loss = 24.269491, step = 100 (165.534 sec)
INFO:tensorflow:global_step/sec: 0.69432
INFO:tensorflow:loss = 22.55278, step = 200 (144.028 sec)
INFO:tensorflow:global_step/sec: 0.690579
INFO:tensorflow:loss = 19.752926, step = 300 (144.805 sec)
INFO:tensorflow:global_step/sec: 0.695993
INFO:tensorflow:loss = 16.198309, step = 400 (143.680 sec)
INFO:tensorflow:global_step/sec: 0.694211
INFO:tensorflow:loss = 12.487985, step = 500 (144.052 sec)
INFO:tensorflow:global_step/sec: 0.692518
INFO:tensorflow:loss = 9.890932, step = 600 (144.396 sec)
INFO:tensorflow:global_step/sec: 0.694927
INFO:tensorflow:loss = 8.401972, step = 700 (143.901 sec)
INFO:tensorflow:global_step/sec: 0.693698
INFO:tensorflow:loss = 7.374533, step = 800 (144.155 sec)
INFO:tensorflow:global_step/sec: 0.691139
INFO:tensorflow:loss = 6.6476445, step = 900 (144.688 sec)
INFO:tensorflow:global_step/sec: 0.691328
INFO:tensorflow:loss = 6.4183955, step = 1000 (144.650 sec)
INFO:tensorflow:global_step/sec: 0.695598
INFO:tensorflow:loss = 6.2096443, step = 1100 (143.760 sec)
INFO:tensorflow:global_step/sec: 0.692548
INFO:tensorflow:loss = 6.118829, step = 1200 (144.394 sec)
INFO:tensorflow:global_step/sec: 0.691335
INFO:tensorflow:loss = 6.156118, step = 1300 (144.649 sec)
INFO:tensorflow:global_step/sec: 0.692573
INFO:tensorflow:loss = 5.8592772, step = 1400 (144.389 sec)
INFO:tensorflow:global_step/sec: 0.693011
INFO:tensorflow:loss = 5.883662, step = 1500 (144.298 sec)
INFO:tensorflow:global_step/sec: 0.69364
INFO:tensorflow:loss = 5.819623, step = 1600 (144.167 sec)
INFO:tensorflow:global_step/sec: 0.689941
INFO:tensorflow:loss = 5.7728457, step = 1700 (144.940 sec)
INFO:tensorflow:global_step/sec: 0.69491
INFO:tensorflow:loss = 5.734238, step = 1800 (143.903 sec)
INFO:tensorflow:global_step/sec: 0.691566
INFO:tensorflow:loss = 5.8356404, step = 1900 (144.600 sec)
INFO:tensorflow:global_step/sec: 0.692362
INFO:tensorflow:loss = 5.7228365, step = 2000 (144.433 sec)
INFO:tensorflow:global_step/sec: 0.692862
INFO:tensorflow:loss = 5.8236485, step = 2100 (144.329 sec)
INFO:tensorflow:global_step/sec: 0.695919
INFO:tensorflow:loss = 5.7823954, step = 2200 (143.694 sec)
INFO:tensorflow:global_step/sec: 0.692322
INFO:tensorflow:loss = 5.62051, step = 2300 (144.440 sec)
INFO:tensorflow:global_step/sec: 0.689248
INFO:tensorflow:loss = 5.7471213, step = 2400 (145.087 sec)
INFO:tensorflow:Saving checkpoints for 2465 into /home/zzt/tensor2tensor/t2t_train/translate_ende_wmt32k/universal_transformer-universal_transformer_base/model.ckpt.
INFO:tensorflow:global_step/sec: 0.677949
INFO:tensorflow:loss = 5.653845, step = 2500 (147.504 sec)
INFO:tensorflow:global_step/sec: 0.692351
INFO:tensorflow:loss = 5.602401, step = 2600 (144.438 sec)
INFO:tensorflow:global_step/sec: 0.692506
INFO:tensorflow:loss = 5.603458, step = 2700 (144.400 sec)
INFO:tensorflow:global_step/sec: 0.695547
INFO:tensorflow:loss = 5.5827146, step = 2800 (143.772 sec)

Not converging.

robotzheng commented 6 years ago

INFO:tensorflow:global_step/sec: 0.692
INFO:tensorflow:loss = 4.9120493, step = 9900 (144.509 sec)
INFO:tensorflow:Saving checkpoints for 9940 into /home/zzt/tensor2tensor/t2t_train/translate_ende_wmt32k/universal_transformer-universal_transformer_base/model.ckpt.
INFO:tensorflow:global_step/sec: 0.680752
INFO:tensorflow:loss = 4.6880198, step = 10000 (146.896 sec)
INFO:tensorflow:global_step/sec: 0.690562
INFO:tensorflow:loss = 4.7614098, step = 10100 (144.809 sec)
INFO:tensorflow:global_step/sec: 0.695003
INFO:tensorflow:loss = 4.6765833, step = 10200 (143.884 sec)
INFO:tensorflow:global_step/sec: 0.693545

Bournet commented 6 years ago

@robotzheng I decoded with different checkpoints; the results are the same.

li10141110 commented 6 years ago

When will this converge? Any help? Result:

INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [10/100]
INFO:tensorflow:Evaluation [20/100]
INFO:tensorflow:Evaluation [30/100]
INFO:tensorflow:Finished evaluation at 2018-09-16-01:27:12
INFO:tensorflow:Saving dict for global step 112000: global_step = 112000, loss = 5.57022, metrics-translate_ende_wmt32k/targets/accuracy = 0.16576138, metrics-translate_ende_wmt32k/targets/accuracy_per_sequence = 0.018324608, metrics-translate_ende_wmt32k/targets/accuracy_top5 = 0.35326138, metrics-translate_ende_wmt32k/targets/approx_bleu_score = 0.008100358, metrics-translate_ende_wmt32k/targets/neg_log_perplexity = -5.573707, metrics-translate_ende_wmt32k/targets/rouge_2_fscore = 0.02754218, metrics-translate_ende_wmt32k/targets/rouge_L_fscore = 0.18279965
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 112000: /home/lijing/t2t_train//-/model.ckpt-112000
INFO:tensorflow:global_step/sec: 0.97317
INFO:tensorflow:loss = 4.6445794, step = 112000 (102.758 sec)
INFO:tensorflow:global_step/sec: 1.42198
INFO:tensorflow:loss = 4.6680565, step = 112100 (70.325 sec)
INFO:tensorflow:global_step/sec: 1.41794
INFO:tensorflow:loss = 4.630338, step = 112200 (70.525 sec)
INFO:tensorflow:global_step/sec: 1.42208
INFO:tensorflow:loss = 4.8249073, step = 112300 (70.320 sec)

li10141110 commented 6 years ago

For your information: the GPU is a Titan 1080xp.

t2t-trainer \
  --data_dir=$DATA_DIR \
  --problem=translate_ende_wmt32k \
  --model=universal_transformer \
  --hparams_set=universal_transformer_base \
  --hparams='batch_size=2048' \
  -worker_gpu=4 \
  --output_dir=$TRAIN_DIR
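(For watching whether the loss actually plateaus, the simplest thing is to point TensorBoard at the training directory; this is just the standard TensorBoard invocation, with $TRAIN_DIR as defined above:)

# Inspect the training loss and eval metrics (approx_bleu, accuracy, ...) as curves.
tensorboard --logdir=$TRAIN_DIR --port=6006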

kudou1994 commented 6 years ago

Why can't I use universal_transformer? When I try to use it to train a model, I get this error: ValueError: Cannot use 'Identity_122' as input to 'Identity_33' because they are in different while loops. See info log for more details.

Bournet commented 6 years ago

@kudou1994

#1006 might be helpful.

kudou1994 commented 6 years ago

Thanks, I updated the library and that solved the problem.
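(For anyone else hitting the same ValueError: the fix here was simply updating tensor2tensor. A minimal sketch of the two usual ways to do that; pick whichever release actually contains the fix:)

# Upgrade to the latest released tensor2tensor ...
pip install --upgrade tensor2tensor
# ... or install straight from the current master branch.
pip install --upgrade git+https://github.com/tensorflow/tensor2tensor.git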

kudou1994 commented 6 years ago

I have a new issue. When I use universal_transformer_big to train a model, the BLEU score is very low: approx_bleu_score = 0.01985711; INFO:tensorflow:loss = 4.3538547, step = 22000 (82.838 sec).

li10141110 commented 6 years ago

@kudou1994 Same problem as you. My step count is 500000, but the loss stays around 4.6. Any help?

crystal0913 commented 6 years ago

I got the same problem as you @kudou1994. I have tried the universal_transformer_base and universal_transformer_teeny hparams sets; neither of them converges. After about 3k steps the loss stays around 4~5. Have you solved this problem?

Bournet commented 6 years ago

> @kudou1994 Same problem as you. My step count is 500000, but the loss stays around 4.6. Any help?

Have you solved the problem?

li10141110 commented 6 years ago

@Bournet no~~

kweonwooj commented 6 years ago

Same problem here. I can't reproduce the Universal Transformer small results from the original paper.

## version
tensorflow==1.11.0
tensor2tensor==1.9.0

## trainer code
# get data
PROBLEM=translate_ende_wmt_bpe32k
DATA_DIR=ende.corpus
TMP_DIR=ende.tmp.corpus

t2t-datagen \
    --data_dir=$DATA_DIR \
    --tmp_dir=$TMP_DIR \
    --problem=$PROBLEM

# train
MODEL=transformer
HPARAMS=universal_transformer_small
TRAIN_DIR=ende.univ_trans_small

t2t-trainer \
  --data_dir=$DATA_DIR \
  --problem=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR \
  --worker_gpu=8 \
  --train_steps=100000 \
  --keep_checkpoint_max=10 \
  --iterations_per_loop=10000 \
    --local_eval_frequency=10000

Below are the first 240 lines of the training log:

# FLAGS :
tensor2tensor.data_generators.audio:
  --timit_paths: Comma-separated list of tarfiles containing TIMIT datasets
    (default: '')

tensor2tensor.data_generators.gym_problems:
  --agent_policy_path: File with model for agent.
  --autoencoder_path: File with model for autoencoder.

tensor2tensor.data_generators.wsj_parsing:
  --parsing_path: Path to parsing files in tmp_dir.
    (default: '')

tensor2tensor.utils.flags:
  --data_dir: Directory with training data.
  --[no]dbgprofile: If True, record the timeline for chrome://tracing/.
    (default: 'false')
  --decode_from_file: Path to the source file for decoding, used by
    continuous_decode_from_file.
  --decode_hparams: Comma-separated list of name=value pairs to control decode
    behavior. See decoding.decode_hparams for defaults.
    (default: '')
  --decode_reference: Path to the reference file for decoding, used by
    continuous_decode_from_file to compute BLEU score.
  --decode_to_file: Path to the decoded file generated by decoding, used by
    continuous_decode_from_file.
  --[no]enable_graph_rewriter: Enable graph optimizations that are not on by
    default.
    (default: 'false')
  --eval_early_stopping_metric: If --eval_early_stopping_steps is not None, then
    stop when --eval_early_stopping_metric has not decreased for
    --eval_early_stopping_steps
    (default: 'loss')
  --eval_early_stopping_metric_delta: Delta determining whether metric has
    plateaued.
    (default: '0.1')
    (a number)
  --[no]eval_early_stopping_metric_minimize: Whether to check for the early
    stopping metric going down or up.
    (default: 'true')
  --eval_early_stopping_steps: If --eval_early_stopping_steps is not None, then
    stop when --eval_early_stopping_metric has not decreased for
    --eval_early_stopping_steps
    (an integer)
  --[no]eval_run_autoregressive: Run eval autoregressively where we condition on
    previousgenerated output instead of the actual target.
    (default: 'false')
  --eval_throttle_seconds: Do not re-evaluate unless the last evaluation was
    started at least this many seconds ago.
    (default: '1')
    (an integer)
  --[no]eval_use_test_set: Whether to use the '-test' data for EVAL (and
    PREDICT).
    (default: 'false')
  --[no]export_saved_model: DEPRECATED - see serving/export.py.
    (default: 'false')
  --gpu_order: Optional order for daisy-chaining GPUs. e.g. "1 3 2 4"
    (default: '')
  --hparams: A comma-separated list of `name=value` hyperparameter values. This
    flag is used to override hyperparameter settings either when manually
    selecting hyperparameters or when using Vizier. If a hyperparameter setting
    is specified by this flag then it must be a valid hyperparameter name for
    the model.
    (default: '')
  --hparams_range: Parameters range.
  --hparams_set: Which parameters to use.
  --keep_checkpoint_every_n_hours: Number of hours between each checkpoint to be
    saved. The default value 10,000 hours effectively disables it.
    (default: '10000')
    (an integer)
  --keep_checkpoint_max: How many recent checkpoints to keep.
    (default: '20')
    (an integer)
  --local_eval_frequency: Save checkpoints and run evaluation every N steps
    during local training.
    (default: '1')
    (an integer)
  --[no]locally_shard_to_cpu: Use CPU as a sharding device running locally. This
    allows to test sharded model construction on a machine with 1 GPU.
    (default: 'false')
  --[no]log_device_placement: Whether to log device placement.
    (default: 'false')
  --model: Which model to use.
  --problem: Problem name.
  --ps_gpu: How many GPUs to use per ps.
    (default: '0')
    (an integer)
  --ps_job: name of ps job
    (default: '/job:ps')
  --ps_replicas: How many ps replicas.
    (default: '0')
    (an integer)
  --[no]registry_help: If True, logs the contents of the registry and exits.
    (default: 'false')
  --save_checkpoints_secs: Save checkpoints every this many seconds. Default=0
    means save checkpoints each x steps where x is max(iterations_per_loop,
    local_eval_frequency).
WARNING:tensorflow:From /users/kweonwooj/convergence_of_transformer/venv/lib/python2.7/site-packages/tensor2tensor/utils/trainer_lib.py:198: __init__ (from tensorflow.contrib.learn.python.learn.estimators.run_config) is deprecated and will be removed in a future version.
Instructions for updating:
When switching to tf.estimator.Estimator, use tf.estimator.RunConfig instead.
INFO:tensorflow:schedule=continuous_train_and_eval
INFO:tensorflow:worker_gpu=8
INFO:tensorflow:sync=False
WARNING:tensorflow:Schedule=continuous_train_and_eval. Assuming that training is running on a single machine.
INFO:tensorflow:datashard_devices: ['gpu:0', 'gpu:1', 'gpu:2', 'gpu:3', 'gpu:4', 'gpu:5', 'gpu:6', 'gpu:7']
INFO:tensorflow:caching_devices: None
INFO:tensorflow:ps_devices: ['gpu:0', 'gpu:1', 'gpu:2', 'gpu:3', 'gpu:4', 'gpu:5', 'gpu:6', 'gpu:7']
INFO:tensorflow:Using config: {'_save_checkpoints_secs': None, '_num_ps_replicas': 0, '_keep_checkpoint_max': 30, '_task_type': None, '_train_distribute': None, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0xc3a0590>, '_tf_config': gpu_options {
  per_process_gpu_memory_fraction: 1.0
}
, '_protocol': None, '_save_checkpoints_steps': 10000, '_keep_checkpoint_every_n_hours': 10000, '_session_config': gpu_options {
  per_process_gpu_memory_fraction: 0.95
}
allow_soft_placement: true
graph_options {
  optimizer_options {
  }
}
, '_model_dir': 'ende.univ_trans_small', 'use_tpu': False, '_tf_random_seed': None, '_master': '', '_device_fn': None, '_num_worker_replicas': 0, '_task_id': 0, '_log_step_count_steps': 100, '_evaluation_master': '', '_eval_distribute': None, 'data_parallelism': <tensor2tensor.utils.expert_utils.Parallelism object at 0xc3a0610>, '_environment': 'local', '_save_summary_steps': 100, 't2t_device_info': {'num_async_replicas': 1}}
WARNING:tensorflow:Estimator's model_fn (<function wrapping_model_fn at 0xc2f35f0>) includes params argument, but params are not passed to Estimator.
WARNING:tensorflow:ValidationMonitor only works with --schedule=train_and_evaluate
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps 10000 or save_checkpoints_secs None.
INFO:tensorflow:Reading data files from ende.corpus/translate_ende_wmt_bpe32k-train*
INFO:tensorflow:partition: 0 num_data_files: 100
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Setting T2TModel mode to 'train'
INFO:tensorflow:Using variable initializer: uniform_unit_scaling
INFO:tensorflow:Transforming feature 'inputs' with symbol_modality_37008_512.bottom
WARNING:tensorflow:From /users/kweonwooj/convergence_of_transformer/venv/lib/python2.7/site-packages/tensorflow/python/framework/function.py:988: calling create_op (from tensorflow.python.framework.ops) with compute_shapes is deprecated and will be removed in a future version.
Instructions for updating:
Shapes are always computed; don't use the compute_shapes as it has no effect.
INFO:tensorflow:Transforming 'targets' with symbol_modality_37008_512.targets_bottom
INFO:tensorflow:Building model body
INFO:tensorflow:Transforming body output with symbol_modality_37008_512.top
INFO:tensorflow:Transforming feature 'inputs' with symbol_modality_37008_512.bottom
INFO:tensorflow:Transforming 'targets' with symbol_modality_37008_512.targets_bottom
INFO:tensorflow:Building model body
INFO:tensorflow:Transforming body output with symbol_modality_37008_512.top
INFO:tensorflow:Transforming feature 'inputs' with symbol_modality_37008_512.bottom
INFO:tensorflow:Transforming 'targets' with symbol_modality_37008_512.targets_bottom
INFO:tensorflow:Building model body
INFO:tensorflow:Transforming body output with symbol_modality_37008_512.top
INFO:tensorflow:Transforming feature 'inputs' with symbol_modality_37008_512.bottom
INFO:tensorflow:Transforming 'targets' with symbol_modality_37008_512.targets_bottom
INFO:tensorflow:Building model body
INFO:tensorflow:Transforming body output with symbol_modality_37008_512.top
INFO:tensorflow:Transforming feature 'inputs' with symbol_modality_37008_512.bottom
INFO:tensorflow:Transforming 'targets' with symbol_modality_37008_512.targets_bottom
INFO:tensorflow:Building model body
INFO:tensorflow:Transforming body output with symbol_modality_37008_512.top
INFO:tensorflow:Transforming feature 'inputs' with symbol_modality_37008_512.bottom
INFO:tensorflow:Transforming 'targets' with symbol_modality_37008_512.targets_bottom
INFO:tensorflow:Building model body
INFO:tensorflow:Transforming body output with symbol_modality_37008_512.top
INFO:tensorflow:Transforming feature 'inputs' with symbol_modality_37008_512.bottom
INFO:tensorflow:Transforming 'targets' with symbol_modality_37008_512.targets_bottom
INFO:tensorflow:Building model body
INFO:tensorflow:Transforming body output with symbol_modality_37008_512.top
INFO:tensorflow:Transforming feature 'inputs' with symbol_modality_37008_512.bottom
INFO:tensorflow:Transforming 'targets' with symbol_modality_37008_512.targets_bottom
INFO:tensorflow:Building model body
INFO:tensorflow:Transforming body output with symbol_modality_37008_512.top
INFO:tensorflow:Base learning rate: 2.000000
INFO:tensorflow:Trainable Variables Total size: 63068160
INFO:tensorflow:Using optimizer Adam
/users/kweonwooj/convergence_of_transformer/venv/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py:108: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into ende.univ_trans_small/model.ckpt.
INFO:tensorflow:loss = 9.615246, step = 0
INFO:tensorflow:global_step/sec: 0.613905
INFO:tensorflow:loss = 8.118498, step = 100 (162.893 sec)
INFO:tensorflow:global_step/sec: 1.46109
INFO:tensorflow:loss = 7.010751, step = 200 (68.442 sec)

Below is the result of grep bleu log.ende.train

INFO:tensorflow:Saving dict for global step 10000: global_step = 10000, loss = 2.772607, metrics-translate_ende_wmt_bpe32k/targets/approx_bleu_score = 0.1610541, metrics-translate_ende_wmt_bpe32k/targets/neg_log_perplexity = -2.7994494
INFO:tensorflow:Saving dict for global step 20000: global_step = 20000, loss = 2.4357393, metrics-translate_ende_wmt_bpe32k/targets/approx_bleu_score = 0.19893184, metrics-translate_ende_wmt_bpe32k/targets/neg_log_perplexity = -2.4517777
INFO:tensorflow:Saving dict for global step 30000: global_step = 30000, loss = 2.3373313, metrics-translate_ende_wmt_bpe32k/targets/approx_bleu_score = 0.21346238, metrics-translate_ende_wmt_bpe32k/targets/neg_log_perplexity = -2.342175
INFO:tensorflow:Saving dict for global step 40000: global_step = 40000, loss = 2.2833319, metrics-translate_ende_wmt_bpe32k/targets/approx_bleu_score = 0.221328, metrics-translate_ende_wmt_bpe32k/targets/neg_log_perplexity = -2.2856874
INFO:tensorflow:Saving dict for global step 50000: global_step = 50000, loss = 2.2407055, metrics-translate_ende_wmt_bpe32k/targets/approx_bleu_score = 0.22424038, metrics-translate_ende_wmt_bpe32k/targets/neg_log_perplexity = -2.246523
INFO:tensorflow:Saving dict for global step 60000: global_step = 60000, loss = 2.2218, metrics-translate_ende_wmt_bpe32k/targets/approx_bleu_score = 0.22700183, metrics-translate_ende_wmt_bpe32k/targets/neg_log_perplexity = -2.2215374
INFO:tensorflow:Saving dict for global step 70000: global_step = 70000, loss = 2.2043395, metrics-translate_ende_wmt_bpe32k/targets/approx_bleu_score = 0.23245056, metrics-translate_ende_wmt_bpe32k/targets/neg_log_perplexity = -2.2044787
INFO:tensorflow:Saving dict for global step 80000: global_step = 80000, loss = 2.1916614, metrics-translate_ende_wmt_bpe32k/targets/approx_bleu_score = 0.23251209, metrics-translate_ende_wmt_bpe32k/targets/neg_log_perplexity = -2.1914012
INFO:tensorflow:Saving dict for global step 90000: global_step = 90000, loss = 2.1842427, metrics-translate_ende_wmt_bpe32k/targets/approx_bleu_score = 0.23512189, metrics-translate_ende_wmt_bpe32k/targets/neg_log_perplexity = -2.181037
INFO:tensorflow:Saving dict for global step 100000: global_step = 100000, loss = 2.169881, metrics-translate_ende_wmt_bpe32k/targets/approx_bleu_score = 0.23619385, metrics-translate_ende_wmt_bpe32k/targets/neg_log_perplexity = -2.1701238
INFO:tensorflow:Saving dict for global step 100000: global_step = 100000, loss = 2.169881, metrics-translate_ende_wmt_bpe32k/targets/approx_bleu_score = 0.23619385, metrics-translate_ende_wmt_bpe32k/targets/neg_log_perplexity = -2.1701238

Training runs with no errors, but the model performs much worse than the paper indicates. Decoding the newstest2014.tok.bpe.32000.en file provided via t2t-datagen produces translations in subword format. Restoring the original segmentation via sed -r 's/(@@ )|(@@ ?$)//g' and evaluating the BLEU score gives BLEU ~13, which is much lower than the BLEU 26.8 published in the original paper. (The full pipeline is sketched below, after the t2t-bleu output.)

t2t-bleu \
    --translation=out.ende.100k.gen \
    --reference=newstest2014.tok.de
> BLEU_uncased =  13.58
> BLEU_cased =  13.23
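For completeness, here is the whole scoring pipeline described above as one sketch. The intermediate file name out.ende.100k.gen.bpe is only illustrative; $PROBLEM, $MODEL, $HPARAMS, $TRAIN_DIR and $DATA_DIR are the variables from the trainer script earlier in this comment, and beam_size=4, alpha=0.6 are assumed decode settings (adjust to your setup):

# 1. Decode the BPE-tokenized test set with the trained model.
t2t-decoder \
  --data_dir=$DATA_DIR \
  --problem=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR \
  --decode_hparams="beam_size=4,alpha=0.6" \
  --decode_from_file=newstest2014.tok.bpe.32000.en \
  --decode_to_file=out.ende.100k.gen.bpe

# 2. Undo the BPE segmentation (strip the "@@ " continuation markers).
sed -r 's/(@@ )|(@@ ?$)//g' out.ende.100k.gen.bpe > out.ende.100k.gen

# 3. Score the restored hypotheses against the tokenized reference.
t2t-bleu \
    --translation=out.ende.100k.gen \
    --reference=newstest2014.tok.de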

How can we reproduce the Universal Transformer results?

MostafaDehghani commented 6 years ago

The convergence problem of the Universal Transformer is solved in #1194. (Really sorry for the delay in fixing this issue!)

kweonwooj commented 6 years ago

@MostafaDehghani I re-trained Universal Transformer small on EnDe WMT14 (both the data and the model are from t2t) on the master version (including your merge).

At 100k steps, the loss is around 2.16 with approx BLEU 0.23, but when tested on newstest2014 (BPE-tokenized input, with the output restored via sed -r 's/(@@ )|(@@ ?$)//g' before the t2t-bleu calculation), BLEU is still ~13 (expected ~26.8 as published in the paper).

INFO:tensorflow:Saving dict for global step 100000: global_step = 100000, 
loss = 2.1637561,  
metrics-translate_ende_wmt_bpe32k/targets/approx_bleu_score = 0.23884457,

I believe the above fix came from an EnZh reproduction issue, but it seems there is another issue with EnDe reproduction on the Universal Transformer! Please help!
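One step that the published Transformer/Universal Transformer BLEU recipes typically include, and that the numbers above do not, is decoding from an average of the last few checkpoints rather than from a single checkpoint. A minimal sketch using the avg_checkpoints utility shipped with tensor2tensor (flag names can differ between versions, so check python -m tensor2tensor.utils.avg_checkpoints --help; the averaged checkpoint can then be passed to t2t-decoder via --checkpoint_path as sketched earlier in this thread):

# Average the weights of the last 10 checkpoints in $TRAIN_DIR
# and write them out as a single checkpoint.
python -m tensor2tensor.utils.avg_checkpoints \
  --prefix=$TRAIN_DIR \
  --num_last_checkpoints=10 \
  --output_path=$TRAIN_DIR/averaged.ckpt

Checkpoint averaging alone will not explain a 13-vs-26.8 gap, but it is part of the standard evaluation setup and worth including before comparing against the paper.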

xu-song commented 6 years ago

[screenshot attached: screen shot 2018-11-27 at 11 22 03 am]

colmantse commented 5 years ago

I notice the EN-DE data provided in translate_ende.py has changed to WMT13. Can someone confirm?