Closed khaidoan25 closed 5 years ago
Hi @khaidoan25,
thanks for reporting it! I'm currently training a new model with the latest `tensor2tensor` version on TPU. I'll report back once the model is trained and the evaluation looks good :)
Thanks for your response. Please let me know when you finish the training, thank you :+1:
Training has finished -> It would be great if you could check whether you can load the new averaged checkpoint.
Please make sure that you have the latest version of tensor2tensor installed (I used the latest master branch, which requires TensorFlow 1.14):
$ wget https://schweter.eu/cloud/nmt-en-vi/envi-model.avg-250000.tar.xz
$ tar -xJf envi-model.avg-250000.tar.xz
This downloads and extracts the new checkpoint. Then:
$ wget "https://github.com/stefan-it/nmt-en-vi/raw/master/data/test-2013-en-vi.tgz"
$ tar -xzf test-2013-en-vi.tgz
This downloads and extracts the test data for En -> Vi.
Translate the test file:
$ t2t-decoder --data_dir=t2t_data --problem=translate_envi_iwslt32k \
--model=transformer --decode_hparams="beam_size=4,alpha=0.6" \
--decode_from_file=tst2013.en --decode_to_file=system.output \
--hparams_set=transformer_base \
--checkpoint_path t2t_export/model.ckpt-250000
The result is written to system.output. After that, calculate the BLEU score with:
$ t2t-bleu --translation=system.output --reference=tst2013.vi
It should then output the following scores:
BLEU_uncased = 29.44
BLEU_cased = 28.54
The scores are higher than the currently reported ones with the old checkpoint :fireworks:
Please let me know if this works for you! Then I will update the main README file!
Btw: I'm currently training a Big Transformer model for further research (update: Big Transformer achieved a BLEU score of only 6.73) :)
It worked. Thank you very much :)
Hi @stefan-it,
I followed the instructions for using the pretrained model and encountered a similar issue with tensor2tensor==1.14.1 and tensorflow==1.14.0. Can you help check?
Thank you!
2019-10-17 09:59:16.555416: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key transformer/symbol_modality_21222_512/shared/weights_0 not found in checkpoint
Traceback (most recent call last):
File "/home/tracy/miniconda3/envs/translate_envi/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/home/tracy/miniconda3/envs/translate_envi/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/tracy/miniconda3/envs/translate_envi/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key transformer/symbol_modality_21222_512/shared/weights_0 not found in checkpoint
[[{{node save/RestoreV2_1}}]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/tracy/miniconda3/envs/translate_envi/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1286, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/home/tracy/miniconda3/envs/translate_envi/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/home/tracy/miniconda3/envs/translate_envi/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/home/tracy/miniconda3/envs/translate_envi/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
File "/home/tracy/miniconda3/envs/translate_envi/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key transformer/symbol_modality_21222_512/shared/weights_0 not found in checkpoint
[[node save/RestoreV2_1 (defined at /lib/python3.6/site-packages/tensor2tensor/utils/decoding.py:468) ]]
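Both error reports in this thread fail on a key of the form `transformer/symbol_modality_<N>_512/...`, with N = 21222 in this traceback and N = 20428 in the other one. In tensor2tensor that scope name appears to encode the vocabulary size and the hidden size, so differing N values would suggest that each setup built its graph from a different vocabulary than the one the checkpoint was trained with. The sketch below is only a diagnostic aid under that assumption; `modality_dims` and `checkpoint_modality_dims` are hypothetical helper names, not part of tensor2tensor.

```python
import re

# Assumed naming scheme: "symbol_modality_<vocab_size>_<hidden_size>".
_MODALITY_RE = re.compile(r"symbol_modality_(\d+)_(\d+)")


def modality_dims(text):
    """Return (vocab_size, hidden_size) from the first modality scope in text, or None."""
    match = _MODALITY_RE.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def checkpoint_modality_dims(variable_names):
    """Collect the distinct (vocab_size, hidden_size) pairs among variable names."""
    found = {modality_dims(name) for name in variable_names}
    found.discard(None)  # ignore variables that are not symbol modalities
    return sorted(found)


# The two keys quoted in the error messages of this thread.
errors = [
    "Key transformer/symbol_modality_21222_512/shared/weights_0 not found in checkpoint",
    "Key transformer/symbol_modality_20428_512/shared/weights_0 not found in checkpoint",
]
for line in errors:
    print(modality_dims(line))
# (21222, 512)
# (20428, 512)
```

The variable names to pass to `checkpoint_modality_dims` could come from `tf.train.list_variables(checkpoint_path)`, which returns `(name, shape)` pairs. If the size stored in the checkpoint differs from the one in the error message, the vocabulary file in `data_dir` is probably the first thing to check.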
When I download your pretrained model and run decoding, it shows the error below. Perhaps there is something wrong with the checkpoint. Could you please check it for me? Thank you!
2019-06-21 16:51:15.986698: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
2019-06-21 16:51:16.243972: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key transformer/symbol_modality_20428_512/shared/weights_0 not found in checkpoint
Traceback (most recent call last):
File "/home/khaidoan25/miniconda3/envs/nmt/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/home/khaidoan25/miniconda3/envs/nmt/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/khaidoan25/miniconda3/envs/nmt/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key transformer/symbol_modality_20428_512/shared/weights_0 not found in checkpoint