stefan-it / nmt-en-vi

Neural Machine Translation system for English to Vietnamese (IWSLT'15 English-Vietnamese data)
57 stars, 14 forks

Error in the pretrained model #5

Closed khaidoan25 closed 5 years ago

khaidoan25 commented 5 years ago

When I download your pretrained model and run decoding, it shows the error below. Perhaps there is something wrong with the checkpoint. Could you please check it for me? Thank you!

2019-06-21 16:51:15.986698: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
2019-06-21 16:51:16.243972: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key transformer/symbol_modality_20428_512/shared/weights_0 not found in checkpoint
Traceback (most recent call last):
  File "/home/khaidoan25/miniconda3/envs/nmt/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/home/khaidoan25/miniconda3/envs/nmt/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/khaidoan25/miniconda3/envs/nmt/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key transformer/symbol_modality_20428_512/shared/weights_0 not found in checkpoint

stefan-it commented 5 years ago

Hi @khaidoan25,

thanks for reporting it! I'm currently training a new model with the latest `tensor2tensor` version on TPU. I'll report back once the model is trained and the evaluation looks good :)

khaidoan25 commented 5 years ago

Thanks for your response. Please let me know when you finish the training, thank you :+1:

stefan-it commented 5 years ago

Training has finished -> It would be great if you could check whether you can load the new averaged checkpoint.

Please make sure that you have the latest version of tensor2tensor installed (I used the latest master branch, which requires TensorFlow 1.14):

$ wget https://schweter.eu/cloud/nmt-en-vi/envi-model.avg-250000.tar.xz
$ tar -xJf envi-model.avg-250000.tar.xz

This downloads and extracts the new checkpoint. Then:

$ wget "https://github.com/stefan-it/nmt-en-vi/raw/master/data/test-2013-en-vi.tgz"
$ tar -xzf test-2013-en-vi.tgz

This downloads the test data for En -> Vi.

Translate the test file:

$ t2t-decoder --data_dir=t2t_data --problem=translate_envi_iwslt32k \
--model=transformer --decode_hparams="beam_size=4,alpha=0.6" \
--decode_from_file=tst2013.en --decode_to_file=system.output \
--hparams_set=transformer_base \
--checkpoint_path t2t_export/model.ckpt-250000

The result is written to system.output. After that, calculate the BLEU score with:

$ t2t-bleu --translation=system.output --reference=tst2013.vi

It should then output the following scores:

BLEU_uncased =  29.44
BLEU_cased =  28.54

The scores are higher than the currently reported ones with the old checkpoint :fireworks:

Please let me know if this works for you! Then I will update the main readme file!

Btw: I'm currently training a Big Transformer model for further research (update: Big Transformer achieved a BLEU score of only 6.73) :)
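If the decoder command above cannot load the checkpoint at all, it may help to first confirm that the archive was extracted where `--checkpoint_path` points. A TensorFlow checkpoint prefix is stored as an `.index` file plus one or more `.data-*` shards. The following is a minimal sketch using only the standard library; the helper name `checkpoint_files_present` is hypothetical, and the prefix `t2t_export/model.ckpt-250000` is simply taken from the command above, so adjust it if the archive unpacks elsewhere.

```python
import glob
import os


def checkpoint_files_present(prefix):
    """Return True if the checkpoint prefix has an index file and a data shard."""
    # The index file maps variable names to their location in the shards.
    has_index = os.path.isfile(prefix + ".index")
    # Shards are named <prefix>.data-XXXXX-of-YYYYY.
    has_data = bool(glob.glob(glob.escape(prefix) + ".data-*"))
    return has_index and has_data


if __name__ == "__main__":
    # Prefix as used in the t2t-decoder call above.
    prefix = "t2t_export/model.ckpt-250000"
    if checkpoint_files_present(prefix):
        print(prefix, "looks complete")
    else:
        print(prefix, "is missing its .index or .data-* files")
```

This only checks that the files exist. It does not check that the variables inside match the model the decoder builds.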

khaidoan25 commented 5 years ago

It worked. Thank you very much :)

tracyphamcse commented 4 years ago

Hi @stefan-it, I followed the instructions to use the pretrained model and encountered a similar issue, with tensor2tensor==1.14.1 and tensorflow==1.14.0. Can you help check? Thank you!


2019-10-17 09:59:16.555416: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key transformer/symbol_modality_21222_512/shared/weights_0 not found in checkpoint
Traceback (most recent call last):
  File "/home/tracy/miniconda3/envs/translate_envi/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/home/tracy/miniconda3/envs/translate_envi/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/tracy/miniconda3/envs/translate_envi/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key transformer/symbol_modality_21222_512/shared/weights_0 not found in checkpoint
     [[{{node save/RestoreV2_1}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/tracy/miniconda3/envs/translate_envi/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1286, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home/tracy/miniconda3/envs/translate_envi/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/home/tracy/miniconda3/envs/translate_envi/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/tracy/miniconda3/envs/translate_envi/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/home/tracy/miniconda3/envs/translate_envi/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key transformer/symbol_modality_21222_512/shared/weights_0 not found in checkpoint
     [[node save/RestoreV2_1 (defined at /lib/python3.6/site-packages/tensor2tensor/utils/decoding.py:468) ]]
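
A note for anyone debugging this error: the missing key names the embedding variable the decoder expects. tensor2tensor names the symbol modality scope `symbol_modality_<vocab_size>_<hidden_size>`, so the number 21222 here (20428 in the first report) is the vocabulary size of the graph built locally. One possible cause, which is an assumption and not confirmed in this thread, is that the vocabulary file in the local `data_dir` differs from the one the checkpoint was trained with. The sketch below extracts the sizes from a log line and, optionally, from the checkpoint itself so they can be compared. The helper names are hypothetical. `tf.train.list_variables` is the TensorFlow call used to list the stored variable names.

```python
import re

# Scope name created by tensor2tensor's symbol modality:
# symbol_modality_<vocab_size>_<hidden_size>
_MODALITY = re.compile(r"symbol_modality_(\d+)_(\d+)")


def modality_sizes(text):
    """Return the set of (vocab_size, hidden_size) pairs mentioned in text."""
    return {(int(v), int(h)) for v, h in _MODALITY.findall(text)}


def checkpoint_modality_sizes(checkpoint_path):
    """Return the (vocab_size, hidden_size) pairs stored in a checkpoint.

    Needs TensorFlow; it is imported lazily so that the log parsing above
    also works on a machine without it.
    """
    import tensorflow as tf

    names = [name for name, _shape in tf.train.list_variables(checkpoint_path)]
    return modality_sizes("\n".join(names))


log_line = (
    "Not found: Key transformer/symbol_modality_21222_512/shared/weights_0 "
    "not found in checkpoint"
)
print(modality_sizes(log_line))  # {(21222, 512)}
```

If `checkpoint_modality_sizes("t2t_export/model.ckpt-250000")` reports a different vocabulary size than the log, the decoder is building its graph from a different vocabulary than the one stored in the checkpoint.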