tensorflow / nmt

TensorFlow Neural Machine Translation Tutorial

Failed to run GNMT #5

Open skyw opened 7 years ago

skyw commented 7 years ago

It fails with a key error: "KeyError: num_residual_layers".

Here is my script:

python -m nmt.nmt \
    --src=en --tgt=de \
    --vocab_prefix=${DATA_DIR}/vocab \
    --train_prefix=${DATA_DIR}/train \
    --dev_prefix=${DATA_DIR}/newstest2014 \
    --test_prefix=${DATA_DIR}/newstest2015 \
    --out_dir=${OUT_DIR}/test \
    --hparams_path=nmt/standard_hparams/wmt16_en_de_gnmt.json

oahziur commented 7 years ago

Thanks, I will need to update the nmt/standard_hparams/wmt16_en_de_gnmt.json.

oahziur commented 7 years ago

I am also adding instructions on how to train and load the GNMT model from scratch.

vince62s commented 7 years ago

I ran a standard attention / scaled_luong / uni system and got the expected results. The same setup with the gnmt architecture / scaled_luong / enc_type gnmt was completely off. Is there something special to do for the GNMT attention architecture?

oahziur commented 7 years ago

@vince62s Did you check with the standard_hparams for GNMT? There are also pre-trained models available for download on the README page.
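
(For later readers: a hedged sketch of training against the shipped GNMT hparams rather than setting architecture flags by hand. The flags are the ones used elsewhere in this thread; the data file names are hypothetical placeholders.)

python -m nmt.nmt \
    --src=en --tgt=de \
    --hparams_path=nmt/standard_hparams/wmt16_gnmt_4_layer.json \
    --vocab_prefix=${DATA_DIR}/vocab.bpe.32000 \
    --train_prefix=${DATA_DIR}/train.tok.bpe.32000 \
    --dev_prefix=${DATA_DIR}/newstest2013.tok.bpe.32000 \
    --test_prefix=${DATA_DIR}/newstest2015.tok.bpe.32000 \
    --out_dir=${OUT_DIR}/wmt16_gnmt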

ndvbd commented 6 years ago

Same problem here. After training, when doing inference I get:

KeyError: 'num_encoder_residual_layers'

It only works when I delete all these keys from the hparams file and set --hparams_path to the best_bleu directory, but then after one run, for some reason, it rewrites the hparams file and adds these problematic key/values again... It's not clear how this mechanism works.

My guess is that when the code saves hparams, it simply writes key/values that it isn't supposed to.
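
(For anyone needing the same workaround: a minimal sketch of the key-stripping step described above, assuming the saved hparams is a plain json file under out_dir; the path and the key list are assumptions, using only the key names quoted in this thread.)

# strip_hparams.py -- hedged sketch: drop the residual-layer keys that the
# run keeps writing back into the saved hparams file.
import json

HPARAMS_FILE = '/path/to/out_dir/hparams'  # hypothetical location
BAD_KEYS = ['num_residual_layers',
            'num_encoder_residual_layers',
            'num_decoder_residual_layers']

with open(HPARAMS_FILE) as f:
    hparams = json.load(f)

for key in BAD_KEYS:
    hparams.pop(key, None)  # drop the key if present, ignore otherwise

with open(HPARAMS_FILE, 'w') as f:
    json.dump(hparams, f, indent=2)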

oahziur commented 6 years ago

@nadavb Can you share the command that produces the error? Were you using the standard_hparams file from the repo for inference?

There have been some updates to the hparams recently, so I think the standard_hparams may be out of date.

ndvbd commented 6 years ago

@oahziur I did not use the standard hparams. I used the params as shown in the tutorial.

So for training:

python nmt/nmt.py \
    --attention=scaled_luong \
    --src=vi --tgt=en \
    --vocab_prefix=/tmp/nmt_data/vocab \
    --train_prefix=/tmp/nmt_data/train \
    --dev_prefix=/tmp/nmt_data/tst2012 \
    --test_prefix=/tmp/nmt_data/tst2013 \
    --out_dir=/tmp/nmt_attention_model \
    --num_train_steps=5000 \
    --steps_per_stats=20 \
    --num_layers=2 \
    --num_units=128 \
    --dropout=0.2 \
    --metrics=bleu

And for inference:

python nmt/nmt.py \
    --out_dir=/tmp/nmt_attention_model \
    --inference_input_file=/tmp/nmt_data/source_infer.vi \
    --inference_output_file=/tmp/nmt_attention_model/output_infer
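
(Context: at inference time nmt reloads the hparams it saved under out_dir during training, which is presumably why no model flags are passed here. A quick way to inspect what got saved, assuming the file lands at out_dir/hparams:)

python -m json.tool /tmp/nmt_attention_model/hparams | grep residual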

LimWoohyun commented 6 years ago

@nadavb Hello, I'm studying NMT and I want to run a test, so I just ran nmt.py, but it failed.

How should I invoke your script? Please let me know the basics.

ndvbd commented 6 years ago

@LimWoohyun Look at https://github.com/tensorflow/nmt -> search for "Hands-on – building an attention-based NMT model"; the command is written there.

bquast commented 6 years ago

@oahziur I get a key error using the standard_hparams (tf 1.6rc1; I will try on my other machine with tf 1.5-cuda).

NotFoundError (see above for traceback): Key dynamic_seq2seq/encoder/rnn/basic_lstm_cell/bias not found in checkpoint
     [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_INT32, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

using the command:

[bquast@UX370UA ~]$ cd nmt
[bquast@UX370UA nmt]$ python -m nmt.nmt \
>     --src=de --tgt=en \
>     --ckpt=deen_gnmt_model_4_layer/translate.ckpt \
>     --hparams_path=nmt/standard_hparams/wmt16_gnmt_4_layer.json \
>     --out_dir=/tmp/deen_gnmt \
>     --vocab_prefix=/home/bquast/en_de_data/vocab.bpe.32000 \
>     --inference_input_file=/home/bquast/en_de_data/newstest2014.tok.bpe.32000.de \
>     --inference_output_file=/home/bquast/deen_gnmt_model_4_layer/output_infer

full output here:

https://gist.github.com/bquast/30ba7630d2bf32b59dd8349889fc7638

EDIT: confirmed, same error on tf 1.5-cuda

https://gist.github.com/bquast/0ddbf8eda363d312dd57b51aebb11f5d

tiberiu92 commented 6 years ago

@bquast I recently got the same error using the same configuration.

Key dynamic_seq2seq/encoder/rnn/basic_lstm_cell/bias not found in checkpoint [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_INT32, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

I tried this with tf 1.4 too, but no luck. Are there any updates on this?

Thank you.

bquast commented 6 years ago

Hey, no news yet. Any progress on your side?

oahziur commented 6 years ago

@bquast I think this is related to https://github.com/tensorflow/nmt/issues/264, and there is a PR that fixes it: https://github.com/tensorflow/nmt/pull/265. Maybe you can try patching in the PR and see if you still get the issue. Make sure you clear the model directory.
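
(A hedged sketch of that workflow, assuming a git checkout of the repo; GitHub exposes pull request heads as pull/<id>/head, and the out_dir below is the one from the failing command above.)

cd nmt
git fetch origin pull/265/head:pr-265   # fetch the PR branch locally
git checkout pr-265
rm -rf /tmp/deen_gnmt/*                 # clear the stale model directory before re-running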

xiaohaoliang commented 6 years ago

@bquast @tiberiu92 @oahziur
I got the same error using the same configuration. (tf-1.8, python-2.7)

python -m nmt.nmt \
    --src=de --tgt=en \
    --ckpt=/home/xiaohao/nmt/models/deen_gnmt_model_4_layer/translate.ckpt \
    --hparams_path=nmt/standard_hparams/wmt16_gnmt_4_layer.json \
    --out_dir=/home/xiaohao/data/deen_gnmt \
    --vocab_prefix=/home/xiaohao/data/wmt16/vocab.bpe.32000 \
    --inference_input_file=/home/xiaohao/data/wmt16/newstest2015.tok.bpe.32000.de \
    --inference_output_file=/home/xiaohao/data/deen_gnmt/output_infer \
    --inference_ref_file=/home/xiaohao/data/wmt16/newstest2015.tok.bpe.32000.en 

NotFoundError (see above for traceback): Key dynamic_seq2seq/encoder/rnn/basic_lstm_cell/bias not found in checkpoint

I printed the keys of deen_gnmt_model_4_layer/translate.ckpt and could not find .../rnn/basic_lstm_cell/bias:

xiaohao@ubuntu:~/nmt$ python ckpt_print.py models/deen_gnmt_model_4_layer/translate.ckpt
('CHECKPOINT_FILE: ', 'models/deen_gnmt_model_4_layer/translate.ckpt')
('tensor_name: ', 'embeddings/encoder/embedding_encoder')
('tensor_name: ', 'dynamic_seq2seq/decoder/memory_layer/kernel')
('tensor_name: ', 'dynamic_seq2seq/decoder/multi_rnn_cell/cell_3/basic_lstm_cell/kernel')
('tensor_name: ', 'dynamic_seq2seq/encoder/bidirectional_rnn/fw/basic_lstm_cell/kernel')
('tensor_name: ', 'dynamic_seq2seq/decoder/multi_rnn_cell/cell_3/basic_lstm_cell/bias')
('tensor_name: ', 'dynamic_seq2seq/decoder/output_projection/kernel')
('tensor_name: ', 'dynamic_seq2seq/decoder/multi_rnn_cell/cell_0_attention/attention/bahdanau_attention/query_layer/kernel')
('tensor_name: ', 'dynamic_seq2seq/decoder/multi_rnn_cell/cell_0_attention/attention/basic_lstm_cell/kernel')
('tensor_name: ', 'dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/kernel')
('tensor_name: ', 'dynamic_seq2seq/decoder/multi_rnn_cell/cell_1/basic_lstm_cell/kernel')
('tensor_name: ', 'dynamic_seq2seq/decoder/multi_rnn_cell/cell_0_attention/attention/bahdanau_attention/attention_v')
('tensor_name: ', 'dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/bias')
('tensor_name: ', 'dynamic_seq2seq/decoder/multi_rnn_cell/cell_0_attention/attention/bahdanau_attention/attention_b')
('tensor_name: ', 'dynamic_seq2seq/decoder/multi_rnn_cell/cell_0_attention/attention/bahdanau_attention/attention_g')
('tensor_name: ', 'dynamic_seq2seq/decoder/multi_rnn_cell/cell_1/basic_lstm_cell/bias')
('tensor_name: ', 'Variable')
('tensor_name: ', 'dynamic_seq2seq/decoder/multi_rnn_cell/cell_0_attention/attention/basic_lstm_cell/bias')
('tensor_name: ', 'embeddings/decoder/embedding_decoder')
('tensor_name: ', 'dynamic_seq2seq/decoder/multi_rnn_cell/cell_2/basic_lstm_cell/bias')
('tensor_name: ', 'dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/bias')
('tensor_name: ', 'dynamic_seq2seq/encoder/bidirectional_rnn/bw/basic_lstm_cell/kernel')
('tensor_name: ', 'dynamic_seq2seq/encoder/bidirectional_rnn/bw/basic_lstm_cell/bias')
('tensor_name: ', 'dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_2/basic_lstm_cell/kernel')
('tensor_name: ', 'dynamic_seq2seq/encoder/bidirectional_rnn/fw/basic_lstm_cell/bias')
('tensor_name: ', 'dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_2/basic_lstm_cell/bias')
('tensor_name: ', 'dynamic_seq2seq/decoder/multi_rnn_cell/cell_2/basic_lstm_cell/kernel')
('tensor_name: ', 'dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/kernel')
xiaohao@ubuntu:~/nmt$
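
(For reference, since ckpt_print.py is not part of the repo: a minimal sketch that would produce the listing above, assuming TF 1.x; the tuple-style output suggests print() under Python 2.)

# ckpt_print.py -- hedged sketch of a checkpoint key dumper.
import sys
import tensorflow as tf

ckpt_path = sys.argv[1]
print('CHECKPOINT_FILE: ', ckpt_path)

# NewCheckpointReader lists the variables stored in a checkpoint file.
reader = tf.train.NewCheckpointReader(ckpt_path)
for name in reader.get_variable_to_shape_map():
    print('tensor_name: ', name)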

I tried the PR (#265) and ran rm -rf /home/xiaohao/data/deen_gnmt/*. The problem is solved!

Thanks~ @oahziur