InvalidArgumentError: Mismatch between Graphs

donaparker01 commented 5 years ago

I have tried the README exactly for English to German 8 layer inference with the current setup Ubuntu 16.04 Tensorflow 1.9

but I receive the following error:

InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Assign requires shapes of both tensors to match. lhs shape= [36549,1024] rhs shape= [36548,1024] [[Node: save/Assign_26 = Assign[T=DT_FLOAT, _class= ["loc:@embeddings/encoder/embedding_encoder"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"] (embeddings/encoder/embedding_encoder, save/RestoreV2:26)]]

Can someone advise me on how to resolve the issue?

qwerybot commented 5 years ago

I'm getting the same error with a similar setup, I'm using tensorflow 1.12. Did you ever figure out a solution?

heyuanw commented 5 years ago

I'm getting the same error with a similar setup, I'm using tensorflow-night. Did you ever figure out a solution?

guyco87 commented 5 years ago

I found the problem, although I'm not sure how exactly to solve it. There's a mismatch between the checkpoint graph's embedding layer size and the vocabulary size (both src and target). In wmt16/vocab.bpe.32000 file there are 36549 lines while the saved models' input layer size is [36548, num_units].

I know it's not a solution, but I added the following lines in line 503 in nmt/nmt.py and I was able to run the model: src_vocab_size -= 1 tgt_vocab_size -= 1

I guess the problem is in the script nmt/scripts/wmt16_en_de.sh where wmt16/vocab.bpe.32000 is being created.

christ1ne commented 5 years ago

I end up just traing my own GNMT model. Afterwards, you can run the inference code with your own model without issue. I suspect the vocab file generation code for the pretrained model is somehow different.

qwerybot commented 5 years ago

I used the same solution as christ1ne. I tried a similar solution to guyco87, but my resulting BLEU scores were very low, e.g. 6.32, which indicates that many of the lines of the vocab file have moved around as well as something being added (or taken away too).

It seems the generation of the vocab file is either non-deterministic, or has changed since the model was pretrained. If we were able to find a copy of the vocab file which was used to train the models then it would work. Unfortunately I had no look tracking one down.

SergeCraft commented 5 years ago

I got the same problem when tried to export inference graph with tensorflow object detection API, but i have solved it. I has mistaken with choose of pipeline.config file to export. In my case, it should be exactly the same with pipeline.config file that was used to train model.

nithya4 commented 5 years ago

@BenTaylor3115 Using the current scripts to generate the vocab, I get 36549 lines in my vocab.bpe.32000 file.

Using the BPE file linked in issue #85, I ran the BPE portion and it generated 37008 lines. It indeed looks like the generation has changed since the model was trained. But the model expects a tensor of dims [36548, 1024] - not sure where this is from.

chih-hong commented 5 years ago

I got the same problem. how to solve it. Assign requires shapes of both tensors to match. lhs shape= [1024,2048] rhs shape= [2048] [[node save/Assign_381 (defined at /home/sca_test/bazel-bin/im2txt/run_inference.runfiles/main/im2txt/inference_utils/inference_wrapper_base.py:116) = Assign[T=DT_FLOAT, _class=["loc:@lstm/basic_lstm_cell/kernel"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](lstm/basic_lstm_cell/kernel, save/RestoreV2:381)]]