Open · assij opened this issue 4 years ago
Same here, I simply changed import tensorflow as tf to import tensorflow.compat.v1 as tf.
I tried this but it did not fix it (same issue as OP)
@wjm41 I was wondering whether you fixed it or not. I have exactly the same issue here.
Haven't been able to fix it yet - looks like it's something to do with the save/loading of the model but I'm not experienced enough with TF to know where to look :(
@wjm41 Thanks for replying. I fixed mine by adding the following:
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
@baojianzhou I tried adding that to both t2t-decoder and t2t-trainer, which gives me a new error:
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
Key transformer/body/parallel_0/body/encoder/layer_0/ffn/conv1/bias not found in checkpoint
[[node save/RestoreV2_1 (defined at /lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py:630) ]]
@wjm41, I believe I got the same error too, if I recall correctly. Have you retrained your model yet?
The reason is that if you load a checkpoint from a model trained without tf.disable_v2_behavior(), TensorFlow will somehow still use some V2 features. My solution was just to retrain the model from the beginning. The decoding process then finished successfully with the newly trained checkpoint. Hope it helps.
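If you want to confirm that this is really the cause, a quick sketch along these lines (the checkpoint path is a placeholder for your own --output_dir) prints which variable names the checkpoint actually contains:

import tensorflow.compat.v1 as tf

# Placeholder: point this at the directory you passed as --output_dir.
ckpt_dir = "/path/to/output_dir"

# Find the newest checkpoint and list every (variable name, shape) stored in it.
ckpt = tf.train.latest_checkpoint(ckpt_dir)
for name, shape in tf.train.list_variables(ckpt):
  print(name, shape)

If the key named in the error (e.g. transformer/body/parallel_0/body/encoder/layer_0/ffn/conv1/bias) is missing from that list, the graph being restored expects differently named variables than the checkpoint provides, which matches the V1/V2 mismatch described above.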
@baojianzhou Yes it's working now! Thanks so much :)
@baojianzhou I trained the model again with tf.disable_v2_behavior() added to t2t-trainer, but t2t-decoder still has issues. Can you please attach the files that you are using, including the train command line and the decoder command line?
@assij my t2t-trainer looks like this:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from tensor2tensor.bin import t2t_trainer

import tensorflow.compat.v1 as tf


def main(argv):
  t2t_trainer.main(argv)


if __name__ == "__main__":
  # Switch to TF1 behavior before training builds any graph.
  tf.disable_v2_behavior()
  tf.logging.set_verbosity(tf.logging.INFO)
  tf.app.run(main)
and my t2t-decoder looks like this:
"""t2t-decoder."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
#import tensorflow.compat.v1 as tf
from tensor2tensor.bin import t2t_decoder
import logging
#import tensorflow as tf
import tensorflow.compat.v1 as tf
def main(argv):
t2t_decoder.main(argv)
if __name__ == "__main__":
tf.disable_v2_behavior()
tf.logging.set_verbosity(tf.logging.INFO)
tf.app.run()
@wjm41 Thanks, are you using t2t-trainer with --optionally_use_dist_strat=True?
@assij No I wasn't - I got it working for a transformer on a custom PROBLEM, not sure that changing hparams should affect this problem in particular.
@wjm41 are you using t2t tag 1.15.7 as is, with only the above 2 changes? Are you doing training on multiple GPUs or 1 GPU? I'm working on multiple GPUs. Can you please send the result of pip freeze | grep tensor?
I've got exactly the same issue. I've tried the solution mentioned above, but it's still not working... Have you fixed it?
2020-11-11 05:25:43.286162: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key evolved_transformer/body/decoder/layer_0/first_attend_to_encoder/multihead_attention/k/kernel not found in checkpoint
2020-11-11 05:25:43.286801: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key evolved_transformer/body/parallel_0/body/encoder/layer_0/conv_branches/dense_2/bias not found in checkpoint
Traceback (most recent call last):
  File "/home/WwhStuGrp/WwhStu11G/anaconda3/envs/py3.7-tensorflow/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/home/WwhStuGrp/WwhStu11G/anaconda3/envs/py3.7-tensorflow/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/home/WwhStuGrp/WwhStu11G/anaconda3/envs/py3.7-tensorflow/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
  (0) Not found: Key evolved_transformer/body/decoder/layer_0/first_attend_to_encoder/multihead_attention/k/kernel not found in checkpoint
      [[{{node save/RestoreV2}}]]
  (1) Not found: Key evolved_transformer/body/decoder/layer_0/first_attend_to_encoder/multihead_attention/k/kernel not found in checkpoint
      [[{{node save/RestoreV2}}]]
      [[save/RestoreV2_1/_13]]
0 successful operations. 0 derived errors ignored.

During handling of the above exception, another exception occurred:
I retrained the model and added tf.disable_v2_behavior() to t2t-trainer, t2t-decoder, and t2t-translate-all, but I still have the problem:
root error(s) found.
  (0) Not found: Key transformer/body/decoder/layer_0/encdec_attention/multihead_attention/k/kernel not found in checkpoint
      [[node save/RestoreV2 (defined at /lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py:629) ]]
  (1) Not found: Key transformer/body/decoder/layer_0/encdec_attention/multihead_attention/k/kernel not found in checkpoint
      [[node save/RestoreV2 (defined at /lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py:629) ]]
      [[save/RestoreV2_1/_249]]
Do you know the reason?
You should install tensor2tensor from GitHub as below:
git clone https://github.com/tensorflow/tensor2tensor.git
cd tensor2tensor
pip install .
then replace t2t-trainer and t2t-decoder with the versions in https://github.com/tensorflow/tensor2tensor/issues/1849#issuecomment-701491229
Yes, it's working now! Thanks very much!
Description
When running the t2t-decoder script (En-De transformer-big) on a model that was trained on 8 GPUs using DistributedMirrorStrategy, I get the following error:
ValueError: Tensor("body/parallel_0/body/decoder/layer_0/self_attention/multihead_attention/dot_product_attention/attention:0", shape=(), dtype=string, device=/device:GPU:0) must be from the same graph as Tensor("transformer_hparams:0", shape=(), dtype=string). ...
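For reference, this "must be from the same graph as" ValueError is TensorFlow's generic complaint about mixing tensors that belong to two different tf.Graph objects in a single op. A minimal sketch (not my actual decode code) that reproduces the same class of error:

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

other_graph = tf.Graph()
with other_graph.as_default():
  # Tensor created inside a separate, explicitly constructed graph.
  hparams_t = tf.constant("transformer_big", name="transformer_hparams")

# Tensor created in the default graph.
attention_t = tf.constant("dot_product", name="attention")

try:
  # Combining tensors from two different graphs fails at op-construction time.
  tf.strings.join([hparams_t, attention_t])
except ValueError as e:
  print(e)  # "... must be from the same graph as ..."

In the decode failure above, the attention tensor and the transformer_hparams tensor apparently end up in different graphs, which is the same class of problem.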
Environment information
For bugs: reproduction and error logs