I am trying to follow the approach described in the paper "Blockwise Parallel Decoding for Deep Autoregressive Models". It states that a model is first trained on the task with the transformer model and the transformer_base hparams set, and that transformer_block_parallel is then trained on top of it. However, I am not able to load the checkpoint produced by the transformer training run when training transformer_block_parallel.
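The checkpoint from the transformer run presumably stores its weights under the `transformer/` scope, while transformer_block_parallel looks them up under `transformer_block_parallel/`. The extra block-prediction layers (such as `block_size_2/conv1` in the error below) have no counterpart in that checkpoint at all. Below is a minimal sketch of the name remapping this would seem to require. It assumes, without checking the tensor2tensor source, that both models use the same layout below their top-level scope. `build_assignment_map` is a hypothetical helper, and the variable names in the example are made up.

```python
def build_assignment_map(ckpt_names, graph_names,
                         old_scope="transformer",
                         new_scope="transformer_block_parallel"):
    """Map checkpoint variable names onto a renamed top-level model scope.

    Assumption: only the first path component differs between the two models.

    Returns (assignment_map, unmatched):
      assignment_map: {checkpoint_name: graph_name} for each graph variable under
        new_scope whose counterpart under old_scope exists in the checkpoint.
      unmatched: sorted graph names under new_scope with no counterpart; these
        (e.g. the new block-prediction layers) keep their fresh initializers.
    Names outside new_scope, such as global_step, are ignored on purpose so the
    new run keeps its own step counter.
    """
    available = set(ckpt_names)
    prefix = new_scope + "/"
    assignment_map = {}
    unmatched = []
    for name in graph_names:
        if not name.startswith(prefix):
            continue  # not part of the model scope (e.g. global_step)
        # Swap the top-level scope and look the result up in the checkpoint.
        candidate = old_scope + "/" + name[len(prefix):]
        if candidate in available:
            assignment_map[candidate] = name
        else:
            unmatched.append(name)
    return assignment_map, sorted(unmatched)


if __name__ == "__main__":
    # Made-up variable names, for illustration only.
    ckpt = ["transformer/body/encoder/layer_0/kernel", "global_step"]
    graph = ["transformer_block_parallel/body/encoder/layer_0/kernel",
             "transformer_block_parallel/body/block_size_2/conv1/bias",
             "global_step"]
    amap, unmatched = build_assignment_map(ckpt, graph)
    print(unmatched)  # ['transformer_block_parallel/body/block_size_2/conv1/bias']
```

In TensorFlow 1.x, a dict of this form (checkpoint name to current variable name) is what `tf.train.init_from_checkpoint(ckpt_dir_or_file, assignment_map)` accepts. It has to be called while the graph is being built, and variables not in the map keep their normal initializers. How to hook this into t2t-trainer is left open here.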
...
Environment information
OS: <your answer here>
$ pip freeze | grep tensor
# your output here
$ python -V
# your output here
For bugs: reproduction and error logs
# Steps to reproduce:
...
# Error logs:
2019-01-28 19:00:29.216679: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key transformer_block_parallel/body/block_size_2/conv1/bias not found in checkpoint
Traceback (most recent call last):
File "/home/rasna_goyal66/.local/bin/t2t-trainer", line 33, in <module>
tf.app.run()
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "/home/rasna_goyal66/.local/bin/t2t-trainer", line 28, in main
t2t_trainer.main(argv)
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensor2tensor/bin/t2t_trainer.py", line 393, in main
execute_schedule(exp)
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensor2tensor/bin/t2t_trainer.py", line 349, in execute_schedule
getattr(exp, FLAGS.schedule)()
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensor2tensor/utils/trainer_lib.py", line 439, in continuous_train_and_eval
return self.evaluate()
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensor2tensor/utils/trainer_lib.py", line 514, in evaluate
name=name)
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 478, in evaluate
return _evaluate()
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 467, in _evaluate
output_dir=self.eval_dir(name))
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1591, in _evaluate_run
config=self._session_config)
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/training/evaluation.py", line 271, in _evaluate_once
session_creator=session_creator, hooks=hooks) as session:
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 921, in __init__
stop_grace_period_secs=stop_grace_period_secs)
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 643, in __init__
self._sess = _RecoverableSession(self._coordinated_creator)
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1107, in __init__
_WrappedSession.__init__(self, self._create_session())
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1112, in _create_session
return self._sess_creator.create_session()
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 800, in create_session
self.tf_sess = self._session_creator.create_session()
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 566, in create_session
init_fn=self._scaffold.init_fn)
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/training/session_manager.py", line 288, in prepare_session
config=config)
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/training/session_manager.py", line 202, in _restore_checkpoint
saver.restore(sess, checkpoint_filename_with_path)
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1562, in restore
err, "a Variable name or other graph key that is missing")
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
Key transformer_block_parallel/body/block_size_2/conv1/bias not found in checkpoint
[[node save/RestoreV2_1 (defined at /home/rasna_goyal66/.local/lib/python2.7/site-packages/tensor2tensor/utils/trainer_lib.py:514) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_1/tensor_names, save/RestoreV2_1/shape_and_slices)]]
Caused by op u'save/RestoreV2_1', defined at:
File "/home/rasna_goyal66/.local/bin/t2t-trainer", line 33, in <module>
tf.app.run()
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "/home/rasna_goyal66/.local/bin/t2t-trainer", line 28, in main
t2t_trainer.main(argv)
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensor2tensor/bin/t2t_trainer.py", line 393, in main
execute_schedule(exp)
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensor2tensor/bin/t2t_trainer.py", line 349, in execute_schedule
getattr(exp, FLAGS.schedule)()
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensor2tensor/utils/trainer_lib.py", line 439, in continuous_train_and_eval
return self.evaluate()
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensor2tensor/utils/trainer_lib.py", line 514, in evaluate
name=name)
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 478, in evaluate
return _evaluate()
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 467, in _evaluate
output_dir=self.eval_dir(name))
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1591, in _evaluate_run
config=self._session_config)
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/training/evaluation.py", line 271, in _evaluate_once
session_creator=session_creator, hooks=hooks) as session:
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 921, in __init__
stop_grace_period_secs=stop_grace_period_secs)
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 643, in __init__
self._sess = _RecoverableSession(self._coordinated_creator)
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1107, in __init__
_WrappedSession.__init__(self, self._create_session())
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1112, in _create_session
return self._sess_creator.create_session()
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 800, in create_session
self.tf_sess = self._session_creator.create_session()
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 557, in create_session
self._scaffold.finalize()
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 213, in finalize
self._saver = training_saver._get_saver_or_default() # pylint: disable=protected-access
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 886, in _get_saver_or_default
saver = Saver(sharded=True, allow_empty=True)
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1102, in __init__
self.build()
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1114, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1151, in _build
build_save=build_save, build_restore=build_restore)
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 789, in _build_internal
restore_sequentially, reshape)
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 459, in _AddShardedRestoreOps
name="restore_shard"))
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 406, in _AddRestoreOps
restore_sequentially)
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 862, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1466, in restore_v2
shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/home/rasna_goyal66/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
self._traceback = tf_stack.extract_stack()
NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
Key transformer_block_parallel/body/block_size_2/conv1/bias not found in checkpoint
[[node save/RestoreV2_1 (defined at /home/rasna_goyal66/.local/lib/python2.7/site-packages/tensor2tensor/utils/trainer_lib.py:514) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_1/tensor_names, save/RestoreV2_1/shape_and_slices)]]
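Judging by the trace, the failure happens in evaluate(), when the default Saver (`_get_saver_or_default`) tries to restore every variable of the transformer_block_parallel graph from the checkpoint. A restore like that fails if any graph variable is missing from the checkpoint or has a different shape. Below is a sketch for listing both cases up front, assuming TensorFlow 1.x as in the traceback. `diff_against_checkpoint`, `checkpoint_shapes` and `graph_shapes` are hypothetical helpers; only the TensorFlow calls inside the last two (`tf.train.list_variables`, `tf.global_variables`) are real APIs.

```python
def diff_against_checkpoint(ckpt_vars, graph_vars):
    """Compare graph variables against a checkpoint.

    Both arguments map variable name -> shape (a list of ints). Returns
    (missing, mismatched): sorted graph names absent from the checkpoint, and
    sorted names present in both but with different shapes. A full restore of
    the graph fails on either kind.
    """
    missing = sorted(n for n in graph_vars if n not in ckpt_vars)
    mismatched = sorted(
        n for n in graph_vars
        if n in ckpt_vars and list(ckpt_vars[n]) != list(graph_vars[n]))
    return missing, mismatched


def checkpoint_shapes(ckpt_path):
    """Name -> shape for every variable stored in a checkpoint (needs TF)."""
    import tensorflow as tf  # lazy import so the pure helper runs without TF
    return {name: list(shape) for name, shape in tf.train.list_variables(ckpt_path)}


def graph_shapes():
    """Name -> shape for every global variable in the current TF 1.x graph."""
    import tensorflow as tf  # lazy import, TF 1.x graph-mode API
    return {v.op.name: v.shape.as_list() for v in tf.global_variables()}
```

With the transformer_block_parallel graph built, `diff_against_checkpoint(checkpoint_shapes(path), graph_shapes())` should list `transformer_block_parallel/body/block_size_2/conv1/bias` among the missing names. If the scope-rename assumption is right, every other variable under the `transformer_block_parallel/` scope should be listed too, because the checkpoint stores it under `transformer/`.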