tensorflow / models

Models and examples built with TensorFlow

How do you solve the checkpoint-loading problem in transfer learning with SSD MobileNet v2? #4407

Closed pageedward closed 4 years ago

pageedward commented 6 years ago


Describe the problem

When fine-tuning SSD MobileNet v2, restoring the checkpoint fails with the error below.

Source code / logs

ncodingPredictor/biases/ExponentialMovingAverage not found in checkpoint
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.NotFoundError'>, Key BoxPredictor_0/BoxEncodingPredictor/biases/ExponentialMovingAverage not found in checkpoint
  [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op 'save/RestoreV2', defined at:
  File "/home/vsoon/liangpeijun/models-master/research/object_detection/train.py", line 184, in <module>
    tf.app.run()
  File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "/home/vsoon/liangpeijun/models-master/research/object_detection/train.py", line 180, in main
    graph_hook_fn=graph_rewriter_fn)
  File "/opt/anaconda3/lib/python3.6/site-packages/object_detection-0.1-py3.6.egg/object_detection/trainer.py", line 361, in train
    keep_checkpoint_every_n_hours=keep_checkpoint_every_n_hours)
  File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1311, in __init__
    self.build()
  File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1320, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1357, in _build
    build_save=build_save, build_restore=build_restore)
  File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 809, in _build_internal
    restore_sequentially, reshape)
  File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 448, in _AddRestoreOps
    restore_sequentially)
  File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 860, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1458, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3290, in create_op
    op_def=op_def)
  File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1654, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

NotFoundError (see above for traceback): Key BoxPredictor_0/BoxEncodingPredictor/biases/ExponentialMovingAverage not found in checkpoint
  [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]


Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1327, in _do_call
    return fn(*args)
  File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1312, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1420, in _call_tf_sessionrun
    status, run_metadata)
  File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: Key BoxPredictor_0/BoxEncodingPredictor/biases/ExponentialMovingAverage not found in checkpoint
  [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/vsoon/liangpeijun/models-master/research/object_detection/train.py", line 184, in <module>
    tf.app.run()
  File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "/home/vsoon/liangpeijun/models-master/research/object_detection/train.py", line 180, in main
    graph_hook_fn=graph_rewriter_fn)
  File "/opt/anaconda3/lib/python3.6/site-packages/object_detection-0.1-py3.6.egg/object_detection/trainer.py", line 399, in train
    saver=saver)
  File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 747, in train
    master, start_standard_services=False, config=session_config) as sess:
  File "/opt/anaconda3/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py", line 1000, in managed_session
    self.stop(close_summary_writer=close_summary_writer)
  File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py", line 828, in stop
    ignore_live_threads=ignore_live_threads)
  File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/opt/anaconda3/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py", line 989, in managed_session
    start_standard_services=start_standard_services)
  File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py", line 726, in prepare_or_wait_for_session
    init_feed_dict=self._init_feed_dict, init_fn=self._init_fn)
  File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/session_manager.py", line 275, in prepare_session
    config=config)
  File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/session_manager.py", line 207, in _restore_checkpoint
    saver.restore(sess, ckpt.model_checkpoint_path)
  File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1775, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 905, in run
    run_metadata_ptr)
  File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1140, in _run
    feed_dict_tensor, options, run_metadata)
  File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    run_metadata)
  File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key BoxPredictor_0/BoxEncodingPredictor/biases/ExponentialMovingAverage not found in checkpoint
  [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]


pkulzc commented 6 years ago

Could you please share your config?

zishanahmed08 commented 6 years ago

@pkulzc I face the same issue with ssd_mobilenet_v1_coco. Below is my config file:

ssd_mobilenet_v1_coco_local.txt

pkulzc commented 6 years ago

Oh, you can add "use_moving_averages: false" to the eval_config section of your config. This error happens when the pre-trained checkpoint was trained without moving averages enabled, so it contains no ExponentialMovingAverage variables to restore.
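To confirm that diagnosis before changing the config, it can help to compare the keys the restore asks for with the keys the checkpoint actually holds. The sketch below is a minimal illustration and not part of the Object Detection API: `missing_ema_keys` is a hypothetical helper, the key lists are made-up examples based on the name in the error message, and it assumes the moving-average restore looks up each variable under its name plus `/ExponentialMovingAverage`, as the error message suggests.

```python
EMA_SUFFIX = "/ExponentialMovingAverage"


def missing_ema_keys(model_variable_names, checkpoint_keys):
    """Return the moving-average keys a restore would request but the checkpoint lacks."""
    available = set(checkpoint_keys)
    wanted = [name + EMA_SUFFIX for name in model_variable_names]
    return [key for key in wanted if key not in available]


# Hypothetical key listing for a checkpoint saved without moving averages.
# With TensorFlow installed, the real keys could be read with something like
# [name for name, _ in tf.train.list_variables(checkpoint_path)].
checkpoint_keys = [
    "BoxPredictor_0/BoxEncodingPredictor/biases",
    "BoxPredictor_0/BoxEncodingPredictor/weights",
    "global_step",
]
model_variable_names = [
    "BoxPredictor_0/BoxEncodingPredictor/biases",
    "BoxPredictor_0/BoxEncodingPredictor/weights",
]

missing = missing_ema_keys(model_variable_names, checkpoint_keys)
print(missing)
```

If the list is non-empty, the checkpoint has no moving-average copies of those variables, which matches the "not found in checkpoint" error above.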

zishanahmed08 commented 6 years ago

Thanks for the speedy reply. Will try it out.

pageedward commented 6 years ago

Try changing the path where checkpoints are saved so that it differs from the path of the checkpoint you load.
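One possible reading of this suggestion, based on the traceback in the original report: the failing restore goes through `_restore_checkpoint` in `session_manager.py`, which resumes from a checkpoint found in the training directory. If that directory already contains the pre-trained checkpoint files, training may try to resume from them instead of starting fresh. The sketch below is a hedged pre-flight check using only the standard library; `train_dir_conflicts` is a hypothetical helper, and the directory and file names in the demo are invented.

```python
import os
import tempfile


def train_dir_conflicts(train_dir, fine_tune_checkpoint):
    """Return True if train_dir is the pre-trained checkpoint's directory
    or already contains a 'checkpoint' state file."""
    train_dir = os.path.abspath(train_dir)
    ckpt_dir = os.path.abspath(os.path.dirname(fine_tune_checkpoint))
    same_dir = train_dir == ckpt_dir
    has_state_file = os.path.isfile(os.path.join(train_dir, "checkpoint"))
    return same_dir or has_state_file


# Demo with throwaway directories (names are illustrative only).
with tempfile.TemporaryDirectory() as root:
    pretrained = os.path.join(root, "pretrained")
    fresh = os.path.join(root, "train_output")
    os.makedirs(pretrained)
    os.makedirs(fresh)
    # Simulate the 'checkpoint' state file shipped next to a pre-trained model.
    open(os.path.join(pretrained, "checkpoint"), "w").close()
    ckpt = os.path.join(pretrained, "model.ckpt")

    reused = train_dir_conflicts(pretrained, ckpt)
    separate = train_dir_conflicts(fresh, ckpt)

print(reused, separate)
```

A True result suggests pointing the training output at a new, empty directory and leaving the pre-trained checkpoint where it is.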

zubairahmed-ai commented 5 years ago

NotFoundError (see above for traceback): Key BoxPredictor_0/BoxEncodingPredictor/biases/ExponentialMovingAverage not found in checkpoint [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

I am facing this problem with SSD+MobileNet v2 and I'm not sure how to fix it.

zubairahmed-ai commented 5 years ago

I am facing the same issue with SSDLite+MobileNet v2 and SSD+MobileNet v2, and I'm not sure what to do to fix it.

pkulzc commented 5 years ago

Doesn't my earlier comment solve your issue?

zubairahmed-ai commented 5 years ago

I put that flag in the config and it wasn't recognized.

pkulzc commented 5 years ago

Are you sure you are setting it correctly? It's used in other configs also.

zubairahmed-ai commented 5 years ago

Yes, I am pretty sure. Also, the config you showed is for MobileNet v1 and I'm using v2.

pkulzc commented 5 years ago

It's defined here so every model can use it. I'm simply showing that this field should be recognized.

Please share your config.

zubairahmed-ai commented 5 years ago

I will try this again now, then share my config.

zubairahmed-ai commented 5 years ago

It didn't work. Here's my config.

I also posted an issue here: https://github.com/tensorflow/models/issues/5792

CCGY commented 5 years ago

@zubairahmed-ai have you figured out how to solve your problem?

zubairahmed-ai commented 5 years ago

@zubairahmed-ai have you figured out how to solve your problem?

Yes, but honestly I can't remember what it was or how I fixed it :)

tensorflowbutler commented 4 years ago

Hi there, we are checking to see if you still need help on this, as this seems to be a considerably old issue. Please update this issue with the latest information, a code snippet to reproduce your issue, and the error you are seeing. If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help on this issue any more, please consider closing it.