tensorflow / models

Models and examples built with TensorFlow

Exception has occurred: tensorflow.python.framework.errors_impl.NotFoundError #6093

Closed lunasdejavu closed 4 years ago

lunasdejavu commented 5 years ago

System information

https://github.com/tensorflow/tensorflow/tree/master/tools/tf_env_collect.sh

You can obtain the TensorFlow version with

python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"

I created the TFRecords from here (1 class, 1010 PNG images) and used the Mask R-CNN with Resnet-50 (v1), Atrous version model from here with the config from here. In the config I modified the paths, the TFRecord name, and the image type. When I ran the command above, the following error showed up:

Exception has occurred: tensorflow.python.framework.errors_impl.NotFoundError
Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key Conv/biases/Momentum not found in checkpoint
[[node save/RestoreV2 (defined at C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\envs\venv\lib\site-packages\object_detection-0.1-py3.6.egg\object_detection\legacy\trainer.py:377) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op 'save/RestoreV2', defined at:
  File "c:\Users\willy_sung.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\ptvsd_launcher.py", line 45, in <module>
    main(ptvsdArgs)
  File "c:\Users\willy_sung.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\lib\python\ptvsd__main.py", line 265, in main
    wait=args.wait)
  File "c:\Users\willy_sung.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\lib\python\ptvsd__main__.py", line 258, in handle_args
    debug_main(addr, name, kind, *extra, kwargs)
  File "c:\Users\willy_sung.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\lib\python\ptvsd_local.py", line 45, in debug_main
    run_file(address, name, extra, kwargs)
  File "c:\Users\willy_sung.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\lib\python\ptvsd_local.py", line 79, in run_file
    run(argv, addr, kwargs)
  File "c:\Users\willy_sung.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\lib\python\ptvsd_local.py", line 140, in _run
    _pydevd.main()
  File "c:\Users\willy_sung.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\lib\python\ptvsd_vendored\pydevd\pydevd.py", line 1925, in main
    debugger.connect(host, port)
  File "c:\Users\willy_sung.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\lib\python\ptvsd_vendored\pydevd\pydevd.py", line 1283, in run
    return self._exec(is_module, entry_point_fn, module_name, file, globals, locals)
  File "c:\Users\willy_sung.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\lib\python\ptvsd_vendored\pydevd\pydevd.py", line 1290, in _exec
    pydev_imports.execfile(file, globals, locals) # execute the script
  File "c:\Users\willy_sung.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\lib\python\ptvsd_vendored\pydevd_pydev_imps_pydev_execfile.py", line 25, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "c:\models\research\object_detection\legacy\train.py", line 184, in <module>
    tf.app.run()
  File "C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\envs\venv\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
  File "C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\envs\venv\lib\site-packages\tensorflow\python\util\deprecation.py", line 306, in new_func
    return func(args, kwargs)
  File "c:\models\research\object_detection\legacy\train.py", line 180, in main
    graph_hook_fn=graph_rewriter_fn)
  File "C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\envs\venv\lib\site-packages\object_detection-0.1-py3.6.egg\object_detection\legacy\trainer.py", line 377, in train
    keep_checkpoint_every_n_hours=keep_checkpoint_every_n_hours)
  File "C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\envs\venv\lib\site-packages\tensorflow\python\training\saver.py", line 1102, in __init__
    self.build()
  File "C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\envs\venv\lib\site-packages\tensorflow\python\training\saver.py", line 1114, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\envs\venv\lib\site-packages\tensorflow\python\training\saver.py", line 1151, in _build
    build_save=build_save, build_restore=build_restore)
  File "C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\envs\venv\lib\site-packages\tensorflow\python\training\saver.py", line 795, in _build_internal
    restore_sequentially, reshape)
  File "C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\envs\venv\lib\site-packages\tensorflow\python\training\saver.py", line 406, in _AddRestoreOps
    restore_sequentially)
  File "C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\envs\venv\lib\site-packages\tensorflow\python\training\saver.py", line 862, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\envs\venv\lib\site-packages\tensorflow\python\ops\gen_io_ops.py", line 1550, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\envs\venv\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\envs\venv\lib\site-packages\tensorflow\python\util\deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\envs\venv\lib\site-packages\tensorflow\python\framework\ops.py", line 3274, in create_op
    op_def=op_def)
  File "C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\envs\venv\lib\site-packages\tensorflow\python\framework\ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key Conv/biases/Momentum not found in checkpoint
[[node save/RestoreV2 (defined at C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\envs\venv\lib\site-packages\object_detection-0.1-py3.6.egg\object_detection\legacy\trainer.py:377) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

The paths of the data and the model are shown in the image:

tfodapiissue

It seems the model does not match the config file, but I get the same error when I try the maskrcnninceptionv2 model too. Can anyone help me solve this problem?
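One way to narrow down a "Key ... not found in checkpoint" error is to compare the variable names the graph asks for with the names actually stored in the checkpoint. The following is only a minimal sketch: `missing_checkpoint_keys` and `list_checkpoint_keys` are hypothetical helper names, and the key lists in the demo are made up for illustration (only `Conv/biases/Momentum` comes from the error above). `tf.train.list_variables` is the TensorFlow call that returns `(name, shape)` pairs for a checkpoint path.

```python
def missing_checkpoint_keys(graph_keys, checkpoint_keys):
    """Return, sorted, the graph keys that have no counterpart in the checkpoint."""
    return sorted(set(graph_keys) - set(checkpoint_keys))


def list_checkpoint_keys(checkpoint_path):
    """List the variable names stored in a checkpoint (requires TensorFlow)."""
    # Imported lazily so the comparison helper above works without TensorFlow.
    import tensorflow as tf
    return [name for name, _shape in tf.train.list_variables(checkpoint_path)]


# Hypothetical key lists for illustration only.
graph_keys = ["Conv/biases", "Conv/biases/Momentum", "global_step"]
checkpoint_keys = ["Conv/biases", "global_step"]
print(missing_checkpoint_keys(graph_keys, checkpoint_keys))
# prints ['Conv/biases/Momentum']
```

If only optimizer slot names such as `.../Momentum` turn out to be missing, the checkpoint being restored was probably saved without (or with a different) optimizer state, which points at the checkpoint rather than at the model definition.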

lunasdejavu commented 5 years ago

All the local variables when the error happened in maskrcnninceptionv2:

cluster:None
cluster_data:None
configs:{'eval_config': num_examples: 8000 m...evals: 10 , 'eval_input_config': label_map_path: "C:...PNG_MASKS , 'eval_input_configs': [label_map_path: "C:...NG_MASKS ], 'model': faster_rcnn { numb...ht: 4.0 } , 'train_config': batch_size: 1 data_a...etection" , 'train_input_config': label_map_path: "C:...PNG_MASKS }
create_input_dict_fn:functools.partial(<function main..get_next at 0x000000002FC548C8>, label_map_path: "C:\tf_od_api\mask_rcnn_inceptionnetv2\rainbow_label_map.pbtxt" load_instance_masks: true tf_record_input_reader { input_path: "C:\tf_od_api\mask_rcnn_inceptionnetv2\rainbow_train.record" } mask_type: PNG_MASKS )
env:{}
get_next:<function main..get_next at 0x000000002FC548C8>
graph_rewriter_fn:None
input_config:label_map_path: "C:\tf_od_api\mask_rcnn_inceptionnetv2\rainbow_label_map.pbtxt" load_instance_masks: true tf_record_input_reader { input_path: "C:\tf_od_api\mask_rcnn_inceptionnetv2\rainbow_train.record" }
is_chief:True
master:''
model_config:faster_rcnn { number_of_stages: 3 num_classes: 1 image_resizer { keep_aspect_ratio_resizer { min_dimension: 1024 max_dimension: 1024 } } feature_extractor { type: "faster_rcnn_inception_v2" first_stage_features_stride: 16 } first_stage_anchor_generator { grid_anchor_generator { height_stride: 16 width_stride: 16 scales: 0.25 scales: 0.5 scales: 1.0 scales: 2.0 aspect_ratios: 0.5 aspect_ratios: 1.0 aspect_ratios: 2.0 } } first_stage_box_predictor_conv_hyperparams { op: CONV regularizer { l2_regularizer { weight: 0.0 } } initializer { truncated_normal_initializer { stddev: 0.009999999776482582 } } } first_stage_nms_score_threshold: 0.0 first_stage_nms_iou_threshold: 0.699999988079071 first_stage_max_proposals: 300 first_stage_localization_loss_weight: 2.0 first_stage_objectness_loss_weight: 1.0 initial_crop_size: 14 maxpool_kernel_size: 2 maxpool_stride: 2 second_stage_box_predictor { mask_rcnn_box_predictor { fc_hyperparams { op: FC regularizer { l2_regularizer { weight: 0.0 } } initializer { variance_scaling_initializer { factor: 1.0 uniform: true mode: FAN_AVG } } } use_dropout: false dropout_keep_probability: 1.0 conv_hyperparams { op: CONV regularizer { l2_regularizer { weight: 0.0 } } initializer { truncated_normal_initializer { stddev: 0.009999999776482582 } } } predict_instance_masks: true mask_prediction_conv_depth: 0 mask_height: 15 mask_width: 15 mask_prediction_num_conv_layers: 2 } } second_stage_post_processing { batch_non_max_suppression { score_threshold: 0.0 iou_threshold: 0.6000000238418579 max_detections_per_class: 100 max_total_detections: 300 } score_converter: SOFTMAX } second_stage_localization_loss_weight: 2.0 second_stage_classification_loss_weight: 1.0 second_stage_mask_prediction_loss_weight: 4.0 }
model_fn:functools.partial(<function build at 0x000000002EDD8488>, model_config=faster_rcnn { number_of_stages: 3 num_classes: 1 image_resizer { keep_aspect_ratio_resizer { min_dimension: 1024 max_dimension: 1024 } } feature_extractor { type: "faster_rcnn_inception_v2" first_stage_features_stride: 16 } first_stage_anchor_generator { grid_anchor_generator { height_stride: 16 width_stride: 16 scales: 0.25 scales: 0.5 scales: 1.0 scales: 2.0 aspect_ratios: 0.5 aspect_ratios: 1.0 aspect_ratios: 2.0 } } first_stage_box_predictor_conv_hyperparams { op: CONV regularizer { l2_regularizer { weight: 0.0 } } initializer { truncated_normal_initializer { stddev: 0.009999999776482582 } } } first_stage_nms_score_threshold: 0.0 first_stage_nms_iou_threshold: 0.699999988079071 first_stage_max_proposals: 300 first_stage_localization_loss_weight: 2.0 first_stage_objectness_loss_weight: 1.0 initial_crop_size: 14 maxpool_kernel_size: 2 maxpool_stride: 2 second_stage_box_predictor { mask_rcnn_box_predictor { fc_hyperparams { op: FC regularizer { l2_regularizer { weight: 0.0 } } initializer { variance_scaling_initializer { factor: 1.0 uniform: true mode: FAN_AVG } } } use_dropout: false dropout_keep_probability: 1.0 conv_hyperparams { op: CONV regularizer { l2_regularizer { weight: 0.0 } } initializer { truncated_normal_initializer { stddev: 0.009999999776482582 } } } predict_instance_masks: true mask_prediction_conv_depth: 0 mask_height: 15 mask_width: 15 mask_prediction_num_conv_layers: 2 } } second_stage_post_processing { batch_non_max_suppression { score_threshold: 0.0 iou_threshold: 0.6000000238418579 max_detections_per_class: 100 max_total_detections: 300 } score_converter: SOFTMAX } second_stage_localization_loss_weight: 2.0 second_stage_classification_loss_weight: 1.0 second_stage_mask_prediction_loss_weight: 4.0 } , is_training=True)
ps_tasks:0
task:0
task_data:{'index': 0, 'type': 'master'}
task_info:<class 'main.TaskSpec'>
train_config:batch_size: 1 data_augmentation_options { random_horizontal_flip { } } optimizer { momentum_optimizer { learning_rate { manual_step_learning_rate { initial_learning_rate: 0.00019999999494757503 schedule { step: 900000 learning_rate: 1.9999999494757503e-05 } schedule { step: 1200000 learning_rate: 1.9999999949504854e-06 } } } momentum_optimizer_value: 0.8999999761581421 } use_moving_average: false } gradient_clipping_by_norm: 10.0 fine_tune_checkpoint: "C:\tf_od_api\mask_rcnn_inceptionnetv2\model.ckpt" from_detection_checkpoint: true num_steps: 200000 fine_tune_checkpoint_type: "detection"
worker_job_name:'lonely_worker'
workerreplicas:1
:['c:\models\researc...\train.py']
exception: (<class 'tensorflow.p...undError'>, NotFoundError(), )

lunasdejavu commented 5 years ago

All the files for the training are in the link. Can someone give me a hand?

ghost commented 5 years ago

Try deleting the checkpoint file that resides in the path you have specified with train_dir=... in your command.
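A likely reason this works (an assumption, not confirmed in this thread) is that a checkpoint left in train_dir by an earlier run is restored in preference to the fine_tune_checkpoint, and that older checkpoint may lack the optimizer's Momentum variables. A cautious variant of the suggestion is to move the old files aside instead of deleting them. The sketch below is hypothetical: `set_aside_checkpoints` and the `stale_checkpoints` folder name are made up for illustration, and it assumes training checkpoints are named `model.ckpt-<step>.*` next to a `checkpoint` index file. A pretrained `model.ckpt.*` stored in the same folder is deliberately left untouched.

```python
import shutil
import tempfile
from pathlib import Path


def set_aside_checkpoints(train_dir, backup_name="stale_checkpoints"):
    """Move training checkpoint files from train_dir into a backup subfolder.

    Returns the sorted names of the files that were moved.
    """
    train_dir = Path(train_dir)
    backup_dir = train_dir / backup_name
    moved = []
    for path in sorted(train_dir.iterdir()):
        # Assumed naming: a 'checkpoint' index file plus 'model.ckpt-<step>.*' files.
        is_stale = path.name == "checkpoint" or path.name.startswith("model.ckpt-")
        if path.is_file() and is_stale:
            backup_dir.mkdir(exist_ok=True)
            shutil.move(str(path), str(backup_dir / path.name))
            moved.append(path.name)
    return moved


# Demo on a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["checkpoint", "model.ckpt-100.index", "model.ckpt.index", "pipeline.config"]:
        (Path(tmp) / name).touch()
    print(set_aside_checkpoints(tmp))
    # prints ['checkpoint', 'model.ckpt-100.index']
```

Moving rather than deleting keeps the earlier run recoverable if the files turn out to be needed after all.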

satyajithj commented 5 years ago

@emasoumi Legend!

tensorflowbutler commented 4 years ago

Hi There, We are checking to see if you still need help on this, as this seems to be an old issue. Please update this issue with the latest information, code snippet to reproduce your issue and error you are seeing. If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help on this issue any more, please consider closing this.