tensorflow / models

Models and examples built with TensorFlow

Load Trained Model From Checkpoint - NotFoundError #9743

Open Fawcett-cpu opened 3 years ago

Fawcett-cpu commented 3 years ago

Please make sure that this is a build/installation issue. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:build_template

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 64bit
TensorFlow installed from (source or binary): https://github.com/tensorflow/models
TensorFlow version: 2
Python version: 3.7.3
Installed using virtualenv? pip? conda?: conda and pip

Describe the problem

I have downloaded and installed TensorFlow, and I'm attempting to train a custom model, but I keep getting RuntimeError or NotFoundError exceptions that point to TensorFlow library files.

Provide the exact sequence of commands / steps that you executed before running into the problem

`WORKSPACE_PATH = 'Tensorflow/workspace'
SCRIPTS_PATH = 'Tensorflow/scripts'
APIMODEL_PATH = 'Tensorflow/models'
ANNOTATION_PATH = WORKSPACE_PATH+'/annotations'
IMAGE_PATH = WORKSPACE_PATH+'/images'
MODEL_PATH = WORKSPACE_PATH+'/models'
PRETRAINED_MODEL_PATH = WORKSPACE_PATH+'/pre-trained-models'
CONFIG_PATH = MODEL_PATH+'/my_ssd_mobnet/pipeline.config'
CHECKPOINT_PATH = MODEL_PATH+'/my_ssd_mobnet/'

labels = [{'name':'title', 'id':1}, {'name':'xaxis', 'id':2}, {'name':'yaxis', 'id':3}, {'name':'bar', 'id':4}, {'name':'key', 'id':5}]

with open(ANNOTATION_PATH + '/label_map.pbtxt', 'w') as f:
    for label in labels:
        f.write('item { \n')
        f.write('\tname:\'{}\'\n'.format(label['name']))
        f.write('\tid:{}\n'.format(label['id']))
        f.write('}\n')

!python {SCRIPTS_PATH + '/generate_tfrecord.py'} -x {IMAGE_PATH + '/train'} -l {ANNOTATION_PATH + '/label_map.pbtxt'} -o {ANNOTATION_PATH + '/train.record'}
!python {SCRIPTS_PATH + '/generate_tfrecord.py'} -x {IMAGE_PATH + '/test'} -l {ANNOTATION_PATH + '/label_map.pbtxt'} -o {ANNOTATION_PATH + '/test.record'}

!cd Tensorflow && git clone https://github.com/tensorflow/models

CUSTOM_MODEL_NAME = 'my_ssd_mobnet'
!mkdir {'Tensorflow\workspace\models\\'+CUSTOM_MODEL_NAME}
!cp {PRETRAINED_MODEL_PATH+'/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/pipeline.config'} {MODEL_PATH+'/'+CUSTOM_MODEL_NAME}

import tensorflow as tf
from object_detection.utils import config_util
from object_detection.protos import pipeline_pb2
from google.protobuf import text_format

CONFIG_PATH = MODEL_PATH+'/'+CUSTOM_MODEL_NAME+'/pipeline.config'

config = config_util.get_configs_from_pipeline_file(CONFIG_PATH)
config

{'model': ssd { num_classes: 90
image_resizer { fixed_shape_resizer { height: 320 width: 320 } }
feature_extractor { type: "ssd_mobilenet_v2_fpn_keras" depth_multiplier: 1.0 min_depth: 16 conv_hyperparams { regularizer { l2_regularizer { weight: 3.9999998989515007e-05 } } initializer { random_normal_initializer { mean: 0.0 stddev: 0.009999999776482582 } } activation: RELU_6 batch_norm { decay: 0.996999979019165 scale: true epsilon: 0.0010000000474974513 } } use_depthwise: true override_base_feature_extractor_hyperparams: true fpn { min_level: 3 max_level: 7 additional_layer_depth: 128 } }
box_coder { faster_rcnn_box_coder { y_scale: 10.0 x_scale: 10.0 height_scale: 5.0 width_scale: 5.0 } }
matcher { argmax_matcher { matched_threshold: 0.5 unmatched_threshold: 0.5 ignore_thresholds: false negatives_lower_than_unmatched: true force_match_for_each_row: true use_matmul_gather: true } }
similarity_calculator { iou_similarity { } }
box_predictor { weight_shared_convolutional_box_predictor { conv_hyperparams { regularizer { l2_regularizer { weight: 3.9999998989515007e-05 } } initializer { random_normal_initializer { mean: 0.0 stddev: 0.009999999776482582 } } activation: RELU_6 batch_norm { decay: 0.996999979019165 scale: true epsilon: 0.0010000000474974513 } } depth: 128 num_layers_before_predictor: 4 kernel_size: 3 class_prediction_bias_init: -4.599999904632568 share_prediction_tower: true use_depthwise: true } }
anchor_generator { multiscale_anchor_generator { min_level: 3 max_level: 7 anchor_scale: 4.0 aspect_ratios: 1.0 aspect_ratios: 2.0 aspect_ratios: 0.5 scales_per_octave: 2 } }
post_processing { batch_non_max_suppression { score_threshold: 9.99999993922529e-09 iou_threshold: 0.6000000238418579 max_detections_per_class: 100 max_total_detections: 100 use_static_shapes: false } score_converter: SIGMOID }
normalize_loss_by_num_matches: true
loss { localization_loss { weighted_smooth_l1 { } } classification_loss { weighted_sigmoid_focal { gamma: 2.0 alpha: 0.25 } } classification_weight: 1.0 localization_weight: 1.0 }
encode_background_as_zeros: true normalize_loc_loss_by_codesize: true inplace_batchnorm_update: true freeze_batchnorm: false },
'train_config': batch_size: 128 data_augmentation_options { random_horizontal_flip { } } data_augmentation_options { random_crop_image { min_object_covered: 0.0 min_aspect_ratio: 0.75 max_aspect_ratio: 3.0 min_area: 0.75 max_area: 1.0 overlap_thresh: 0.0 } } sync_replicas: true optimizer { momentum_optimizer { learning_rate { cosine_decay_learning_rate { learning_rate_base: 0.07999999821186066 total_steps: 50000 warmup_learning_rate: 0.026666000485420227 warmup_steps: 1000 } } momentum_optimizer_value: 0.8999999761581421 } use_moving_average: false } fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED" num_steps: 50000 startup_delay_steps: 0.0 replicas_to_aggregate: 8 max_number_of_boxes: 100 unpad_groundtruth_tensors: false fine_tune_checkpoint_type: "classification" fine_tune_checkpoint_version: V2,
'train_input_config': label_map_path: "PATH_TO_BE_CONFIGURED" tf_record_input_reader { input_path: "PATH_TO_BE_CONFIGURED" },
'eval_config': metrics_set: "coco_detection_metrics" use_moving_averages: false,
'eval_input_configs': [label_map_path: "PATH_TO_BE_CONFIGURED" shuffle: false num_epochs: 1 tf_record_input_reader { input_path: "PATH_TO_BE_CONFIGURED" } ],
'eval_input_config': label_map_path: "PATH_TO_BE_CONFIGURED" shuffle: false num_epochs: 1 tf_record_input_reader { input_path: "PATH_TO_BE_CONFIGURED" }}

pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
with tf.io.gfile.GFile(CONFIG_PATH, "r") as f:
    proto_str = f.read()
    text_format.Merge(proto_str, pipeline_config)

pipeline_config.model.ssd.num_classes = 2
pipeline_config.train_config.batch_size = 4
pipeline_config.train_config.fine_tune_checkpoint = PRETRAINED_MODEL_PATH+'/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/checkpoint/ckpt-0'
pipeline_config.train_config.fine_tune_checkpoint_type = "detection"
pipeline_config.train_input_reader.label_map_path= ANNOTATION_PATH + '/label_map.pbtxt'
pipeline_config.train_input_reader.tf_record_input_reader.input_path[:] = [ANNOTATION_PATH + '/train.record']
pipeline_config.eval_input_reader[0].label_map_path = ANNOTATION_PATH + '/label_map.pbtxt'
pipeline_config.eval_input_reader[0].tf_record_input_reader.input_path[:] = [ANNOTATION_PATH + '/test.record']

config_text = text_format.MessageToString(pipeline_config)
with tf.io.gfile.GFile(CONFIG_PATH, "wb") as f:
    f.write(config_text)

print("""python {}/research/object_detection/model_main_tf2.py --model_dir={}/{} --pipeline_config_path={}/{}/pipeline.config --num_train_steps=5000""".format(APIMODEL_PATH, MODEL_PATH,CUSTOM_MODEL_NAME,MODEL_PATH,CUSTOM_MODEL_NAME))

import os
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as viz_utils
from object_detection.builders import model_builder

# Load pipeline config and build a detection model
configs = config_util.get_configs_from_pipeline_file(CONFIG_PATH)
detection_model = model_builder.build(model_config=configs['model'], is_training=False)

# Restore checkpoint
ckpt = tf.compat.v2.train.Checkpoint(model=detection_model)
ckpt.restore(os.path.join(CHECKPOINT_PATH, 'ckpt-6')).expect_partial()

@tf.function
def detect_fn(image):
    image, shapes = detection_model.preprocess(image)
    prediction_dict = detection_model.predict(image, shapes)
    detections = detection_model.postprocess(prediction_dict, shapes)
    return detections

---------------------This Is Where The error Happens ----------------------------------------`

Error Message

`---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\training\py_checkpoint_reader.py in NewCheckpointReader(filepattern)
     94   try:
---> 95     return CheckpointReader(compat.as_bytes(filepattern))
     96   # TODO(b/143319754): Remove the RuntimeError casting logic once we resolve the

RuntimeError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for Tensorflow/workspace/models/my_ssd_mobnet/ckpt-6

During handling of the above exception, another exception occurred:

NotFoundError                             Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\training\tracking\util.py in restore(self, save_path, options)
   2259     try:
-> 2260       status = self.read(save_path, options=options)
   2261     except errors_impl.NotFoundError:

C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\training\tracking\util.py in read(self, save_path, options)
   2147     options = options or checkpoint_options.CheckpointOptions()
-> 2148     return self._saver.restore(save_path=save_path, options=options)
   2149

C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\training\tracking\util.py in restore(self, save_path, options)
   1291       return InitializationOnlyStatus(self._graph_view, ops.uid())
-> 1292     reader = py_checkpoint_reader.NewCheckpointReader(save_path)
   1293     graph_building = not context.executing_eagerly()

C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\training\py_checkpoint_reader.py in NewCheckpointReader(filepattern)
     98   except RuntimeError as e:
---> 99     error_translator(e)

C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\training\py_checkpoint_reader.py in error_translator(e)
     34       'matching files for') in error_message:
---> 35     raise errors_impl.NotFoundError(None, None, error_message)
     36   elif 'Sliced checkpoints are not supported' in error_message or (

NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for Tensorflow/workspace/models/my_ssd_mobnet/ckpt-6

During handling of the above exception, another exception occurred:

NotFoundError                             Traceback (most recent call last)
in
      5 # Restore checkpoint
      6 ckpt = tf.compat.v2.train.Checkpoint(model=detection_model)
----> 7 ckpt.restore(os.path.join(CHECKPOINT_PATH, 'ckpt-6')).expect_partial()
      8
      9 @tf.function

C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\training\tracking\util.py in restore(self, save_path, options)
   2263           None, None,
   2264           "Could not find checkpoint or SavedModel at {}."
-> 2265           .format(orig_save_path))
   2266     # Create the save counter now so it gets initialized with other variables
   2267     # when graph building. Creating it earlier would lead to errors when using,

NotFoundError: Could not find checkpoint or SavedModel at Tensorflow/workspace/models/my_ssd_mobnet/ckpt-6.`

shahzaib3311 commented 3 years ago

Any update on this bug?

bhaskarpnd commented 2 years ago

Your custom-trained model might not have reached checkpoint 6. Check the folder where the model is saved; it should contain the checkpoint files. Use the highest number that appears on those checkpoint files in place of ckpt-6 in `ckpt.restore(os.path.join(CHECKPOINT_PATH, 'ckpt-6')).expect_partial()`.
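
A minimal sketch of that check, assuming the checkpoints in the model folder follow the `ckpt-N.index` naming seen in this thread. The helper name `latest_checkpoint_prefix` is hypothetical, and it uses only the standard library. TensorFlow's own `tf.train.latest_checkpoint(CHECKPOINT_PATH)` serves a similar purpose by reading the `checkpoint` state file in that directory.

```python
import os
import re
from typing import Optional

# Assumption: checkpoints are saved as "ckpt-<N>.index" plus data shards,
# matching the file names mentioned in this thread.
_CKPT_INDEX = re.compile(r"^ckpt-(\d+)\.index$")


def latest_checkpoint_prefix(checkpoint_dir: str) -> Optional[str]:
    """Return the prefix (e.g. ".../ckpt-3") with the highest number, or None."""
    if not os.path.isdir(checkpoint_dir):
        return None
    numbers = []
    for name in os.listdir(checkpoint_dir):
        match = _CKPT_INDEX.match(name)
        if match:
            numbers.append(int(match.group(1)))
    if not numbers:
        return None
    return os.path.join(checkpoint_dir, "ckpt-{}".format(max(numbers)))
```

The returned prefix could then be passed to `ckpt.restore(...)` instead of the hard-coded `os.path.join(CHECKPOINT_PATH, 'ckpt-6')`. A `None` result suggests that no checkpoint has been written to that folder yet.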

FTC20325MaximumResistance commented 2 years ago

I found that moving ckpt-0.index and ckpt-0.data-00000-of-00001 out of the checkpoint folder and into the my_ssd_mobnet folder worked.
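
Whichever folder is used, the prefix passed to `ckpt.restore` needs a matching `.index` file and at least one `.data-*` shard next to it. A small sketch that checks this before restoring, using only the standard library. `checkpoint_files_present` is a hypothetical helper name, and the file layout is assumed from the file names quoted above.

```python
import glob
import os


def checkpoint_files_present(prefix: str) -> bool:
    """Check that `<prefix>.index` and at least one `<prefix>.data-*` shard exist."""
    has_index = os.path.isfile(prefix + ".index")
    # glob.escape guards against special characters such as "[" in the path.
    has_data = bool(glob.glob(glob.escape(prefix) + ".data-*"))
    return has_index and has_data
```

For example, `checkpoint_files_present('Tensorflow/workspace/models/my_ssd_mobnet/ckpt-6')` returning `False` would be consistent with the NotFoundError reported above.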