tensorflow / models

Models and examples built with TensorFlow

ERROR: NotFoundError - File under path not being found #7504

Open tomruarol opened 5 years ago

tomruarol commented 5 years ago

I am trying to train a whole model on the COCO dataset using the provided scripts, but with the number of classes reduced to only 6.

I ran the download_and_preprocess_coco.sh script, which downloads the dataset and calls the create_coco_tf_record.py script to create the TFRecords from the downloaded dataset. After those steps (completed successfully), I tried to run retrain_detection_model.sh as described in the tutorial, but with the labels .pbtxt file modified to include only 6 classes and the pipeline.config file modified to match (with a v2 net and the option to train the whole model).

The first error that came out was:

RuntimeError: Did not find any input files matching the glob pattern [u'/tensorflow/models/research/tmp/mscoco/coco_train.record-00001-of-00010']

However, I do have a directory, /tensorflow/models/research/tmp/mscoco/, which contains files named in the following format:

coco_testdev.record-00000-of-00100
coco_train.record-00024-of-00100
coco_val.record-00001-of-00010

The first set of 5 digits after the record part is a number that goes from 00000 to 00099.

So I do have the files that the error says are missing, and I have the path specified in the pipeline.config file.
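
One way to double-check this is to ask the filesystem what the configured path actually resolves to. The following is a minimal sketch, not part of the tutorial scripts: it uses Python's standard glob module as a stand-in for the file matching done during training (which may differ in details), and the helper names matching_shards and report are made up for illustration.

```python
import glob
import os


def matching_shards(pattern):
    """Return the sorted list of files on disk that match a glob pattern."""
    return sorted(glob.glob(pattern))


def report(pattern):
    """Print what a configured input_path resolves to and return the matches."""
    matches = matching_shards(pattern)
    if matches:
        print(len(matches), "file(s) match:", pattern)
        return matches
    print("no files match:", pattern)
    # List a few directory entries so a near miss (for example a different
    # shard count in the file name) becomes visible.
    directory = os.path.dirname(pattern) or "."
    if os.path.isdir(directory):
        for name in sorted(os.listdir(directory))[:5]:
            print("  found instead:", name)
    return matches


if __name__ == "__main__":
    # Path copied from the train_input_reader in the pipeline.config below.
    report("/tensorflow/models/research/tmp/mscoco/coco_train.record-00001-of-00010")
```

If the report lists files whose names differ from the configured one only in the digits, the config and the generated shards disagree on the name.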

I managed to get a bit further by skipping the use of the glob library in the dataset_builder.py script under research/object_detection/builders/. The glob call is not working as it should, so with it removed the script runs a bit further, but it still throws an error:

NotFoundError (see above for traceback): /tensorflow/models/research/tmp/mscoco/coco_train.record-00001-of-00010; No such file or directory
         [[node IteratorGetNext (defined at object_detection/model_main.py:105)  = IteratorGetNext[output_shapes=[[128], [128,300,300,3], [128,2], [128,3], [128,100], [128,100,4], [128,100,2], [128,100,2], [128,100], [128,100], [128,100], [128]],
 output_types=[DT_INT32, DT_FLOAT, DT_INT32, DT_INT32, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_BOOL, DT_FLOAT, DT_INT32], _device="/job:localhost/replica:0/task:0/device:CPU:0"](IteratorV2)]]

I have not figured out how to move on from here.

I paste my pipeline.config file:

model {
  ssd {
    num_classes: 2
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    feature_extractor {
      type: "ssd_mobilenet_v2"
      depth_multiplier: 1.0
      min_depth: 16
      conv_hyperparams {
        regularizer {
          l2_regularizer {
            weight: 3.99999989895e-05
          }
        }
        initializer {
          random_normal_initializer {
            mean: 0.0
            stddev: 0.00999999977648
          }
        }
        activation: RELU_6
        batch_norm {
          decay: 0.97000002861
          center: true
          scale: true
          epsilon: 0.0010000000475
        }
      }
      override_base_feature_extractor_hyperparams: true
    }
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    box_predictor {
      convolutional_box_predictor {
        conv_hyperparams {
          regularizer {
            l2_regularizer {
              weight: 3.99999989895e-05
            }
          }
          initializer {
            random_normal_initializer {
              mean: 0.0
              stddev: 0.00999999977648
            }
          }
          activation: RELU_6
          batch_norm {
            decay: 0.97000002861
            center: true
            scale: true
            epsilon: 0.0010000000475
          }
        }
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.800000011921
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        class_prediction_bias_init: -4.59999990463
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.20000000298
        max_scale: 0.949999988079
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.333299994469
      }
    }
    post_processing {
      batch_non_max_suppression {
        score_threshold: 0.300000011921
        iou_threshold: 0.600000023842
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
    normalize_loss_by_num_matches: true
    loss {
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_loss {
        weighted_sigmoid_focal {
          gamma: 2.0
          alpha: 0.75
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    encode_background_as_zeros: true
    normalize_loc_loss_by_codesize: true
    inplace_batchnorm_update: true
    freeze_batchnorm: false
  }
}
train_config {
  batch_size: 128
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
  sync_replicas: true
  optimizer {
    momentum_optimizer {
      learning_rate {
        cosine_decay_learning_rate {
          learning_rate_base: 0.20000000298
          total_steps: 50000
          warmup_learning_rate: 0.0599999986589
          warmup_steps: 2000
        }
      }
      momentum_optimizer_value: 0.899999976158
    }
    use_moving_average: false
  }
  fine_tune_checkpoint: "/tensorflow/models/research/learn_human_car/ckpt/model.ckpt"
  from_detection_checkpoint: true
  load_all_detection_checkpoint_vars: true
  num_steps: 50000
  startup_delay_steps: 0.0
  replicas_to_aggregate: 8
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
}
train_input_reader {
  label_map_path: "/tensorflow/models/research/object_detection/data/mscoco_label_map.pbtxt"
  tf_record_input_reader {
    input_path: "/tensorflow/models/research/tmp/mscoco/coco_train.record-00001-of-00010"
  }
}
eval_config {
  num_examples: 8000
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
}
eval_input_reader {
  label_map_path: "/tensorflow/models/research/object_detection/data/mscoco_label_map.pbtxt"
  shuffle: false
  num_readers: 1
  tf_record_input_reader {
    input_path: "/tensorflow/models/research/tmp/mscoco/coco_val.record-?????-of-00010"
  }
}
graph_rewriter {
  quantization {
    delay: 48000
    weight_bits: 8
    activation_bits: 8
  }
}

blairhan commented 5 years ago

It seems you put the wrong number in the path: "00010" should be "00100".
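
To illustrate the mismatch, here is a small sketch that pulls the shard count out of a file name and compares the configured name with one from the directory listing. The regular expression and the helper name shard_total are assumptions made for this example, based only on the name-INDEX-of-TOTAL format shown in the issue.

```python
import re

# Shard files are named "<base>-<index>-of-<total>",
# e.g. coco_train.record-00024-of-00100. The index may also be "?????".
SHARD_RE = re.compile(r"^(?P<base>.+)-(?P<index>\d{5}|\?{5})-of-(?P<total>\d{5})$")


def shard_total(name):
    """Return (base, total) for a shard name or pattern, or None if it does not parse."""
    m = SHARD_RE.match(name)
    if m is None:
        return None
    return m.group("base"), int(m.group("total"))


configured = "coco_train.record-00001-of-00010"  # from input_path in pipeline.config
on_disk = "coco_train.record-00024-of-00100"     # from the directory listing in the issue

print(shard_total(configured))                          # ('coco_train.record', 10)
print(shard_total(on_disk))                             # ('coco_train.record', 100)
print(shard_total(configured) == shard_total(on_disk))  # False
```

The base name agrees but the totals do not, so no training shard on disk can carry the configured name.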

sirajpathan commented 4 years ago

This type of error happens if you use the wrong path for the input_path key. Check that the file path given for input_path in the config file is present on disk. You will not find a file with that exact name, but you should see files with names like coco_val.record-00000-of-00010 or coco_val.record-00099-of-00010.

(In coco_val.record-?????-of-00010, the question marks stand for the 5-digit number of the record file.)
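
As a rough illustration of how the question marks behave, the sketch below uses fnmatchcase from Python's standard fnmatch module. This is an assumption about shell-style wildcard semantics in general, not a statement about the exact matcher the Object Detection API uses.

```python
from fnmatch import fnmatchcase

pattern = "coco_val.record-?????-of-00010"

# Each "?" matches exactly one character, so "?????" covers the 5-digit index.
print(fnmatchcase("coco_val.record-00001-of-00010", pattern))  # True
# The count after "-of-" is literal text and must match exactly.
print(fnmatchcase("coco_val.record-00001-of-00100", pattern))  # False
# Four characters are too few for the five "?" wildcards.
print(fnmatchcase("coco_val.record-0001-of-00010", pattern))   # False
```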

ravikyram commented 4 years ago

@tomruarol

Is this still an issue? Please close this thread if your issue was resolved. Thanks!

tomruarol commented 4 years ago

Yes, it's still an issue.