pierrejeambrun commented 6 years ago

Inference not working after a training with convert_to_grayscale parameter

I System Information

OS Platform and Distribution: Linux Ubuntu 16.04 LTS
TensorFlow installed from: pip tensorflow-gpu
TensorFlow version: '1.8.0'
CUDA/cuDNN version: Cuda 9.0, Cudnn 7.0
GPU model and memory: 1 Tesla K-80, Cores 4, Memory 40 G

II Describe the Problem

I have successfully trained a faster_rcnn_resnet_101 on my own dataset and run multiple inferences with it. Lately I wanted to train a network on grayscale images, so I used the convert_to_grayscale option available in the image_resizer. I just switched that flag to true and started a new training. Everything went as usual, and I could export the frozen graph as well. However, when I run the inference script on the same image, I get an error raised by the FirstStageFeatureExtractor shape assert (/research/object_detection/models/faster_rcnn_resnet_v1_feature_extractor.py, line 109). If I delete this assert, inference runs without errors, but I get a score of 0 for all the boxes. The model detects nothing, even on training images where it found several boxes during the evaluation job. Original images are about 1900x1000.

III Logs, Code

The specific error raised during the inference.

InvalidArgumentError (see above for traceback): assertion failed: [image size must at least be 33 in both height and width.]                                                            
[[Node: FirstStageFeatureExtractor/Assert/Assert = Assert[T=[DT_STRING], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](FirstStageFeatureExtractor/LogicalAnd/_101, FirstStageFeatureExtractor/Assert/Assert/data_0)]]                               
[[Node: GridAnchorGenerator/assert_equal/Assert/Assert/_132 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_1200_GridAnchorGenerator/assert_equal/Assert/Assert", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
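A quick way to rule out a trivially bad input is to check the shape of the array before feeding it to the frozen graph. The sketch below only mirrors the condition quoted in the assertion message (height and width of at least 33). The function name `check_image_shape` and the assumption that the graph input is a 4-D `[batch, height, width, channels]` array are mine and are not taken from the failing code.

```python
# Minimum side length, copied from the assertion message in the log above.
MIN_SIDE = 33


def check_image_shape(shape):
    """Return (ok, message) for the shape of an array about to be fed to the graph.

    Assumes (hypothetically) a 4-D layout: [batch, height, width, channels].
    """
    shape = tuple(shape)
    if len(shape) != 4:
        return False, "expected 4 dimensions, got %d: %r" % (len(shape), shape)
    _, height, width, _ = shape
    if height < MIN_SIDE or width < MIN_SIDE:
        return False, "height=%d, width=%d is below the minimum of %d" % (
            height, width, MIN_SIDE)
    return True, "shape %r passes the size check" % (shape,)


# Example: one 1000x1900 RGB image with a leading batch dimension.
print(check_image_shape((1, 1000, 1900, 3)))
# Example: the batch dimension was forgotten.
print(check_image_shape((1000, 1900, 3)))
```

In a notebook this could be called as `check_image_shape(image_array.shape)`, where `image_array` stands for whatever NumPy array is passed to the session. If the check passes and the assertion still fires, the problem is more likely inside the exported graph than in the input.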

My config file:

model {
  faster_rcnn {
    num_classes: 47
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
        convert_to_grayscale: true
      }
    }
    feature_extractor {
      type: 'faster_rcnn_resnet101'
      first_stage_features_stride: 16
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 16
        width_stride: 16
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 300
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
    second_stage_classification_loss {
      weighted_sigmoid_focal {
         gamma: 2.0
         alpha: 0.25
      }
    }
  }
}

train_config {
  batch_size: 1
  optimizer {
    adam_optimizer {
      learning_rate {
        constant_learning_rate {
          learning_rate: 0.0001
        }
      }
    }
    use_moving_average: true
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "/home/ubuntu/models-tf/placeholder_data/model.ckpt"
  from_detection_checkpoint: true
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    random_pixel_value_scale {
    }
  }
  data_augmentation_options {
    random_image_scale {
    }
  }
  data_augmentation_options {
    random_adjust_brightness {
    }
  }
  data_augmentation_options {
    random_adjust_contrast {
    }
  }
  data_augmentation_options {
    random_adjust_saturation {
    }
  }
  data_augmentation_options {
    random_jitter_boxes {
    }
  }
  data_augmentation_options {
    random_crop_image {
    }
  }
  data_augmentation_options {
    random_pad_image {
    }
  }
  data_augmentation_options {
    random_crop_pad_image {
    }
  }
  batch_queue_capacity: 150
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "/home/ubuntu/models-tf/placeholder_data/train.record"
  }
  label_map_path: "/home/ubuntu/models-tf/placeholder_data/labelmap_placeholder.pbtxt"
}

eval_config: {
  num_examples: 30
  eval_interval_secs: 60
  num_visualizations: 20
  max_num_boxes_to_visualize: 1000
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "/home/ubuntu/models-tf/placeholder_data/val.record"
  }
  label_map_path: "/home/ubuntu/models-tf/placeholder_data/labelmap_placeholder.pbtxt"
  shuffle: false
  num_readers: 1
}
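For reference, the resizer settings in this config should leave the images far larger than the 33-pixel minimum named in the error. The sketch below approximates the arithmetic of an aspect-preserving resize with min_dimension 600 and max_dimension 1024. It is my own simplification, not the object_detection implementation, and it assumes that "1900x1000" means a width of 1900 and a height of 1000.

```python
def keep_aspect_ratio_size(height, width, min_dimension=600, max_dimension=1024):
    """Approximate (height, width) after an aspect-preserving resize.

    Simplified sketch: scale the shorter side to min_dimension unless that
    would push the longer side past max_dimension, in which case scale the
    longer side to max_dimension instead.
    """
    scale = min_dimension / float(min(height, width))
    if max(height, width) * scale > max_dimension:
        scale = max_dimension / float(max(height, width))
    return int(round(height * scale)), int(round(width * scale))


# Assumed original size: height 1000, width 1900.
print(keep_aspect_ratio_size(1000, 1900))  # (539, 1024)
```

Under these assumptions the network input would be roughly 539x1024, so a genuinely undersized image does not look like the cause of the assertion failure.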

Top-level directory of the model used: N/A
Custom code: N/A
Bazel version: N/A
Command to reproduce: Train a network with this config file and use the Jupyter notebook to run an inference from the frozen graph.

Any help would be appreciated. Thank you.

tensorflowbutler commented 6 years ago

Thank you for your post. We noticed you have not filled out the following field in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.
What is the top-level directory of the model you are using
Have I written custom code
Bazel version
Exact command to reproduce

pierrejeambrun commented 6 years ago

I updated the issue with the required information.

a819721810 commented 6 years ago

I ran into this problem too. It works with tensorflow-1.6.0, but fails with tensorflow-1.7.0 and 1.8.0.

pierrejeambrun commented 6 years ago

Indeed, downgrading to tensorflow-gpu==1.6.0 solves the issue.

I guess someone might want to take a deeper look into this to fix the problem for more recent tf versions.

For now I will stick to 1.6 when it comes to grayscale models.
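A small guard at the top of an inference script can make the version dependency explicit. This is only a sketch: the helper names are hypothetical, and the two sets contain nothing beyond the versions reported in this thread, so any other version is treated as unknown.

```python
def parse_version(version):
    """Turn a string such as '1.8.0' into (1, 8, 0).

    Non-numeric suffixes such as '-rc1' are ignored, and missing
    components are padded with zeros.
    """
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


# Versions reported in this thread only; nothing else has been tested here.
REPORTED_WORKING = {(1, 6, 0)}
REPORTED_FAILING = {(1, 7, 0), (1, 8, 0)}


def grayscale_inference_status(version):
    """Classify a TensorFlow version string according to this thread's reports."""
    parsed = parse_version(version)
    if parsed in REPORTED_WORKING:
        return "reported working"
    if parsed in REPORTED_FAILING:
        return "reported failing"
    return "unknown"


print(grayscale_inference_status("1.8.0"))
print(grayscale_inference_status("1.6.0"))
```

With TensorFlow installed, the check could be run as `grayscale_inference_status(tf.__version__)` before loading a grayscale model.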

Thank you.

wt-huang commented 6 years ago

Closing as this is resolved
