tensorflow / models

Models and examples built with TensorFlow

Visualization of RPN looks wrong #10231

Open iumyx2612 opened 3 years ago

iumyx2612 commented 3 years ago

Prerequisites

Please answer the following questions for yourself before submitting an issue.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/blob/master/research/object_detection/utils/visualization_utils.py

2. Describe the bug

The proposal regions for Faster R-CNN ResNet-101 are not visualized correctly. In the image below, the left side is the output of the RPN and the right side is the output of the entire network with the Fast R-CNN head.

image

3. Steps to reproduce

I build the model with number_of_stages: 1 set in the config file:

    configs = config_util.get_configs_from_pipeline_file(path_to_config)
    model_config = configs['model']
    detection_model = model_builder.build(model_config=model_config, is_training=False)

    # Restore checkpoint
    ckpt = tf.compat.v2.train.Checkpoint(model=detection_model)
    ckpt.restore(os.path.join(path_to_ckpt, 'ckpt-0')).expect_partial()

Then run detection on an image

    # load the label
    category_index = load_label(label_path)
    # load image into numpy array
    image_np = np.array(Image.open(image_path))
    # input needs to be a tensor
    input_tensor = tf.convert_to_tensor(image_np, dtype=tf.float32)
    # input expected to be in batch -> add new dim to input
    input_tensor = input_tensor[tf.newaxis, ...]

    @tf.function
    def detect_fn(image, detection_model):
        """Detect objects in image."""

        image, shapes = detection_model.preprocess(image)
        prediction_dict = detection_model.predict(image, shapes)
        detections = detection_model.postprocess(prediction_dict, shapes)

        return detections

    detections = detect_fn(input_tensor, detection_model)

Then pass the detections to visualization_utils.visualize_boxes_and_labels_on_image_array

    # visualize prediction
    viz_utils.visualize_boxes_and_labels_on_image_array(
        image=image_np_for_detections,
        boxes=detections['detection_boxes'],
        classes=None,
        scores=detections['detection_scores'],
        category_index=category_index,
        use_normalized_coordinates=True,
        max_boxes_to_draw=box_to_visualize,
        min_score_thresh=min_score,
        line_thickness=2,
        skip_labels=True,
        agnostic_mode=True,
        skip_scores=skip_score
    )

Finally show the image with PIL

    img = Image.fromarray(image_np_for_detections, 'RGB')
    img.show()
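The snippets above do not show how the batched output of detect_fn is turned into the per-image arrays that the visualization call works with. A minimal sketch of that step follows, assuming every entry in the detections dict carries a leading batch dimension of size 1. The helper name unbatch_detections is hypothetical and is not part of the Object Detection API.

```python
def unbatch_detections(detections):
    """Drop the leading batch dimension from every entry of a detections dict.

    Hypothetical helper: assumes each value is indexable and that index 0
    selects the single image in the batch.
    """
    unbatched = {}
    for key, value in detections.items():
        first = value[0]
        # TensorFlow eager tensors expose .numpy(); other array-likes
        # (lists, NumPy arrays) are kept as they are.
        unbatched[key] = first.numpy() if hasattr(first, "numpy") else first
    return unbatched
```

With this, boxes=unbatch_detections(detections)['detection_boxes'] would give one row of [ymin, xmin, ymax, xmax] per box instead of a batched tensor.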

4. Expected behavior

A correct visualization, like this one from Faster R-CNN ResNet-50:

image

5. Additional context

Include any logs that would be helpful to diagnose the problem.

6. System information

sachinprasadhs commented 3 years ago

Faster RCNN works in multiple stages.

google-ml-butler[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.

iumyx2612 commented 3 years ago

Faster RCNN works in multiple stages.

  • It takes the feature maps from the CNN backbone and passes them to the Region Proposal Network (RPN). The result is the n proposal boxes you see on the left side of the image above. The RPN starts from anchor boxes of different sizes and, for each anchor, predicts the probability that it contains an object (without a class label) along with bounding-box regression offsets that adjust the anchor to fit the object better. That is why the RPN output won't be close to the final output: it has not yet passed through the second stage.
  • After the RPN, the proposals go to the pooling layer, which crops a fixed-size feature map for each proposal (no class is assigned yet). These feature maps are then passed to fully connected layers with a softmax classifier and a linear bounding-box regressor, which classify each object and predict its final bounding box. That is where the accurate bounding boxes in the right image come from.
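The anchor adjustment described in the first bullet can be sketched as follows. This is a minimal illustration of the standard Faster R-CNN box parameterization, assuming boxes in [ymin, xmin, ymax, xmax] order and deltas in (ty, tx, th, tw) order. The function name decode_box is hypothetical, and the sketch ignores any scale factors a particular implementation may apply to the deltas.

```python
import math


def decode_box(anchor, deltas):
    """Apply RPN regression deltas to one anchor box.

    anchor: (ymin, xmin, ymax, xmax)
    deltas: (ty, tx, th, tw), center shifts relative to the anchor size
            and log-scale changes of height and width.
    """
    ya_min, xa_min, ya_max, xa_max = anchor
    ty, tx, th, tw = deltas

    # Anchor size and center.
    ha = ya_max - ya_min
    wa = xa_max - xa_min
    yca = ya_min + 0.5 * ha
    xca = xa_min + 0.5 * wa

    # Shift the center and rescale the size.
    yc = yca + ty * ha
    xc = xca + tx * wa
    h = ha * math.exp(th)
    w = wa * math.exp(tw)

    return (yc - 0.5 * h, xc - 0.5 * w, yc + 0.5 * h, xc + 0.5 * w)
```

Zero deltas return the anchor unchanged, so a proposal differs from its anchor only as far as the regressor moves it.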

I understand the concept, but with the ResNet-50 backbone below, the RPN did a pretty good job on the proposal regions. I'm worried that with the ResNet-101 backbone, the regions passed to the pooling layer aren't good (which is weird, because ResNet-101 is expected to do better than ResNet-50, right?), which may make the prediction head perform poorly.
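One way to check this worry numerically, rather than by eye, is to measure how many ground-truth boxes are covered by at least one proposal. The sketch below assumes ground-truth boxes are available for the test image and that all boxes use [ymin, xmin, ymax, xmax] order. The names iou and proposal_recall are hypothetical helpers, not Object Detection API functions.

```python
def iou(box_a, box_b):
    """Intersection over union of two (ymin, xmin, ymax, xmax) boxes."""
    ymin = max(box_a[0], box_b[0])
    xmin = max(box_a[1], box_b[1])
    ymax = min(box_a[2], box_b[2])
    xmax = min(box_a[3], box_b[3])
    inter = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def proposal_recall(proposals, gt_boxes, iou_thresh=0.5):
    """Fraction of ground-truth boxes matched by at least one proposal."""
    if not gt_boxes:
        return 0.0
    covered = sum(
        1
        for gt in gt_boxes
        if any(iou(p, gt) >= iou_thresh for p in proposals)
    )
    return covered / len(gt_boxes)
```

Running the same images through both backbones and comparing the recall values would show whether the ResNet-101 proposals are actually worse, or whether only the drawing differs.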