iumyx2612 commented 3 years ago

Prerequisites

Please answer the following questions for yourself before submitting an issue.

[x] I am using the latest TensorFlow Model Garden release and TensorFlow 2.
[x] I am reporting the issue to the correct repository. (Model Garden official or research directory)
[x] I checked to make sure that this issue has not already been filed.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/blob/master/research/object_detection/utils/visualization_utils.py

2. Describe the bug

The visualization of proposal regions for Faster R-CNN ResNet-101 isn't displayed correctly. Left: output of the RPN. Right: output of the entire network with the Fast R-CNN head.

3. Steps to reproduce

I build the model with number_of_stages: 1 set in the config file:

    configs = config_util.get_configs_from_pipeline_file(path_to_config)
    model_config = configs['model']
    detection_model = model_builder.build(model_config=model_config, is_training=False)

    # Restore checkpoint
    ckpt = tf.compat.v2.train.Checkpoint(model=detection_model)
    ckpt.restore(os.path.join(path_to_ckpt, 'ckpt-0')).expect_partial()

Then run detection on an image

    # load the label
    category_index = load_label(label_path)
    # load image into numpy array
    image_np = np.array(Image.open(image_path))
    # input needs to be a tensor
    input_tensor = tf.convert_to_tensor(image_np, dtype=tf.float32)
    # input expected to be in batch -> add new dim to input
    input_tensor = input_tensor[tf.newaxis, ...]

    @tf.function
    def detect_fn(image, detection_model):
        """Detect objects in image."""

        image, shapes = detection_model.preprocess(image)
        prediction_dict = detection_model.predict(image, shapes)
        detections = detection_model.postprocess(prediction_dict, shapes)

        return detections

    detections = detect_fn(input_tensor, detection_model)

Then pass the detections to visualization_utils.visualize_boxes_and_labels_on_image_array

    # visualize prediction
    # draw on a copy so the original image array stays untouched
    image_np_for_detections = image_np.copy()
    viz_utils.visualize_boxes_and_labels_on_image_array(
        image=image_np_for_detections,
        # drop the batch dimension and convert to NumPy ([N, 4] boxes, [N] scores)
        boxes=detections['detection_boxes'][0].numpy(),
        classes=None,
        scores=detections['detection_scores'][0].numpy(),
        category_index=category_index,
        use_normalized_coordinates=True,
        max_boxes_to_draw=box_to_visualize,
        min_score_thresh=min_score,
        line_thickness=2,
        skip_labels=True,
        agnostic_mode=True,
        skip_scores=skip_score
    )

Finally show the image with PIL

    img = Image.fromarray(image_np_for_detections, 'RGB')
    img.show()

4. Expected behavior

A correct visualization, like this one from Faster R-CNN ResNet-50.

5. Additional context

Include any logs that would be helpful to diagnose the problem.

6. System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
TensorFlow installed from (source or binary): source
TensorFlow version (use command below): 2.4
Python version: 3.8
Bazel version (if compiling from source):
GCC/Compiler version (if compiling from source):
CUDA/cuDNN version: 11.0/
GPU model and memory: GTX 1050 4GB

sachinprasadhs commented 3 years ago

Faster R-CNN works in multiple stages.

It takes the feature maps from the CNN backbone and passes them to the Region Proposal Network (RPN). The RPN starts from n anchor boxes of different sizes and, for each anchor, predicts the probability that it contains an object (without a class label) and bounding-box regression offsets that adjust the anchor to fit the object better. The result is the set of class-agnostic boxes that you see on the left side of the image above. That is why the RPN output won't look close to the final output: it has not passed through the second stage yet.
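
To make the "adjusting the anchors" step concrete, here is a minimal NumPy sketch of the standard Faster R-CNN box parameterization. It is only an illustration under stated assumptions: the function name decode_boxes is hypothetical, boxes are assumed to be [ymin, xmin, ymax, xmax], and the extra scale factors that the Object Detection API box coder can apply are left out.

```python
import numpy as np

def decode_boxes(anchors, deltas):
    """Apply predicted offsets to anchors (hypothetical helper, not the API's box coder).

    anchors: [N, 4] boxes as [ymin, xmin, ymax, xmax].
    deltas:  [N, 4] offsets as [ty, tx, th, tw].
    """
    anchors = np.asarray(anchors, dtype=np.float64)
    deltas = np.asarray(deltas, dtype=np.float64)

    # Anchor height, width and center.
    ha = anchors[:, 2] - anchors[:, 0]
    wa = anchors[:, 3] - anchors[:, 1]
    ycenter_a = anchors[:, 0] + 0.5 * ha
    xcenter_a = anchors[:, 1] + 0.5 * wa

    ty, tx, th, tw = deltas.T

    # Center offsets are relative to the anchor size; size offsets are in log space.
    ycenter = ty * ha + ycenter_a
    xcenter = tx * wa + xcenter_a
    h = np.exp(th) * ha
    w = np.exp(tw) * wa

    return np.stack(
        [ycenter - 0.5 * h, xcenter - 0.5 * w, ycenter + 0.5 * h, xcenter + 0.5 * w],
        axis=1,
    )
```

With all offsets at zero the anchors come back unchanged, so a poorly trained or unrefined RPN output looks very much like the raw anchor grid.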
After the RPN, the proposals are passed to a pooling layer, which extracts a fixed-size feature map for each proposal; at this point no class has been assigned yet. Finally, these feature maps are passed to fully connected layers that end in a softmax classifier and a linear box regressor, which classify each object and predict the final bounding boxes. That is where the accurate bounding boxes in the right image come from.
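
The pooling step can be sketched as follows. This is a simplified RoI max pooling in NumPy, assuming integer box coordinates on the feature map and a hypothetical helper name roi_max_pool; as far as I know the Object Detection API itself crops and resizes with bilinear interpolation instead, so treat this as an illustration of the idea, not of the actual implementation.

```python
import numpy as np

def roi_max_pool(feature_map, box, output_size=2):
    """Pool one proposal to a fixed [output_size, output_size, C] map (illustrative only).

    feature_map: [H, W, C] array.
    box: (ymin, xmin, ymax, xmax) as integer feature-map indices, max exclusive,
         covering at least one cell in each direction.
    """
    ymin, xmin, ymax, xmax = box
    region = feature_map[ymin:ymax, xmin:xmax, :]
    h, w, c = region.shape

    # Split the region into an output_size x output_size grid of bins.
    y_edges = np.linspace(0, h, output_size + 1).astype(int)
    x_edges = np.linspace(0, w, output_size + 1).astype(int)

    pooled = np.zeros((output_size, output_size, c), dtype=feature_map.dtype)
    for i in range(output_size):
        for j in range(output_size):
            # Make sure each bin covers at least one cell.
            y0, y1 = y_edges[i], max(y_edges[i + 1], y_edges[i] + 1)
            x0, x1 = x_edges[j], max(x_edges[j + 1], x_edges[j] + 1)
            pooled[i, j] = region[y0:y1, x0:x1].max(axis=(0, 1))
    return pooled
```

Whatever the size of the proposal, the output shape is the same, which is what lets the second stage feed every proposal through the same fully connected layers.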

google-ml-butler[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.