matterport / Mask_RCNN

Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

Why does the DetectionLayer `mrcnn_detections` output detections that have BG/0 as their `class_id`? #475

Open CMCDragonkai opened 6 years ago

CMCDragonkai commented 6 years ago

I was looking at the COCO sample notebook inspect_model.ipynb: https://github.com/matterport/Mask_RCNN/blob/master/samples/coco/inspect_model.ipynb

There, they run this subgraph:

mrcnn = model.run_graph([image], [
    ("proposals", model.keras_model.get_layer("ROI").output),
    ("probs", model.keras_model.get_layer("mrcnn_class").output),
    ("deltas", model.keras_model.get_layer("mrcnn_bbox").output),
    ("masks", model.keras_model.get_layer("mrcnn_mask").output),
    ("detections", model.keras_model.get_layer("mrcnn_detection").output),
])

The example later filters out detections that have a class_id of 0, which indicates background (BG).

det_class_ids = mrcnn['detections'][0, :, 4].astype(np.int32)
det_count = np.where(det_class_ids == 0)[0][0]
det_class_ids = det_class_ids[:det_count]
detections = mrcnn['detections'][0, :det_count]

The shapes of the above outputs indicate that we get 1000 region proposals from the RPN/FPN during inference. Later these are filtered and refined down to 100 detections. But the end result is that there are only 8 inferred instances.

So why does the detection layer still output regions that have BG as their strongest classification? Inside model.py, it appears that this output is connected to build_fpn_mask_graph, and I can't see where regions with a class_id of 0 are dealt with.

If the BG regions are being filtered out, where does this occur in the pipeline?

CMCDragonkai commented 6 years ago

I found where the BG detections are actually being filtered out. It happens in the unmold_detections function, which is called at the very end of model.detect.

https://github.com/matterport/Mask_RCNN/blob/master/mrcnn/model.py#L2390-L2396

This is a weird place to put this code. Note that [:, 4] selects the class_id of each detection box. The code computes zero_ix, an array of the indices of all detection boxes that have a class_id of 0. It turns out that the detections array only contains BG detections at the end of the array, so N becomes the index at which the BG detections begin. The boxes slice then truncates the detections array so that it contains only non-BG detections.

        # How many detections do we have?
        # Detections array is padded with zeros. Find the first class_id == 0.
        zero_ix = np.where(detections[:, 4] == 0)[0]
        N = zero_ix[0] if zero_ix.shape[0] > 0 else detections.shape[0]

        # Extract boxes, class_ids, scores, and class-specific masks
        boxes = detections[:N, :4]
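
For illustration, here is a small NumPy sketch of the same truncation logic run on a toy array. The helper name count_valid_detections and the toy values are hypothetical and not part of the repository; the row layout [y1, x1, y2, x2, class_id, score] is assumed from the column indexing used above.

```python
import numpy as np

def count_valid_detections(detections):
    """Return the number of rows before the first row whose class_id (column 4) is 0.

    Hypothetical helper mirroring the zero_ix / N logic quoted above.
    """
    zero_ix = np.where(detections[:, 4] == 0)[0]
    return int(zero_ix[0]) if zero_ix.shape[0] > 0 else detections.shape[0]

# Toy detections, assumed layout: [y1, x1, y2, x2, class_id, score]
toy = np.array([
    [0.1, 0.1, 0.5, 0.5, 3, 0.98],
    [0.2, 0.3, 0.6, 0.9, 1, 0.91],
    [0.0, 0.0, 0.0, 0.0, 0, 0.0],  # class_id 0 from here on
    [0.0, 0.0, 0.0, 0.0, 0, 0.0],
])

n = count_valid_detections(toy)
boxes = toy[:n, :4]
class_ids = toy[:n, 4].astype(np.int32)
```

The guard on zero_ix matters when every row holds a real detection: in that case there is no zero to find, and the count falls back to the full array length instead of raising an IndexError.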

I would have thought that refine_detections_graph would be the one that does it:

https://github.com/matterport/Mask_RCNN/blob/master/mrcnn/model.py#L711-L712

    # Filter out background boxes
    keep = tf.where(class_ids > 0)[:, 0]

Why is the final filtering of BG detections occurring in a utility function for unmolding the detections? BG detections are being passed to the mask layer too.

Perhaps it's some sort of limitation on keeping the shape consistent there, so that the BG detections can only really be removed once you leave the neural network.

hoangphucITJP commented 4 years ago

@CMCDragonkai It does remove the background detections in refine_detections_graph. The class_ids of 0 seen in unmold_detections are just padding.
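
A rough NumPy sketch of that filter-then-pad behaviour, for illustration only. It is a stand-in, not the TensorFlow code in model.py; the function name filter_and_pad, the constant MAX_INSTANCES, and the sample rows are all hypothetical.

```python
import numpy as np

MAX_INSTANCES = 5  # hypothetical fixed output size, chosen small for the example

def filter_and_pad(rows, max_instances=MAX_INSTANCES):
    """Drop rows with class_id 0, then zero-pad back to a fixed row count.

    Assumed row layout: [y1, x1, y2, x2, class_id, score].
    """
    keep = np.where(rows[:, 4] > 0)[0]       # indices of non-background rows
    kept = rows[keep][:max_instances]        # cap at the fixed size
    gap = max_instances - kept.shape[0]      # number of padding rows needed
    return np.pad(kept, [(0, gap), (0, 0)], mode="constant")

candidates = np.array([
    [0.1, 0.1, 0.5, 0.5, 2, 0.95],
    [0.3, 0.3, 0.4, 0.4, 0, 0.80],  # real background prediction, filtered out
    [0.2, 0.6, 0.7, 0.9, 7, 0.88],
])

padded = filter_and_pad(candidates)
padded_class_ids = padded[:, 4].astype(np.int32)
```

After this step the only rows with class_id 0 are the all-zero padding rows at the end, which is why a later truncation at the first zero class_id recovers exactly the real detections.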