matterport / Mask_RCNN

Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

What Mask RCNN actually learns? #720

Open Arkkienkeli opened 6 years ago

Arkkienkeli commented 6 years ago

Hi. I have a question: what does Mask RCNN actually learn? Does it learn shapes? Does it learn transitions from background to object? Or both? Or something else? Thank you.

Xiaoyang-Rebecca commented 6 years ago

You can look at how a CNN is defined: it tries to capture "hyper-features" of images, but we cannot say exactly what those features are. E.g., the first layer of a CNN usually captures edges.
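To make the "first layer captures edges" point concrete, here is a minimal sketch in plain NumPy. It is only an illustration: the kernel is a hand-written Sobel-style filter, not weights taken from a trained Mask RCNN, and `conv2d_valid` is a hypothetical helper written for this example.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Plain 2D cross-correlation with no padding (what CNN layers call 'convolution')."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 6x6 image: dark left half (0), bright right half (1).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-written Sobel-style kernel that responds to vertical edges.
# Assumption for illustration: a trained first layer often contains filters
# that look similar, but these weights are fixed by hand, not learned.
vertical_edge_kernel = np.array([[-1.0, 0.0, 1.0],
                                 [-2.0, 0.0, 2.0],
                                 [-1.0, 0.0, 1.0]])

response = conv2d_valid(image, vertical_edge_kernel)
print(response)
```

The response is zero over the flat regions and non-zero only where the window straddles the dark-to-bright boundary, which is the sense in which such a filter "detects" an edge.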

As for the background question, the mask branch is designed to learn a binary mask for each ROI, so it really depends on how positive pixels are labeled during training.
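A rough sketch of why the labeling matters: the mask branch is trained with a per-pixel binary cross-entropy between its sigmoid output and the ground-truth mask of the ROI, so whatever is marked positive is what it is pushed to reproduce. The function name `mask_bce_loss` and the 4x4 toy mask are assumptions made for this example, not code from this repository.

```python
import numpy as np

def mask_bce_loss(logits, target_mask, eps=1e-7):
    """Mean per-pixel binary cross-entropy between sigmoid(logits) and a 0/1 mask."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    probs = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    per_pixel = -(target_mask * np.log(probs)
                  + (1.0 - target_mask) * np.log(1.0 - probs))
    return per_pixel.mean()

# Toy 4x4 ground-truth mask for one ROI: a 2x2 object in the middle.
target = np.zeros((4, 4))
target[1:3, 1:3] = 1.0

# Hypothetical predictions: one that agrees with the labels, one that is inverted,
# and one that is completely undecided (probability 0.5 everywhere).
good_logits = np.where(target == 1.0, 5.0, -5.0)
bad_logits = -good_logits
undecided_logits = np.zeros((4, 4))

print("agrees with labels:", mask_bce_loss(good_logits, target))
print("inverted:          ", mask_bce_loss(bad_logits, target))
print("undecided:         ", mask_bce_loss(undecided_logits, target))
```

If the annotation marked different pixels as positive (for example, including a border of background), the same loss would push the branch toward that different mask.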

bahmed11 commented 6 years ago

Mask RCNN consists of several stages:

Stage 1: The ResNet backbone, a stack of convolutional layers that learns basic and advanced features from the training images (COCO, for the pretrained weights).

Stage 2: The Region Proposal Network (RPN), which creates region proposals. Unlike the older "Selective Search" method, the RPN is itself a small learned network: it slides over the backbone features and scores a fixed set of anchor boxes as object or background. On every training iteration the backbone weights are updated, so the features change and new region proposals are produced from the updated features.

Stage 3: The mask head, a small Fully Convolutional Network (FCN) that predicts a mask for each region proposal, alongside the heads that predict its class and box. The FCN is applied to the features cropped for each proposal, as if each were a separate small image. It can be thought of as refining the features produced by the ResNet.
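As a rough illustration of the proposal stage: an RPN does not search the image for regions, it scores a fixed grid of anchor boxes, and during training each anchor is labeled by its overlap (IoU) with the ground-truth boxes. The sketch below uses only NumPy. The helper names, the tiny 4x4 feature map, and the single scale are assumptions chosen for readability, and the 0.7 positive threshold is the commonly cited one rather than a setting read from this repository.

```python
import numpy as np

def make_anchors(feature_size, stride, scales, ratios):
    """Return (N, 4) anchors as [y1, x1, y2, x2], centered on each feature-map cell."""
    anchors = []
    for fy in range(feature_size):
        for fx in range(feature_size):
            cy, cx = (fy + 0.5) * stride, (fx + 0.5) * stride
            for scale in scales:
                for ratio in ratios:  # ratio = width / height
                    h = scale / np.sqrt(ratio)
                    w = scale * np.sqrt(ratio)
                    anchors.append([cy - h / 2, cx - w / 2, cy + h / 2, cx + w / 2])
    return np.array(anchors)

def iou(box, boxes):
    """IoU between one box and an (N, 4) array of boxes, all as [y1, x1, y2, x2]."""
    y1 = np.maximum(box[0], boxes[:, 0])
    x1 = np.maximum(box[1], boxes[:, 1])
    y2 = np.minimum(box[2], boxes[:, 2])
    x2 = np.minimum(box[3], boxes[:, 3])
    inter = np.maximum(y2 - y1, 0.0) * np.maximum(x2 - x1, 0.0)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_box + areas - inter)

# Hypothetical setup: a 4x4 feature map with stride 16 covers a 64x64 image.
anchors = make_anchors(feature_size=4, stride=16, scales=[32], ratios=[0.5, 1.0, 2.0])

# One ground-truth object box in image coordinates.
gt_box = np.array([8.0, 8.0, 40.0, 40.0])

overlaps = iou(gt_box, anchors)
positive = overlaps >= 0.7  # anchors the RPN would be trained to score as "object"

print("anchors:", anchors.shape[0])
print("positive anchors:", int(positive.sum()))
print("best anchor:", anchors[overlaps.argmax()])
```

The network then learns, from the backbone features at each location, to predict which anchors are positive and how to shift them toward the object, which is why the proposals change as the features are updated.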

Generally speaking, Mask RCNN learns features ranging from edges and shapes up to advanced features like cars, people, etc.