matterport / Mask_RCNN

Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

Get only one mask when expecting more. Something wrong inside the model? #136

Open ypflll opened 6 years ago

ypflll commented 6 years ago

Hi, I have trained a model on this dataset, http://vis-www.cs.umass.edu/lfw/, to segment face and hair. When testing, I sometimes get only one mask when there should be two (one for hair and one for face).

After debugging, I found that in model.py, line 2332, only the first entry of 'detections' is non-zero, and only the first 'mrcnn_mask' is non-zero. It seems that the network gives only one valid mask.

However, I plotted the first four masks, as shown below: image

The first mask has segmented hair and face successfully! So why does the model give only one mask?

waleedka commented 6 years ago

It's hard to answer without having more details about what you're doing. What I see from your image is that you're getting one instance, and that instance has two masks (face and hair). Maybe your image has only one person in it?

ypflll commented 6 years ago

Yes, only one person. But my goal is to segment hair and face in the picture. The ground truth (GT) looks like this (http://vis-www.cs.umass.edu/lfw/part_labels/): image

So two masks are expected for two classes (each class has one instance): one for hair and one for face. However, as described above, only one mask is returned in some cases. The picture above is a visualization of the 'mrcnn_mask' array returned by self.keras_model.predict (model.py, line 2332). Strangely, judging from that one output mask, hair and face have both been segmented successfully.

ypflll commented 6 years ago

I thought this was caused by a low non-maximum suppression (NMS) threshold for detection.

I tried setting a higher NMS threshold for detection (DETECTION_NMS_THRESHOLD = 0.99), which gives about 60 masks. However, they all have the same class ID (hair). The face ID is missing.
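To illustrate why raising the NMS threshold yields more boxes but cannot introduce a missing class, here is a minimal sketch of generic greedy NMS in NumPy. It is not the repository's implementation; the function names, the boxes and the scores are hypothetical, and boxes are assumed to be in (y1, x1, y2, x2) order.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as (y1, x1, y2, x2)."""
    y1 = np.maximum(box[0], boxes[:, 0])
    x1 = np.maximum(box[1], boxes[:, 1])
    y2 = np.minimum(box[2], boxes[:, 2])
    x2 = np.minimum(box[3], boxes[:, 3])
    inter = np.maximum(y2 - y1, 0) * np.maximum(x2 - x1, 0)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def greedy_nms(boxes, scores, threshold):
    """Keep the highest-scoring box, drop boxes overlapping it above threshold, repeat."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        overlaps = iou(boxes[i], boxes[order[1:]])
        order = order[1:][overlaps <= threshold]
    return keep

# Hypothetical boxes: the first two overlap heavily (IoU = 0.9), the third is separate.
boxes = np.array([[0, 0, 10, 10], [0, 0, 10, 9], [20, 20, 30, 30]], dtype=np.float64)
scores = np.array([0.9, 0.8, 0.7])

keep_low = greedy_nms(boxes, scores, threshold=0.3)    # near-duplicate is suppressed
keep_high = greedy_nms(boxes, scores, threshold=0.99)  # near-duplicate survives
```

NMS only decides which already-classified boxes survive. If no box was classified as face to begin with, no threshold value will make one appear.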

waleedka commented 6 years ago

Referring to the first image you posted, in the top-left box, there is a mask showing two colors: green for the face and purple for the hair. But the outputs from Mask RCNN are binary masks, so they can't have colors. The fact that yours has colors tells me that you're probably visualizing the data incorrectly, possibly displaying multiple masks as one image.

Try the inspect_model notebook. It has visualizations that show the output of the network at each stage. That should help track the problem.

ypflll commented 6 years ago

It's definitely not a visualization problem. I tested on the LFW dataset, and Mask R-CNN gives only one mask on 40 images, but half of them should have two masks. I thought it was a problem with the NMS threshold, so I tried a grid of NMS thresholds from 0.45 to 0.9, in both training and detection. But that doesn't improve the missed detections. In particular, when I set RPN_NMS_THRESHOLD to 0.6 (in both training and detection), every test picture gives only one mask. This really confuses me. Also, what's your strategy for choosing RPN_NMS_THRESHOLD and DETECTION_NMS_THRESHOLD?

To go further, I looked at the results before refine_detections, where there are 1000 class_ids. I found that all the class_ids are the same (ignoring BG). So no matter how I change parameters like DETECTION_MIN_CONFIDENCE, RPN_NMS_THRESHOLD or DETECTION_NMS_THRESHOLD, the missed detection problem remains. Do you have any idea about this? Any clue is welcome.
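The check described above can be sketched as follows, assuming the classifier output is available as an array of per-ROI class probabilities with shape [num_rois, num_classes]. The function name, the class order (0 = BG, 1 = hair, 2 = face) and the numbers are hypothetical and only illustrate the diagnostic.

```python
import numpy as np

def summarize_class_predictions(probs):
    """Count argmax class IDs per class and report the best score each class reaches.

    probs: [num_rois, num_classes] array of class probabilities (assumed layout).
    """
    class_ids = np.argmax(probs, axis=1)
    counts = np.bincount(class_ids, minlength=probs.shape[1])
    best_score = probs.max(axis=0)
    return counts, best_score

# Hypothetical probabilities for 4 ROIs and 3 classes (BG, hair, face).
probs = np.array([
    [0.90, 0.05, 0.05],
    [0.10, 0.70, 0.20],
    [0.20, 0.60, 0.20],
    [0.30, 0.40, 0.30],
])
counts, best_score = summarize_class_predictions(probs)
print("ROIs per class:", counts.tolist())
print("Best score per class:", best_score.tolist())
```

If the count for one class is zero and its best score never comes close to winning, the classification head is not predicting that class at all, which would point at training or labels rather than at the NMS or confidence thresholds.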

waleedka commented 6 years ago

Can you help me understand why the image you posted (top left box) has two colors (purple and green)? I'm thinking it's a binary mask, so it should have one color tone. What's the shape of the array that produced that image?

ypflll commented 6 years ago

In my case, there are 3 classes: hair, face and BG, so the masks output by keras_model.predict have a shape like 1x100x28x28x3, where 100 is the maximum number of instances (DETECTION_MAX_INSTANCES at inference time) and 3 is the number of classes.

I did make a mistake in the visualization: I plotted the first four 28x28x3 arrays in the picture above, confusing the class dimension with color depth. Actually, each mask has a shape of 28x28, and 3 is the number of classes, not the depth. The right way to visualize them is to plot the 3 masks separately, like this: image

However, Mask RCNN predicts K masks (K being the number of classes) for each instance, but keeps only the one selected by the classification result. From this point of view, it really does give one mask where I expect 2: my problem still exists.
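A minimal sketch of that separation, assuming one instance's output is a soft mask of shape 28x28x3 with the class index last. The function name, the 0.5 threshold and the synthetic data are hypothetical.

```python
import numpy as np

def split_class_masks(instance_mask, threshold=0.5):
    """Split an [H, W, num_classes] soft mask into one binary [H, W] mask per class."""
    num_classes = instance_mask.shape[-1]
    return [instance_mask[:, :, c] >= threshold for c in range(num_classes)]

# Synthetic instance: class 1 covers the top 10 rows, class 2 covers the rest.
instance_mask = np.zeros((28, 28, 3), dtype=np.float32)
instance_mask[:10, :, 1] = 0.9
instance_mask[10:, :, 2] = 0.8

per_class = split_class_masks(instance_mask)
for c, m in enumerate(per_class):
    print(f"class {c}: {int(m.sum())} foreground pixels")
```

Each element of per_class can then be shown as its own single-channel image, instead of passing the 28x28x3 array to an image viewer that would interpret the last axis as RGB.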
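The selection step can be sketched with NumPy indexing, assuming a mask array of shape [N, 28, 28, K] and one predicted class ID per instance. The names and the random data are hypothetical.

```python
import numpy as np

def select_masks_by_class(mrcnn_mask, class_ids):
    """For each instance, keep only the mask channel of its predicted class.

    mrcnn_mask: [N, H, W, K] per-class soft masks (assumed layout).
    class_ids:  [N] predicted class ID per instance.
    Returns an [N, H, W] array.
    """
    n = len(class_ids)
    return mrcnn_mask[np.arange(n), :, :, class_ids]

# Hypothetical output for 2 instances and K = 3 classes.
rng = np.random.default_rng(0)
mrcnn_mask = rng.random((2, 28, 28, 3))
class_ids = np.array([1, 2])

selected = select_masks_by_class(mrcnn_mask, class_ids)
print(selected.shape)
```

Because one channel is kept per detected instance, two output masks require two detections, one classified as each class. A single detection can yield only one mask, however good its other channels look.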

benjamin-taheri commented 6 years ago

I have the same problem. All the class_ids are the same, although I have three classes and can see that Mask RCNN returns all the masks.