matterport / Mask_RCNN

Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

Training for Detection only with Rectangular bounding box and without polygonal mask #256

Open ghost opened 6 years ago

ghost commented 6 years ago

Hello Everyone,

Can you explain how I can train this network for the object detection task only? By object detection only, I mean I have a rectangular bounding box around the objects I need to detect. I don't have masks for the objects, and I don't want them in my inference either.

All I want is to train it like object detection frameworks such as YOLO and SSD, with just a rectangular bounding box around each object.

Any suggestions will be highly appreciated. Thank you, Regards, Dharma KC

timbrucks commented 6 years ago

I would also be interested in this capability

tonyzhao6 commented 6 years ago

@dharma-kc, @timbrucks

In this case, I would argue that there is no benefit in using Mask R-CNN in the first place. It would be more straightforward to start with a Faster R-CNN implementation. Then, after fine-tuning the Faster R-CNN network, incorporate ROI Align in place of ROI Pooling if you want slightly better performance (see Table 3 in the Mask R-CNN paper).

If you want to use Mask R-CNN, then you have to turn off layers and losses relating to the masks. For example, build_fpn_mask_graph() and mrcnn_mask_loss_graph(). You also have to change the way the image ground-truths are loaded, especially the part where the ground-truth bounding boxes are calculated from the ground-truth masks (as opposed to just using the provided ground-truth bounding boxes).
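For context, the repo derives the ground-truth boxes from the instance masks (utils.extract_bboxes). The sketch below is a simplified, hypothetical re-implementation of that step, not the repo's exact code; a detection-only pipeline would bypass it and load the annotated boxes directly:

```python
import numpy as np

def extract_bboxes_from_masks(masks):
    # Simplified sketch of how Mask R-CNN derives ground-truth boxes
    # from instance masks (masks: [H, W, num_instances], boolean).
    # A detection-only pipeline would skip this and feed the
    # annotated (y1, x1, y2, x2) boxes straight in.
    boxes = np.zeros((masks.shape[-1], 4), dtype=np.int32)
    for i in range(masks.shape[-1]):
        ys, xs = np.where(masks[:, :, i])
        if ys.size:
            # y2/x2 are exclusive, matching the repo's convention
            boxes[i] = [ys.min(), xs.min(), ys.max() + 1, xs.max() + 1]
    return boxes
```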

timbrucks commented 6 years ago

Thanks for the feedback @FruVirus. I will give that a shot. I will say that Mask R-CNN does give excellent results out of the box!

tonyzhao6 commented 6 years ago

@timbrucks , indeed it does! What I also like about Mask R-CNN is that the implementation is very modular and it takes you through some of the foundational steps in object detection and instance segmentation.

ghost commented 6 years ago

@timbrucks Thank you for your suggestion. I would like to add ROI Align to Faster R-CNN and remove the ROI Pooling layer, then. If you have experience with that, could you tell me where I should make changes for that capability? Otherwise I will try it on my own. Thank you.
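This is not a drop-in patch for any particular Faster R-CNN codebase, but the core difference is small: ROI Pooling quantizes fractional ROI coordinates to integers before reading the feature map, while ROI Align samples them with bilinear interpolation. A minimal numpy illustration of that single sampling step (function names are made up for the sketch):

```python
import numpy as np

def bilinear_sample(feat, y, x):
    # ROI Align style: sample the feature map at a fractional (y, x)
    # location via bilinear interpolation of the 4 surrounding cells.
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, feat.shape[0] - 1), min(x0 + 1, feat.shape[1] - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feat[y0, x0] +
            (1 - wy) * wx * feat[y0, x1] +
            wy * (1 - wx) * feat[y1, x0] +
            wy * wx * feat[y1, x1])

def quantized_sample(feat, y, x):
    # ROI Pooling style: snap the coordinate to the nearest lower
    # integer, discarding sub-pixel precision.
    return feat[int(np.floor(y)), int(np.floor(x))]
```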

DSpringQ commented 6 years ago

@FruVirus I have turned off the layers related to segmentation, as well as changed parts of the input pipeline, including reading the ground-truth rectangles directly instead of calculating them from masks. Then I trained the model and found that rpn_bbox_loss becomes NaN after several epochs of iteration. How can I fix this? Thanks!
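One common cause of rpn_bbox_loss going NaN when switching to annotated boxes is degenerate ground truth: boxes with zero height or width blow up the log-ratio regression targets. A quick sanity filter worth running over the dataset (a sketch, assuming boxes are (y1, x1, y2, x2) arrays; not code from the repo):

```python
import numpy as np

def filter_degenerate_boxes(boxes):
    # Drop ground-truth boxes with non-positive height or width.
    # The RPN regression target uses log(gt_size / anchor_size),
    # which diverges for zero-size boxes and can turn the loss NaN.
    boxes = np.asarray(boxes)
    keep = (boxes[:, 2] > boxes[:, 0]) & (boxes[:, 3] > boxes[:, 1])
    return boxes[keep]
```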

daoud commented 6 years ago

I am working on Mask R-CNN and training on my images. I have 1 + 1 classes. In my JSON I have 2 shapes: 'rect' and 'polygon'. My code stops working when I add 'rect' annotations to the same dataset. How do I handle the different shapes? My program only works with polygon shapes.

amankhandelia commented 6 years ago

@daoud Well, it might be too late, but anyway, here is what you can do. Assuming you are loading masks as written in the repo and the dataset is annotated using VIA, you can add the following lines to the load_mask function after for i, p in enumerate(info["polygons"]):

            if p['name'] == 'rect':
                p['all_points_y'] = [p['y'], p['y'] + p['height'], p['y'], p['y'] + p['height']]
                p['all_points_x'] = [p['x'], p['x'] + p['width'], p['x'] + p['width'], p['x']]
cam4ani commented 5 years ago

@amankhandelia, you might have reversed two entries in all_points_y. It should be:

    all_points_x = [p['x'], p['x'] + p['width'], p['x'] + p['width'], p['x']]
    all_points_y = [p['y'], p['y'], p['y'] + p['height'], p['y'] + p['height']]

instead of your statement:

    all_points_x = [p['x'], p['x'] + p['width'], p['x'] + p['width'], p['x']]
    all_points_y = [p['y'], p['y'] + p['height'], p['y'], p['y'] + p['height']]
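The ordering matters because skimage.draw.polygon fills the polygon traced by consecutive vertices: the corrected order walks the rectangle's corners in sequence, while the original order traces a self-intersecting "bowtie". A quick check with the shoelace formula (plain Python, just for illustration):

```python
def shoelace_area(xs, ys):
    # Absolute polygon area from vertex coordinates (shoelace formula).
    n = len(xs)
    s = sum(xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i]
            for i in range(n))
    return abs(s) / 2.0

x, y, w, h = 0, 0, 2, 3

# Corrected ordering: corners visited consecutively -> full rectangle.
good_x = [x, x + w, x + w, x]
good_y = [y, y, y + h, y + h]

# Reversed ordering: edges cross -> degenerate bowtie with zero area.
bad_x = [x, x + w, x + w, x]
bad_y = [y, y + h, y, y + h]
```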

undayo commented 4 years ago

@FruVirus @timbrucks Hello. Can you explain how I can train Mask R-CNN for the object detection and mask task? Let me explain: I need a rectangular bounding box around the objects I have to detect, plus their masks. I don't need labels and confidence scores.

Any suggestion would be highly appreciated. thank you,

jasstionzyf commented 4 years ago

I have a thought that may be easy to apply: for box-only detection, I would fake a mask from the box area, then train the network keeping the code unchanged, but set mrcnn_mask_loss in LOSS_WEIGHTS in the config to 0.
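A sketch of that idea (the helper name and the (y1, x1, y2, x2) box convention are assumptions for illustration, not code from the repo); the mask loss is then zeroed out in the config so the fake masks never influence training:

```python
import numpy as np

def mask_from_box(box, height, width):
    # Fake an instance mask by filling the box area, so the existing
    # mask-based data pipeline runs unchanged (hypothetical helper).
    y1, x1, y2, x2 = box
    m = np.zeros((height, width), dtype=bool)
    m[y1:y2, x1:x2] = True
    return m

# In the Config subclass, zero out the mask loss; the other weights
# are shown at their defaults:
LOSS_WEIGHTS = {
    "rpn_class_loss": 1.,
    "rpn_bbox_loss": 1.,
    "mrcnn_class_loss": 1.,
    "mrcnn_bbox_loss": 1.,
    "mrcnn_mask_loss": 0.,
}
```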

sohinimallick commented 3 years ago

@FruVirus Is it possible to do it the other way around, i.e. use ground-truth bounding boxes to do the instance segmentation, i.e. create the masks?

jj411086 commented 1 year ago
    for i, p in enumerate(info["polygons"]):
        # Get indexes of pixels inside the shape and set them to 1
        if p['name'] == 'polygon':
            rr, cc = skimage.draw.polygon(p['all_points_y'], p['all_points_x'])
        elif p['name'] == 'rect':
            p['all_points_y'] = [p['y'], p['y'], p['y'] + p['height'], p['y'] + p['height']]
            p['all_points_x'] = [p['x'], p['x'] + p['width'], p['x'] + p['width'], p['x']]
            rr, cc = skimage.draw.polygon(p['all_points_y'], p['all_points_x'])
        elif p['name'] == 'circle':
            # skimage.draw.circle takes (row, col, radius), i.e. (cy, cx, r).
            # It was removed in scikit-image >= 0.19; there, use
            # skimage.draw.disk((p['cy'], p['cx']), p['r']) instead.
            rr, cc = skimage.draw.circle(p['cy'], p['cx'], p['r'])

        # Clip indices that fall outside the image bounds
        rr[rr > mask.shape[0] - 1] = mask.shape[0] - 1
        cc[cc > mask.shape[1] - 1] = mask.shape[1] - 1

        mask[rr, cc, i] = 1

    return mask, np.ones([mask.shape[-1]], dtype=np.int32)