matterport / Mask_RCNN

Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

Suffer from overfitting #281

Open keven4ever opened 6 years ago

keven4ever commented 6 years ago

Hello,

I only have a small training set of about 670 labelled images and would like to further improve accuracy by training the entire backbone network instead of only the heads. However, after about 30–40 epochs the network already suffers from overfitting. ResNet already uses batch norm, so I wonder if there is something else I can do to improve the situation? How about dropout? If I apply dropout, can I still load the pre-trained ResNet weights from COCO or ImageNet? Or some other technique? Thank you!

paulcx commented 6 years ago

I'm thinking that one of the major differences is the weighted focal loss used by the PyTorch version of Mask R-CNN.

John1231983 commented 6 years ago

@paulcx : Thanks for the information. Could you tell me which losses have been replaced by the weighted focal loss? I want to modify the repo to check how effective it is.

# Losses
rpn_class_loss = KL.Lambda(lambda x: rpn_class_loss_graph(*x), name="rpn_class_loss")(
    [input_rpn_match, rpn_class_logits])
rpn_bbox_loss = KL.Lambda(lambda x: rpn_bbox_loss_graph(config, *x), name="rpn_bbox_loss")(
    [input_rpn_bbox, input_rpn_match, rpn_bbox])
class_loss = KL.Lambda(lambda x: mrcnn_class_loss_graph(*x), name="mrcnn_class_loss")(
    [target_class_ids, mrcnn_class_logits, active_class_ids])
bbox_loss = KL.Lambda(lambda x: mrcnn_bbox_loss_graph(*x), name="mrcnn_bbox_loss")(
    [target_bbox, target_class_ids, mrcnn_bbox])
mask_loss = KL.Lambda(lambda x: mrcnn_mask_loss_graph(*x), name="mrcnn_mask_loss")(
    [target_mask, target_class_ids, mrcnn_mask])
paulcx commented 6 years ago

It's the RPN class loss, and the PyTorch version replaces smooth_l1 with weighted_smooth_l1 as well.
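
For reference, here is a minimal binary focal-loss sketch (my own, following Lin et al. 2017, not code from this repo or the PyTorch port). Note that the repo's rpn_class_loss_graph uses sparse categorical cross-entropy over matched anchors, so wiring focal loss in would still require adapting it to that anchor-match format:

import tensorflow as tf
import keras.backend as K

def binary_focal_loss(y_true, y_pred, gamma=2.0, alpha=0.25):
    # FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), with y_true in {0, 1}
    # and y_pred the predicted foreground probability.
    y_pred = K.clip(y_pred, K.epsilon(), 1.0 - K.epsilon())
    p_t = tf.where(K.equal(y_true, 1), y_pred, 1.0 - y_pred)
    alpha_t = tf.where(K.equal(y_true, 1),
                       alpha * K.ones_like(y_pred),
                       (1.0 - alpha) * K.ones_like(y_pred))
    return K.mean(-alpha_t * K.pow(1.0 - p_t, gamma) * K.log(p_t))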

keven4ever commented 6 years ago

@John1231983 very interesting, could you please elaborate a little more on how you did the dilation? Do you only apply binary dilation to each predicted mask instance? What kind of kernel did you use? As I understand it, binary dilation just enlarges the foreground area (in this case, the mask), right? Why does this improve performance? Also, how do you handle the case where two masks overlap, i.e. which one should be dilated? Thanks

John1231983 commented 6 years ago

This is my code. You can try it and let me know how much it improves your LB.

from skimage.morphology import binary_dilation, disk

def refineMasks(mask):
    # dilate the foreground by a disk-shaped structuring element of radius 1
    return binary_dilation(mask, disk(1))

# Run the refinement on each predicted mask instance
for i in range(predicts.shape[2] - 1):
    predicts[:, :, i] = refineMasks(predicts[:, :, i])
keven4ever commented 6 years ago

@John1231983 thanks for sharing the code! By the way, I can confirm there is something we are missing with data augmentation. In my config, the result using left/right flips and rotation is clearly better than using other augmentations. I checked the code again and still couldn't find out why (image shape in image_meta, etc.).

keven4ever commented 6 years ago

@John1231983 hmm, I applied the dilation on top of my best model (LB score 0.448) and the result dropped to 0.412. So it seems the same optimisation doesn't work for everyone, at least in this specific case. Thank you anyway!

John1231983 commented 6 years ago

I think so. It is very difficult to reproduce the result. For now, I think it is better to use focal loss as in the PyTorch version. The author of the PyTorch version reports a Mask R-CNN baseline of 0.5, which is far above our baseline.

keven4ever commented 6 years ago

Sure, keep us updated in case you get a boost. I will continue to figure out why the other data augmentations don't help.

John1231983 commented 6 years ago

One more thing: did you try other augmentations like flipud and rot90? You said that only fliplr gave the best performance.

keven4ever commented 6 years ago

@John1231983 yep, I followed @maksimovkonstantin's code and also introduced brightness augmentation:

    factor = 1.0 + abs(random.gauss(mu=0.0, sigma=brightness))
    if random.randint(0, 1):
        factor = 1.0 / factor
    table = np.array([((i / 255.0) ** factor) * 255 for i in np.arange(0, 256)]).astype(np.uint8)
    output_images[0] = cv2.LUT(output_images[0], table)
John1231983 commented 6 years ago

How do you use the function? I have tried, but it can't be called on the masks input. My masks input is W x H x num_masks.

keven4ever commented 6 years ago

ok, here is my code:

import random
import numpy as np
import cv2

def data_augmentation(input_image, masks,
                      h_flip=True,
                      v_flip=True,
                      rotation=360,
                      zoom=1.5,
                      brightness=0.5,
                      crop=False):
    # input_image is H x W x 3, masks is an H x W x num_masks array
    output_image = input_image.copy()
    output_masks = masks.copy()
    # random crop (disabled)
    # if crop and random.randint(0, 1):
    #     h, w, c = output_image.shape
    #     upper_h, new_h, upper_w, new_w = locs_for_random_crop(h, w)
    #     output_image = output_image[upper_h:upper_h + new_h, upper_w:upper_w + new_w, :]

    # random horizontal / vertical flips, applied to image and masks together
    if h_flip and random.randint(0, 1):
        output_image = np.fliplr(output_image)
        output_masks = np.fliplr(output_masks)

    if v_flip and random.randint(0, 1):
        output_image = np.flipud(output_image)
        output_masks = np.flipud(output_masks)

    # random brightness via a gamma-correction lookup table (image only)
    factor = 1.0 + abs(random.gauss(mu=0.0, sigma=brightness))
    if random.randint(0, 1):
        factor = 1.0 / factor
    table = np.array([((i / 255.0) ** factor) * 255 for i in np.arange(0, 256)]).astype(np.uint8)
    output_image = cv2.LUT(output_image, table)

    # random rotation in 90-degree steps, applied to image and masks together
    if rotation:
        rotate_times = random.randint(0, rotation // 90)
    else:
        rotate_times = 0
    for r in range(rotate_times):
        output_image = np.rot90(output_image)
        output_masks = np.rot90(output_masks)

    # arbitrary-angle rotation / zoom via cv2.warpAffine (disabled)
    #     if zoom:
    #         scale = random.randint(50, zoom * 100) / 100
    #     else:
    #         scale = 1.0
    #     # print(angle, scale)
    #     if rotation or zoom:
    #         M = cv2.getRotationMatrix2D((output_image.shape[1] // 2, output_image.shape[0] // 2), angle, scale)
    #         output_image = cv2.warpAffine(output_image, M, (output_image.shape[1], output_image.shape[0]))
    return output_image, output_masks

You just call it with data_augmentation(original_image, original_masks).

John1231983 commented 6 years ago

Thanks. I will try it and let you know how it goes in my case. I see why your code works: you have commented out the scale case. I haven't had success with the scale case.

tonyzhao6 commented 6 years ago

@keven4ever, @John1231983 , @maksimovkonstantin

Just wanted to say that this thread has been very useful in terms of my own training. Lots of good things learned from reading what you guys have tried/done!

keven4ever commented 6 years ago

@FruVirus you are welcome. In case you are also in the DSB 2018 competition, could you please share what score you got?

tonyzhao6 commented 6 years ago

@keven4ever

I am not in the DSB competition and unfortunately I can't share many details about my current work =/

keven4ever commented 6 years ago

@FruVirus no problem! I also learnt a lot from your tips, and this is a great community. Good luck with your work!

John1231983 commented 6 years ago

@keven4ever: have you improved your LB? I have now moved to PyTorch, which trains faster and has more pretrained models. I will let you know if it improves the score. Right now I get 0.42 using Heng's PyTorch version.

Hatuw commented 6 years ago

Hello @John1231983, you can also try using Keras for data augmentation. Here are the docs: https://keras.io/preprocessing/image/

fastlater commented 6 years ago

x2 with @FruVirus. This thread has been very useful and worth reading. I just want to add a few things:

Did you check this repo: https://github.com/aleju/imgaug ? Maybe you can try more complex augmentations. However, remember to check that you can still see the target after processing. For example, smoothing with too high a value will make your target disappear, and the augmentation will not be helpful; it will work against you by contaminating the data.

I haven't seen much talk about image processing (not sure if I skipped those comments). You can see in https://www.kaggle.com/c/data-science-bowl-2018/discussion/48130#282959 that image processing also helps a lot in getting better results. Just as a suggestion, maybe you can try some image processing methods (some pre-processing before feeding the network, and some image quality enhancement techniques before inference). Searching for the best training parameters to create a more robust model is very important, but I believe an enhanced input image will lead to better inference results.

I am not in the DSB competition, but I thought I could share a few of my thoughts with you. Perhaps one of these points could be useful for your work and lead to a further conversation about how to improve results using Mask R-CNN in any kind of instance segmentation task.

keven4ever commented 6 years ago

@Hatuw I tried Keras's image generator; the challenge is that for masks I can't use a vectorized approach. Instead, I have to loop over each mask one by one to augment it, which makes training quite slow. Have you found a better way?

therahulkumar commented 6 years ago

Hi @keven4ever, you can use a vectorized implementation as in this kernel: https://www.kaggle.com/hexietufts/easy-to-use-keras-imagedatagenerator
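
A related sketch (my own, not taken from that kernel): with Keras >= 2.2 you can draw one set of random transform parameters and apply it to the image and to the whole H x W x num_masks stack in one call, avoiding the per-mask Python loop. The datagen settings here are only illustrative.

import numpy as np
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=90,
                             horizontal_flip=True,
                             vertical_flip=True)

def augment_image_and_masks(image, masks):
    # draw one random geometric transform and apply it to image and masks together
    params = datagen.get_random_transform(image.shape)
    image_aug = datagen.apply_transform(image.astype(np.float32), params)
    masks_aug = datagen.apply_transform(masks.astype(np.float32), params)
    # re-binarize the masks after interpolation
    return image_aug.astype(image.dtype), (masks_aug > 0.5).astype(masks.dtype)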

keven4ever commented 6 years ago

@John1231983 thank you for asking! I actually made some progress, now 0.46+. Some findings:

John1231983 commented 6 years ago

@keven4ever : Good job. That is close to my LB. I suggest you increase your LB by using an external dataset; some datasets cover a task similar to the challenge. I am using https://www.kaggle.com/voglinio/external-h-e-data-with-mask-annotations and it adds about 0.03 LB. Combined with mosaic images, I hope it can reach 0.48 LB as a baseline. Hope the tips help you. My score is now 0.473, using the PyTorch code because of its training speed.

Hatuw commented 6 years ago

@keven4ever Sorry, I have been very busy these days. I tried to use the image generator in the load_image_gt function, but it slows training down. I think generating some images before training is better. I haven't paid attention to this challenge for a few days. If you have a proposal, feel free to contact me and discuss it. Thanks!

waleedka commented 6 years ago

Good discussion about image augmentation here. I just pushed an update to support imgaug augmentations out of the box, by passing an augmentation object to the train() function.

http://imgaug.readthedocs.io/en/latest/source/augmenters.html

John1231983 commented 6 years ago

Thanks @waleedka for this PR. I think we have to add one more condition in load_image_gt to handle cropped images that have zero masks when the number of images per GPU is 1; otherwise it feeds zero masks to the network and the loss becomes NaN. To do this, I think we could add a while loop with the condition that the number of masks is greater than 0; if not, try cropping at another position (see the sketch below). What do you think?
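
Something like this minimal sketch is what I mean (random_crop is a hypothetical helper standing in for whatever cropping is used; this is not repo code):

def crop_with_instances(image, masks, size=512, max_tries=20):
    # keep re-cropping until the crop contains at least one non-empty mask,
    # so a 1-image batch never feeds all-zero masks to the network
    for _ in range(max_tries):
        crop_img, crop_masks = random_crop(image, masks, size, size)  # hypothetical helper
        if crop_masks.any():
            return crop_img, crop_masks
    # fall back to the original (resized elsewhere) image if no crop has a mask
    return image, masks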

John1231983 commented 6 years ago

@waleedka : In your train() function example, it only uses fliplr. How about adding more options like scale and rotation?

augmentation = imgaug.augmenters.Fliplr(0.5)

Would it be something like this?

import imgaug as ia
from imgaug import augmenters as iaa

sometimes = lambda aug: iaa.Sometimes(0.5, aug)

augmentation = iaa.Sequential([
    iaa.Fliplr(0.5),  # horizontally flip 50% of the images
    iaa.Flipud(0.5),  # vertically flip 50% of the images
    sometimes(iaa.CropAndPad(
        percent=(-0.05, 0.1),
        pad_mode=ia.ALL,
        pad_cval=(0, 255)
    )),
    sometimes(iaa.Affine(
        scale={"x": (0.8, 1.2), "y": (0.8, 1.2)},  # scale images to 80-120% of their size, per axis
        translate_percent={"x": (-0.2, 0.2), "y": (-0.2, 0.2)},  # translate by -20 to +20 percent (per axis)
        rotate=(-45, 45),  # rotate by -45 to +45 degrees
        shear=(-16, 16),  # shear by -16 to +16 degrees
        order=[0, 1],  # use nearest neighbour or bilinear interpolation (fast)
        cval=(0, 255),  # if mode is constant, use a cval between 0 and 255
        mode=ia.ALL  # use any of scikit-image's warping modes
    )),
])
waleedka commented 6 years ago

@John1231983 The train() function supports all the augmentations that imgaug offers, so yes, just pass that big augmentation sequence to train() and it should work.

The code applies the same augmentations to both images and masks, and it already knows that some augmentations apply to images only and not to masks (like changing color channels or adding Gaussian noise). That said, even augmentations that are safe for masks sometimes have options that make them unsafe, so always test your augmentations on both images and masks before training.

And, thanks for the tip about images with no masks. I'll look into it.
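
In practice, passing the sequence looks roughly like this (a usage sketch; the epoch count and layers value are placeholders):

import imgaug.augmenters as iaa

augmentation = iaa.Sequential([
    iaa.Fliplr(0.5),  # safe for masks
    iaa.Flipud(0.5),  # safe for masks
])
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=40,
            layers="all",
            augmentation=augmentation)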

zhengli97 commented 6 years ago

@John1231983 Hi, John. I have tested random_crop and my score drops from 0.440 to 0.424. Here is my code; is there something wrong?

height = 512
width = 512
if image.shape[0] >= height & image.shape[1] >= width:
    if random.randint(0, 1):
        image, mask = randomCrop(image, mask, width, height)

My learning schedule is 50 epochs on all layers (1e-4), then 25 epochs on all layers (1e-5). Can you help me?

John1231983 commented 6 years ago

@waleedka : Thanks for your reply. I used the new PR and got this error:

Epoch 1/60
 28/435 [>.............................] - ETA: 5:35 - loss: 4.6554 - rpn_class_loss: 0.2579 - rpn_bbox_loss: 1.9744 - mrcnn_class_loss: 0.0893 - mrcnn_bbox_loss: 1.7575 - mrcnn_mask_loss: 0.5764Traceback (most recent call last):
  File "train.py", line 72, in <module>
    augmentation=augmentation)
  File "/home/john/mask_rcnn/model.py", line 2300, in train
    use_multiprocessing=True,
  File "/home/john/anaconda3/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/home/john/anaconda3/lib/python3.6/site-packages/keras/engine/training.py", line 2192, in fit_generator
    generator_output = next(output_generator)
  File "/home/john/anaconda3/lib/python3.6/site-packages/keras/utils/data_utils.py", line 785, in get
    raise StopIteration()
StopIteration

I had no errors before using the new PR. How can I fix it? I suspect the bug is somewhere in the utils.py file that you updated.

@cccmdls : Your code is correct. But I would crop only if the image is bigger than 512, otherwise use the resize function. Let me know your LB with this one. I am using mosaics.

zhengli97 commented 6 years ago

@John1231983 there is a problem: if the image is bigger than 512 and random.randint(0,1) == 0, then the image is not cropped. What do you do in that case, resize or crop again? Currently I am training a model without the random.randint(0,1) check; I want to see what happens in that situation.

waleedka commented 6 years ago

@John1231983 I couldn't reproduce the error you mentioned. I tested on the train_shapes notebook with the big augmentation you listed above and it worked. You might want to track that issue down in your code. If you confirm that it's indeed a bug, please provide more details.

zhengli97 commented 6 years ago

@John1231983 So sad, I only got 0.410. LR = 1e-4, 50 epochs on all layers (LR) + 25 epochs on all layers (LR/10), using mosaics and the COCO pretrained model, tested on stage1_test. I don't know how to split the result based on the mosaic test set back into the stage1_test CSV file. Can you help me?

John1231983 commented 6 years ago

@waleedka: I think someone reported the same error to you in another thread; it may be related. @ccmdls: I did not test on the mosaic test set. I only train on the mosaic training set and test on the original images. First, I randomly crop 512x512 if the image is bigger than 512 (without a crop probability), otherwise I resize the image to 512x512. I trained 60 epochs on heads and 40 epochs on all layers with learning rate 0.0001 using Adam (roughly the two-stage schedule sketched below). I don't know why some people succeed in training with SGD (I used SGD but got 0.44). Using the above suggestions, you may get 0.47 LB (no post-processing) to ~0.49 LB (with post-processing).
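
Roughly, that schedule corresponds to two train() calls where the second continues from the weights left by the first; in this repo the epochs argument is a cumulative total, so the numbers below are totals, not additional epochs (values are illustrative):

model.train(dataset_train, dataset_val,
            learning_rate=0.0001,
            epochs=60,
            layers="heads")  # stage 1: train only the head layers
model.train(dataset_train, dataset_val,
            learning_rate=0.0001,
            epochs=100,      # 60 + 40 more epochs
            layers="all")    # stage 2: fine-tune the whole network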

zhengli97 commented 6 years ago

@John1231983 Hi, John. Thanks for your advice, but I only got 0.380, 0.370, 0.383, 0.377 without any post-processing. I can't reproduce your result, sorry. Here is my config. Can you give me some advice?

LEARNING_RATE = 1e-4
USE_MINI_MASK = True
MINI_MASK_SHAPE = (56, 56)
STEPS_PER_EPOCH = 392
VALIDATION_STEPS = 44
IMAGE_MIN_DIM = 512
IMAGE_MAX_DIM = 512
RPN_ANCHOR_SCALES = (8, 16, 32, 64, 128)  # anchor side in pixels, maybe add a 256?
BACKBONE_STRIDES = [4, 8, 16, 32, 64]
RPN_TRAIN_ANCHORS_PER_IMAGE = 320  # 320
POST_NMS_ROIS_TRAINING = 2000
POST_NMS_ROIS_INFERENCE = 2000

# Pooled ROIs
POOL_SIZE = 7
MASK_POOL_SIZE = 14
MASK_SHAPE = [28, 28]
TRAIN_ROIS_PER_IMAGE = 512
RPN_NMS_THRESHOLD = 0.7
MAX_GT_INSTANCES = 600  # 512
DETECTION_MAX_INSTANCES = 600  # 400
DETECTION_MIN_CONFIDENCE = 0.7  # maybe smaller?
DETECTION_NMS_THRESHOLD = 0.3  # 0.3
MEAN_PIXEL = np.array([31.92144429, 29.84380259, 34.66032842])  # mosaics
WEIGHT_DECAY = 0.0001

and here is my code about random crop.

height = 512
width = 512
if image.shape[0] > height & image.shape[1] > width:
    image, mask = randomCrop(image, mask, width, height)
else:
    image, window, scale, padding = utils.resize_image(
        image,
        min_dim=config.IMAGE_MIN_DIM,
        max_dim=config.IMAGE_MAX_DIM,
        padding=config.IMAGE_PADDING)
    mask = utils.resize_mask(mask, scale, padding)

def randomCrop(img, mask, width, height):
    x = random.randint(0, img.shape[1] - width)
    y = random.randint(0, img.shape[0] - height)
    img = img[y:y+height, x:x+width]
    mask = mask[y:y+height, x:x+width]
    return img, mask

fastlater commented 6 years ago

Now that the competition is over, does anyone know of a public GitHub repo with the best score using Mask R-CNN?

The best result reported on Kaggle (https://www.kaggle.com/c/data-science-bowl-2018/discussion/54089) for @waleedka was 0.476 (I know this was just a baseline). I wonder if someone got a higher score using matterport's Mask R-CNN.

I am reading the top-solution review (https://www.kaggle.com/c/data-science-bowl-2018/discussion/54741); they used U-Net. However, I am interested to know who got the best result using matterport's code. According to the write-up, the preprocessing of the masks, the correct augmentations, and the 2nd-level model played a crucial part in the accuracy of their solution.

It would be interesting to reproduce their key steps, swap U-Net for Mask R-CNN, and compare the results under similar pre/post-processing, since they mentioned that they didn't try Mask R-CNN in the competition.

@keven4ever @John1231983 Do you think that, using the same mask processing as the winning solution, Mask R-CNN could do as well as the winner?

Update: in this link here, ZhengLi was reported as having the highest score using Mask R-CNN, and his solution is here.

shikunyu8 commented 6 years ago

Why does nobody tune WEIGHT_DECAY? It's a hyperparameter that controls the L2 regularization strength.
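
For what it's worth, WEIGHT_DECAY is applied as an L2 regularizer on the trainable weights (excluding the batch-norm parameters) when the model is compiled, so tuning it is just a matter of overriding it in a Config subclass. A minimal sketch (class name and value are only illustrative; the import path depends on the repo version):

from mrcnn.config import Config  # older versions: from config import Config

class MyTunedConfig(Config):
    NAME = "my_tuned"
    WEIGHT_DECAY = 0.0005  # stronger L2 than the 0.0001 default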

YubinXie commented 5 years ago

> I think so. It is very difficult to reproduce the result. For now, I think it is better to use focal loss as in the PyTorch version. The author of the PyTorch version reports a Mask R-CNN baseline of 0.5, which is far above our baseline.

@paulcx Hi, can you point me to the PyTorch version? Thanks!

lunasdejavu commented 5 years ago

Can anyone tell me how to add code to print the training and validation accuracy every epoch? We want to check whether the model is overfitting or not.

PhanDuc commented 5 years ago

@lunasdejavu, you can check the val_loss to know whether your model is overfitting or not.
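
If you want the curves rather than just the console output, one option is a small Keras callback that records loss/val_loss per epoch and plots them afterwards. This is only a minimal sketch; how you pass the callback into training depends on your setup (e.g. a patched train() or a custom fit call):

import keras
import matplotlib.pyplot as plt

class LossHistory(keras.callbacks.Callback):
    def on_train_begin(self, logs=None):
        self.losses, self.val_losses = [], []

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        self.losses.append(logs.get("loss"))
        self.val_losses.append(logs.get("val_loss"))

history = LossHistory()
# ... train with callbacks=[history] ...
plt.plot(history.losses, label="train loss")
plt.plot(history.val_losses, label="val loss")
plt.xlabel("epoch")
plt.legend()
plt.show()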

Altimis commented 4 years ago

@keven4ever @John1231983 @maksimovkonstantin Hello guys. Thank you so much for this discussion; it's one of the best discussions I've ever read on GitHub. I'm new to this field and I'm working on a project using the Matterport implementation of Mask R-CNN. I understood almost every technique mentioned in this thread, but I'm confused about the training schedule used for this competition. For example, does training 20 epochs on heads and 80 epochs on all layers mean that we take the weights (model artifacts) produced by the first training stage (on heads) and use them to continue training all layers? Thank you in advance.

rupa1118 commented 4 years ago

> @keven4ever
>
> With such a small dataset, it is unlikely that BN or dropout will help. Also, BN with dropout is probably not a good idea (see the paper on BN) and I don't think you can apply dropout with the pre-trained ResNet weights, since that model didn't train using dropout in the first place.
>
> The model capacity of ResNet-101 might be too large for your dataset. While it's true that ResNet enables deeper networks to converge compared to their plain counterparts, there is still a limit on the number of layers that can be incorporated in a ResNet before convergence suffers. For example, Table 6 in the ResNet paper shows that the classification error on CIFAR-10 decreases with an increasing number of layers up until ResNet-1202; ResNet-1202 actually performs worse than ResNet-32.
>
> To prevent overfitting, you can try:
>
>   1. Getting a larger dataset (but this is probably not feasible, otherwise you would've done this already)
>   2. Stronger weight decay (i.e., L2 regularization)
>   3. Lower model capacity (e.g., ResNet-50 or even ResNet-32)
>   4. k-fold cross-validation

Hello @FruVirus, can you please explain how to apply k-fold cross-validation with the matterport Mask R-CNN repo? Thanks in advance.

rupa1118 commented 4 years ago

Hello everyone, can someone explain how to check whether the model is overfitting or not?

Altimis commented 4 years ago

> Hello everyone, can someone explain how to check whether the model is overfitting or not?

By looking at your val_loss: if it stops decreasing, the model is starting to overfit. You can apply early stopping to stop training when that happens (depending on how many epochs the val_loss has stopped decreasing for).

rupa1118 commented 4 years ago

> Hello everyone, can someone explain how to check whether the model is overfitting or not?

> By looking at your val_loss: if it stops decreasing, the model is starting to overfit. You can apply early stopping to stop training when that happens (depending on how many epochs the val_loss has stopped decreasing for).

Hi @Altimis, but what if the val_loss is fluctuating? Why does the val_loss fluctuate?

cairomo commented 3 years ago

@rupa1118 this repo has an example of how to use k-fold cross-validation; you can use the built-in sklearn KFold methods.
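
A hedged sketch of the idea with sklearn (the dataset object and its image_ids attribute follow the repo's utils.Dataset convention; the rest is illustrative):

import numpy as np
from sklearn.model_selection import KFold

image_ids = np.array(dataset_all.image_ids)
kf = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(kf.split(image_ids)):
    train_ids = image_ids[train_idx]
    val_ids = image_ids[val_idx]
    print("Fold", fold, "train:", len(train_ids), "val:", len(val_ids))
    # build two Dataset objects restricted to train_ids / val_ids,
    # then train one model per fold and average the validation metrics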

mahdisbr commented 3 years ago

Just a simple question! Every model I've seen for custom data uses a JSON file for labelling, but in your augmentation code you use only mask images. Do you convert the mask images to JSON files after augmentation? @maksimovkonstantin @keven4ever @John1231983