matterport / Mask_RCNN

Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

Model is overfitting very early on large dataset #1313

Open amardeepjaiman opened 5 years ago

amardeepjaiman commented 5 years ago

Hi All,

I have around 39k training images of size 500x500, and the validation set is 10% of the entire dataset (~4800 images). The class I want to predict masks for is buildings (a ground feature) in aerial images. Each image has ~100 buildings of different sizes and shapes. The configuration I am using is below:

- BACKBONE: resnet101
- BACKBONE_STRIDES: [4, 8, 16, 32, 64]
- BATCH_SIZE: 4
- BBOX_STD_DEV: [0.1 0.1 0.2 0.2]
- DETECTION_MAX_INSTANCES: 100
- DETECTION_MIN_CONFIDENCE: 0.5
- DETECTION_NMS_THRESHOLD: 0.3
- GPU_COUNT: 1
- GRADIENT_CLIP_NORM: 5.0
- IMAGES_PER_GPU: 1
- IMAGE_MAX_DIM: 512
- IMAGE_META_SIZE: 14
- IMAGE_MIN_DIM: 512
- IMAGE_RESIZE_MODE: square
- IMAGE_SHAPE: [512 512 3]
- LEARNING_MOMENTUM: 0.9
- LEARNING_RATE: 0.001
- MASK_POOL_SIZE: 14
- MASK_SHAPE: [28, 28]
- MAX_GT_INSTANCES: 100
- MEAN_PIXEL: [123.7 116.8 103.9]
- MINI_MASK_SHAPE: (56, 56)
- NUM_CLASSES: 2
- POOL_SIZE: 7
- POST_NMS_ROIS_INFERENCE: 1000
- POST_NMS_ROIS_TRAINING: 2000
- ROI_POSITIVE_RATIO: 0.33
- RPN_ANCHOR_RATIOS: [0.5, 1, 2]
- RPN_ANCHOR_SCALES: (8, 16, 32, 64, 128)
- RPN_ANCHOR_STRIDE: 1
- RPN_BBOX_STD_DEV: [0.1 0.1 0.2 0.2]
- RPN_NMS_THRESHOLD: 0.9
- RPN_TRAIN_ANCHORS_PER_IMAGE: 256
- STEPS_PER_EPOCH: 9800
- TRAIN_BN: False
- TRAIN_ROIS_PER_IMAGE: 200
- USE_MINI_MASK: False
- USE_RPN_ROIS: True
- VALIDATION_STEPS: 1220
- WEIGHT_DECAY: 0.0001
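A quick sanity check on the numbers above, using hypothetical helper functions (not part of Mask_RCNN). In matterport's `mrcnn/config.py`, `Config.__init__` sets `BATCH_SIZE = IMAGES_PER_GPU * GPU_COUNT`, which would be 1 with these settings rather than the 4 displayed. Since 9800 × 4 = 39,200 and 1220 × 4 = 4,880 match the dataset sizes, the step counts appear to assume a batch of 4; if the effective batch is actually 1, one "epoch" covers only about a quarter of the training images.

```python
# Assumption: effective batch = IMAGES_PER_GPU * GPU_COUNT, as computed in mrcnn/config.py.


def images_per_epoch(steps_per_epoch: int, images_per_gpu: int, gpu_count: int) -> int:
    """Number of images fed to the model during one Keras epoch."""
    batch_size = images_per_gpu * gpu_count
    return steps_per_epoch * batch_size


def passes_over_dataset(steps_per_epoch: int, images_per_gpu: int,
                        gpu_count: int, dataset_size: int) -> float:
    """Fraction of the dataset seen in one epoch (1.0 means one full pass)."""
    return images_per_epoch(steps_per_epoch, images_per_gpu, gpu_count) / dataset_size


train_size = 39_000  # approximate training-set size from the post

print(images_per_epoch(9800, 1, 1))  # 9800  (IMAGES_PER_GPU=1, GPU_COUNT=1)
print(images_per_epoch(9800, 4, 1))  # 39200 (if the batch really were 4)
print(round(passes_over_dataset(9800, 1, 1, train_size), 3))  # 0.251
print(round(passes_over_dataset(9800, 4, 1, train_size), 3))  # 1.005
```

If the batch really is 1, "epoch 25" corresponds to roughly six passes over the data, not 25.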

I used the following training scheme:

  1. `heads`: epochs=20, augmentation=True, LR=0.001
  2. `4+`: epochs=40, augmentation=True, LR=0.001/10
  3. `all`: epochs=80, augmentation=True, LR=0.001/100
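In Mask_RCNN, `MaskRCNN.train()` takes `learning_rate`, `epochs`, `layers` and an optional `augmentation`, and `epochs` is the epoch to train *up to* (it is passed to Keras as the end epoch, starting from the model's current epoch), not a number of additional epochs. The sketch below assumes the 20/40/80 values above are those cumulative end epochs, so the `4+` stage covers epochs 21 to 40; even if they were read as additive (20, 60, 140), epoch 25 would still fall in the `4+` stage. `Stage`, `stage_for_epoch` and `run_schedule` are hypothetical helpers, and `train_fn` stands in for `model.train`.

```python
from typing import Callable, List, NamedTuple, Optional


class Stage(NamedTuple):
    layers: str           # value for layers= ("heads", "4+", "all", ...)
    end_epoch: int        # Mask_RCNN treats `epochs` as the epoch to train up to
    learning_rate: float


BASE_LR = 0.001

# Assumption: the post's 20/40/80 are cumulative end epochs.
SCHEDULE: List[Stage] = [
    Stage("heads", 20, BASE_LR),
    Stage("4+", 40, BASE_LR / 10),
    Stage("all", 80, BASE_LR / 100),
]


def stage_for_epoch(epoch: int, schedule: List[Stage]) -> Optional[str]:
    """Return the stage training during 1-based `epoch`, or None if past the end."""
    start = 0
    for stage in schedule:
        if start < epoch <= stage.end_epoch:
            return stage.layers
        start = stage.end_epoch
    return None


def run_schedule(train_fn: Callable, train_ds, val_ds,
                 schedule: List[Stage], augmentation=None) -> None:
    """Call `train_fn` (e.g. model.train) once per stage, in order."""
    for stage in schedule:
        train_fn(train_ds, val_ds,
                 learning_rate=stage.learning_rate,
                 epochs=stage.end_epoch,
                 layers=stage.layers,
                 augmentation=augmentation)


print(stage_for_epoch(25, SCHEDULE))  # 4+
# With a real model and datasets:
# run_schedule(model.train, dataset_train, dataset_val, SCHEDULE, augmentation=aug)
```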

But after 25 epochs, the validation loss started increasing while the training loss kept going down, which means the model started overfitting. The results on the test set are:

| Metric | IoU | Area | maxDets | Value |
|---|---|---|---|---|
| AP | 0.50:0.95 | all | 100 | 0.247 |
| AP | 0.50 | all | 100 | 0.546 |
| AP | 0.75 | all | 100 | 0.192 |
| AP | 0.50:0.95 | small | 100 | 0.149 |
| AP | 0.50:0.95 | medium | 100 | 0.355 |
| AP | 0.50:0.95 | large | 100 | 0.163 |
| AR | 0.50:0.95 | all | 1 | 0.019 |
| AR | 0.50:0.95 | all | 10 | 0.143 |
| AR | 0.50:0.95 | all | 100 | 0.331 |
| AR | 0.50:0.95 | small | 100 | 0.206 |
| AR | 0.50:0.95 | medium | 100 | 0.459 |
| AR | 0.50:0.95 | large | 100 | 0.377 |
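These figures match the format of the standard COCO summary printed by pycocotools' `COCOeval.summarize()`. AP@[0.50:0.95] is the mean over ten IoU thresholds (0.50 to 0.95 in steps of 0.05), so the drop from 0.546 at IoU=0.50 to 0.192 at IoU=0.75 suggests many buildings are detected but their masks are outlined loosely. A small NumPy sketch of mask IoU and of how many of those thresholds a single match clears; `mask_iou` and `thresholds_passed` are hypothetical helpers, not functions from Mask_RCNN or pycocotools.

```python
import numpy as np

# The ten IoU thresholds averaged by COCO's AP@[0.50:0.95].
COCO_IOU_THRESHOLDS = np.linspace(0.5, 0.95, 10)


def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union of two same-shaped binary masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 0.0  # both masks empty: define IoU as 0
    return float(np.logical_and(pred, gt).sum() / union)


def thresholds_passed(iou: float) -> int:
    """How many of the ten COCO thresholds this IoU meets or exceeds."""
    return int(np.sum(iou >= COCO_IOU_THRESHOLDS))


# A predicted 10x10 building mask shifted 3 px from the ground truth.
gt = np.zeros((20, 20), dtype=bool)
gt[0:10, 0:10] = True
pred = np.zeros((20, 20), dtype=bool)
pred[0:10, 3:13] = True

iou = mask_iou(pred, gt)
print(round(iou, 3))           # 0.538 (70 / 130)
print(thresholds_passed(iou))  # 1: a hit for AP@0.50, a miss for AP@0.75
```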

Can someone suggest what improvements could be made? The training set is not so small that it should be causing this. Should I skip training the heads and train all the layers instead? Or which hyperparameters should I tweak to avoid overfitting?
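For the "heads vs. all layers" question: `MaskRCNN.train()` maps its `layers` argument to a regular expression and marks as trainable only the Keras layers whose names fully match it (a raw regex can also be passed instead of a preset key). The patterns below are the presets as they appear in matterport's `mrcnn/model.py`; verify them against your copy. `trainable_for` and the sample layer names are illustrative only.

```python
import re

# Presets for MaskRCNN.train(layers=...), copied from mrcnn/model.py (verify locally).
LAYER_REGEX = {
    "heads": r"(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)",
    "3+": r"(res3.*)|(bn3.*)|(res4.*)|(bn4.*)|(res5.*)|(bn5.*)|(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)",
    "4+": r"(res4.*)|(bn4.*)|(res5.*)|(bn5.*)|(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)",
    "5+": r"(res5.*)|(bn5.*)|(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)",
    "all": ".*",
}


def trainable_for(layers: str, layer_names):
    """Names that would be trainable for a given `layers` value."""
    pattern = LAYER_REGEX.get(layers, layers)  # unknown keys are used as a raw regex
    return [name for name in layer_names if re.fullmatch(pattern, name)]


# Hypothetical sample of backbone, FPN, RPN and head layer names.
names = ["conv1", "res2a_branch2a", "res4b_branch2b", "bn5a_branch1",
         "fpn_p2", "rpn_model", "mrcnn_mask"]

print(trainable_for("heads", names))  # ['fpn_p2', 'rpn_model', 'mrcnn_mask']
print(trainable_for("4+", names))     # adds 'res4b_branch2b' and 'bn5a_branch1'
```

Training `all` from the start unfreezes the early ResNet stages as well, which gives the model more capacity to fit the training set.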

moganesyan commented 5 years ago

@amardeepjaiman Did you manage to solve this? I'm running into the same issue.

Osdel commented 4 years ago

I am facing a similar issue. I will try:

  1. Increasing the weight decay.
  2. Increasing dropout.
  3. Data augmentation. I have a lot of images, so I don't expect it to matter much, but I will try it as a last resort.

I hope to give you an answer as soon as I can. If you solve it, please reply.
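On the weight-decay idea: Mask_RCNN applies `WEIGHT_DECAY` in `MaskRCNN.compile()` as a Keras `l2(WEIGHT_DECAY)` loss on every trainable weight except BatchNorm `gamma`/`beta`, with each tensor's term divided by the tensor's size. Per tensor that is `WEIGHT_DECAY * mean(w**2)`, so going from 0.0001 to 0.01 scales the regularization term by 100. A NumPy sketch of that formula; `l2_penalty` and the weight names are hypothetical, for illustration only.

```python
import numpy as np


def l2_penalty(weights: dict, weight_decay: float) -> float:
    """Sum over tensors of weight_decay * sum(w**2) / w.size, skipping BN gamma/beta.

    Mirrors the regularization term Mask_RCNN adds in compile();
    `weights` maps hypothetical variable names to NumPy arrays.
    """
    total = 0.0
    for name, w in weights.items():
        if "gamma" in name or "beta" in name:
            continue  # BatchNorm scale/shift are excluded from regularization
        total += weight_decay * float(np.sum(np.square(w))) / w.size
    return total


weights = {
    "mrcnn_mask_conv1/kernel": np.full((3, 3), 0.5),  # mean(w**2) = 0.25
    "mrcnn_mask_bn1/gamma": np.full((4,), 10.0),      # skipped
}

print(l2_penalty(weights, 1e-4))  # about 2.5e-05
print(l2_penalty(weights, 1e-2))  # about 2.5e-03, i.e. 100x larger
```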

jasdal365 commented 4 years ago

Same issue here. I tried adding and removing data augmentation; strangely, on my dataset the model overfits less when I remove it. I also tried weight decay values of 0.0001 and 0.01: 0.0001 works a bit better, but the improvement is small. Please also look at your data distribution: overfitting decreases if you have approximately the same number of samples per class. :-) Hope this helps. I am also waiting for other feedback on solving the overfitting problem.
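To check the per-class distribution mentioned above, count annotated instances per class across the training set. A minimal sketch, assuming each image's annotations are available as a list of class names (a hypothetical format; in Mask_RCNN the class IDs returned by your dataset's `load_mask()` would serve the same purpose). `class_counts` and `imbalance_ratio` are illustrative helpers.

```python
from collections import Counter
from typing import Iterable, List


def class_counts(per_image_labels: Iterable[List[str]]) -> Counter:
    """Count instances of each class across all images."""
    counts = Counter()
    for labels in per_image_labels:
        counts.update(labels)
    return counts


def imbalance_ratio(counts: Counter) -> float:
    """Largest class count divided by the smallest (1.0 means balanced)."""
    if not counts:
        return 1.0
    return max(counts.values()) / min(counts.values())


# Hypothetical annotations for three images.
annotations = [["building", "building", "road"], ["building"], ["road", "building"]]
counts = class_counts(annotations)

print(counts)                   # Counter({'building': 4, 'road': 2})
print(imbalance_ratio(counts))  # 2.0
```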