keven4ever opened this issue 6 years ago

Hello,

I only have a small training set of about 670 labelled images and would like to further improve accuracy by training the entire backbone network instead of only the heads. However, after about 30-40 epochs the network already starts to overfit. ResNet already uses batch norm, so I wonder if there is something else I can do to improve the situation. How about dropout? If I apply dropout, can I still load the pre-trained ResNet weights from COCO or ImageNet? Or is there some other technique? Thank you!

@keven4ever
With such a small dataset, it is unlikely that BN or dropout will help. Also, combining BN with dropout is probably not a good idea (see the BN paper), and I don't think you can apply dropout with the pre-trained ResNet weights, since that model wasn't trained with dropout in the first place.
The model capacity of ResNet-101 might also be too large for your dataset. While it's true that residual connections let deeper networks converge compared to their plain counterparts, there is still a limit on how many layers a ResNet can have before accuracy suffers. For example, Table 6 in the ResNet paper shows that the classification error on CIFAR-10 decreases with increasing depth up to ResNet-110, while ResNet-1202 actually performs worse than ResNet-110 (and even worse than ResNet-32).
To prevent overfitting, you can try:
1) Getting a larger dataset (but this is probably not feasible, otherwise you would've done it already)
2) Stronger weight decay (i.e., L2 regularization) - see the sketch below
3) Lower model capacity (e.g., ResNet-50 or even ResNet-32)
4) k-fold cross-validation
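For (2) and (3), a minimal sketch of where these knobs live in this repo's config, assuming the Config attributes shown later in this thread (WEIGHT_DECAY etc.); the class name and values are illustrative, not a recommendation:

from config import Config

class SmallDatasetConfig(Config):
    # Hypothetical subclass; WEIGHT_DECAY is this repo's L2-regularization knob
    # (a default of 0.0001 appears in the config posted later in this thread).
    NAME = "nucleus_small_data"
    NUM_CLASSES = 1 + 1        # background + nucleus
    WEIGHT_DECAY = 0.001       # 10x the default L2 penalty, purely illustrative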
@keven4ever do you use augmentations? In DS Bowl 2018 they are critical. I had the same problem, and augmentations helped me a lot.
@maksimovkonstantin: Thanks. What kind of augmentation techniques do you use? How much of a gain did you achieve? I see that your score is 0.437 - what is the score without augmentation?
@maksimovkonstantin very good question! Actually I tried augmentation (without training the full backbone), which only helps to improve the training loss but not val_loss. I also tried training the full backbone with the default augmentation (flip l/r), which suffers from overfitting. As a next step I will try to combine both. Btw, what kind of augmentation did you apply? Flip l/r, flip u/d, rotate 90?
@FruVirus thanks for the tips, I also intend to try a shallower model like ResNet-50. I saw that model.py's resnet_graph method supports both resnet50 and resnet101, so just changing architecture to resnet50 should be sufficient, right?
@keven4ever , yes I believe so. I'd be interested to hear if this helps with your dataset.
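For reference, a minimal sketch of the change being discussed, assuming the backbone in model.py is built with a call roughly like the one below (check your copy of the repo for the exact signature and where the string comes from):

# Hypothetical: inside model.py, the architecture string selects the backbone depth;
# switching it from "resnet101" to "resnet50" is the change discussed above.
_, C2, C3, C4, C5 = resnet_graph(input_image, "resnet50", stage5=True)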
@FruVirus sure, will keep you updated! Btw, is there an easy way to load the COCO pre-trained weights for a ResNet-50 FPN?
@John1231983 score without aug is 0.413
@maksimovkonstantin I tried doing some augmentation before image resizing, including flip l/r, flip u/d and 90-degree rotation, with ResNet-101; as you can see, it again starts to overfit. What kind of augmentation did you apply? Are you using ResNet-101 or 50?
@keven4ever I use the default ResNet101, and I also rotate by a custom angle. Here is my augmentation function:
import random

import cv2
import numpy as np


def data_augmentation(input_images,
                      h_flip=True,
                      v_flip=True,
                      rotation=360,
                      zoom=1.5,
                      brightness=0.5,
                      crop=False):
    # The first element of input_images is the image; the remaining elements are its masks.
    output_images = input_images.copy()
    if crop and random.randint(0, 1):
        # Random crop (locs_for_random_crop is a helper defined elsewhere).
        h, w, c = output_images[0].shape
        upper_h, new_h, upper_w, new_w = locs_for_random_crop(h, w)
        output_images = [input_image[upper_h:upper_h + new_h, upper_w:upper_w + new_w, :]
                         for input_image in output_images]
    # Random flips, applied to the image and all masks.
    if h_flip and random.randint(0, 1):
        output_images = [cv2.flip(input_image, 1) for input_image in output_images]
    if v_flip and random.randint(0, 1):
        output_images = [cv2.flip(input_image, 0) for input_image in output_images]
    # Random gamma/brightness adjustment, applied to the image only.
    factor = 1.0 + abs(random.gauss(mu=0.0, sigma=brightness))
    if random.randint(0, 1):
        factor = 1.0 / factor
    table = np.array([((i / 255.0) ** factor) * 255 for i in np.arange(0, 256)]).astype(np.uint8)
    output_images[0] = cv2.LUT(output_images[0], table)
    # Random rotation angle and zoom scale, shared by the image and all masks.
    if rotation:
        angle = random.randint(0, rotation)
    else:
        angle = 0.0
    if zoom:
        scale = random.randint(50, int(zoom * 100)) / 100
    else:
        scale = 1.0
    if rotation or zoom:
        for i, input_image in enumerate(output_images):
            M = cv2.getRotationMatrix2D((input_image.shape[1] // 2, input_image.shape[0] // 2), angle, scale)
            output_images[i] = cv2.warpAffine(input_image, M, (input_image.shape[1], input_image.shape[0]))
    return [input_image.astype(np.uint8) for input_image in output_images]
@maksimovkonstantin looks great! Thx so much!
@maksimovkonstantin: Me too - I also got 0.41 on the LB with left/right and up/down flips and the Adam optimizer. One more thing: did you use the fixed dataset (made by Konstantin Lopuhin) to obtain the 0.413 LB?
@keven4ever: Which optimizer are you using? I am using Adam for 80 epochs on all layers:
model.train(dataset_train, dataset_val,
            learning_rate=1e-4,
            epochs=80,
            layers='all')
@John1231983 I still use SGD since it is the one used in the paper. @John1231983 were you able to avoid overfitting when training all layers with only flipping augmentation? That is exactly what I did; the only difference is the optimizer.
I don't think I had overfitting. Let me check my log.
This is my training schedule with the Adam optimizer:
LEARNING_RATE = 1e-4
model.train(dataset_train, dataset_val,
            learning_rate=LEARNING_RATE,
            epochs=40,
            layers='all')
model.train(dataset_train, dataset_val,
            learning_rate=LEARNING_RATE / 10,
            epochs=80,
            layers='all')
model.train(dataset_train, dataset_val,
            learning_rate=LEARNING_RATE / 100,
            epochs=120,
            layers='all')
For the above, I got 0.41 LB with the fixed dataset using ResNet-50. Could you tell me what base score you achieved? By base score I mean using the original Mask R-CNN implementation.
@John1231983 my base score is 0.448, but as I mentioned, it is hard to reproduce. However, I also managed to achieve 0.44+ several times without training the full network; of course I tuned several parameters, as mentioned in the other thread.
Great. I guess I am missing some parameters. So you just changed hyper-parameters and achieved 0.44+, am I right? Do you train the network with different training inputs, such as gray input for one network and color input for another? These are my hyper-parameter settings - how about yours?
USE_MINI_MASK = True
MINI_MASK_SHAPE = (56, 56)
GPU_COUNT = 1
IMAGES_PER_GPU = 2
bs = GPU_COUNT * IMAGES_PER_GPU
STEPS_PER_EPOCH = 600 // bs
VALIDATION_STEPS = 70 // bs
NUM_CLASSES = 1 + 1
IMAGE_MIN_DIM = 512
IMAGE_MAX_DIM = 512
IMAGE_PADDING = True
RPN_ANCHOR_SCALES = (8, 16, 32, 64, 128)
BACKBONE_STRIDES = [4, 8, 16, 32, 64]
RPN_TRAIN_ANCHORS_PER_IMAGE = 320 #300
POST_NMS_ROIS_TRAINING = 2000
POST_NMS_ROIS_INFERENCE = 2000
POOL_SIZE = 7
MASK_POOL_SIZE = 14
MASK_SHAPE = [28, 28]
TRAIN_ROIS_PER_IMAGE = 512
RPN_NMS_THRESHOLD = 0.7
MAX_GT_INSTANCES = 256
DETECTION_MAX_INSTANCES = 400
DETECTION_MIN_CONFIDENCE = 0.7
DETECTION_NMS_THRESHOLD = 0.3
MEAN_PIXEL = np.array([42.17746161,38.21568456,46.82167803])
WEIGHT_DECAY = 0.0001
@John1231983 correct! I think increasing TRAIN_ROIS_PER_IMAGE to 512 helped boost the performance a lot; before that I got around 0.414. Also, I use the original images instead of gray input.
I think you can boost it more using this scheme: cluster the training set into 3 sets, train a Mask R-CNN on each set so you obtain 3 checkpoints, and then apply each checkpoint to the corresponding cluster of the test set (see the sketch below).
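A minimal sketch of that clustering idea, assuming images are clustered by their mean color with scikit-learn's KMeans (the feature choice is an assumption; any image statistic could be used):

import numpy as np
from sklearn.cluster import KMeans

def cluster_images(images, n_clusters=3):
    # Represent each (H, W, 3) image by its mean RGB color - a simple, assumed feature.
    features = np.array([img.reshape(-1, 3).mean(axis=0) for img in images])
    kmeans = KMeans(n_clusters=n_clusters, random_state=0).fit(features)
    return kmeans, kmeans.labels_

# Train one Mask R-CNN per cluster, then at test time assign each test image to the
# nearest cluster center (kmeans.predict) and run inference with the matching checkpoint.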
@John1231983 do you use augmentation, or did you get 0.44 with your config above on clean images?
@maksimovkonstantin: I just use simple augmentation (left/right and up/down flips). I will try your augmentation - thanks again. For the above settings, I got 0.41. Only @keven4ever achieved 0.44, not me :(
@John1231983 I tried the three-class approach (white, black and purple), but only in a single model; it did not get as high as 0.448, maybe 0.43 or 0.44+, so no gain. I will try your approach after I manage to get the whole network trained.
@maksimovkonstantin actually I got 0.448 with only flip l/r augmentation.
@maksimovkonstantin: I think your code is somehow wrong, because you have to rotate/flip both the image and its masks/boxes. Your code only augments the image.
@maksimovkonstantin @John1231983 I am still not fully convinced by zoom- and crop-based augmentation. For example, if we always crop a 128x128 patch from the original image, then to use Mask R-CNN we still need to scale it up to something like 512x512; this will always increase the size of the cells during training, so will the model fail to predict small cells?
@keven4ever: Cropping is only for making the dataset larger. Actually, for semantic segmentation we do not need to resize to a fixed size like 512x512, so it may improve performance there. For Mask R-CNN we have to use a fixed input of 512x512 or 1024x1024, so I guess it will not improve performance, because we add a lot of zero padding to the image.
@FruVirus I tried ResNet-50 and trained everything from scratch. With data augmentation there is no overfitting problem any more; however, the mAP is still much worse than training only the heads with ResNet-101 (pre-loaded COCO weights). I think the pre-loaded weights make quite a bit of difference (I only have a single GTX 1080, and it took two days to train ResNet-50).
@keven4ever: if I understand correctly, you only trained the 'heads' with the COCO weights, and did not train 'all' to achieve the 0.43+ score. Am I right? If so, I guess you may need to train all layers once you see the overfitting, e.g. train all after 20 epochs.
@John1231983 that's correct!
Thanks. Could you share your LB score using ResNet-50 trained from scratch? I achieved 0.41 with ResNet-50 and ImageNet pretraining, training all layers and skipping the heads-only stage.
Hey guys, does anyone know how to add focal loss?
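This isn't answered in the thread, but as a starting point, here is a minimal sketch of a binary focal loss (Lin et al., 2017) in Keras; where exactly to wire it into the classification losses in model.py is left open, and the gamma/alpha values are the paper's defaults, not tuned for this dataset:

import keras.backend as K

def binary_focal_loss(gamma=2.0, alpha=0.25):
    # FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), for binary targets.
    def loss(y_true, y_pred):
        eps = K.epsilon()
        y_pred = K.clip(y_pred, eps, 1.0 - eps)
        p_t = y_true * y_pred + (1.0 - y_true) * (1.0 - y_pred)
        alpha_t = y_true * alpha + (1.0 - y_true) * (1.0 - alpha)
        return K.mean(-alpha_t * K.pow(1.0 - p_t, gamma) * K.log(p_t))
    return loss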
@John1231983 I only got 0.376. Btw, where did you download the pre-trained ImageNet ResNet-50 weights?
@keven4ever: Too low - I got 0.41 with it. Now I am using the COCO pretrained weights and hope it does better.
FYI, this is the link to download pre-trained models (ResNet, Inception, ...), but when I used them they gave worse results than ResNet-50: https://github.com/fchollet/deep-learning-models/releases
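If your copy of the repo has the get_imagenet_weights() helper (it downloads the ResNet-50 ImageNet weights from that same release page), loading them might look roughly like the sketch below; bowl_config is assumed to be defined as elsewhere in this thread, and you can pass a manually downloaded .h5 path instead:

import model as modellib

model = modellib.MaskRCNN(mode="training", config=bowl_config, model_dir="./logs")
weights_path = model.get_imagenet_weights()     # downloads the ResNet-50 "notop" weights
model.load_weights(weights_path, by_name=True)  # by_name matches backbone layers; heads stay randomly initialized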
This is my learning schedule. Do you use the same one?
model.train(dataset_train, dataset_val,
            learning_rate=bowl_config.LEARNING_RATE / 10,
            epochs=10,
            layers="heads")
model.train(dataset_train, dataset_val,
            learning_rate=bowl_config.LEARNING_RATE / 10,
            epochs=40,
            layers="all")
model.train(dataset_train, dataset_val,
            learning_rate=bowl_config.LEARNING_RATE / 100,
            epochs=80,
            layers="all")
@John1231983 @keven4ever I trained with SGD at 0.001 for 100 epochs on the heads and 60 epochs on '4+', using the pretrained COCO weights and the ResNet-101 backbone - it gives a score of around 0.435. I think the key to success is to train 'all' only in the very last epochs.
@maksimovkonstantin: Very funny - I have changed many settings trying to find a way to do better, but it looks like the default strategy gives better performance. To summarize, can you confirm that your strategy is like this?
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=100,
            layers='heads')

# Training - Stage 2
# Finetune layers from ResNet stage 4 and up
print("Fine tune Resnet stage 4 and up")
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=60,
            layers='4+')

# Training - Stage 3
# Fine tune all layers
print("Fine tune all layers")
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE / 10,
            epochs=10,
            layers='all')
@John1231983 exactly!) with the config below:

class BowlConfig(Config):
    NAME = "nucleos"
    GPU_COUNT = 2
    IMAGES_PER_GPU = 1
    NUM_CLASSES = 1 + 1  # background + 1 area
    IMAGE_MIN_DIM = 256
    IMAGE_MAX_DIM = 512
    IMAGE_PADDING = True
    RPN_ANCHOR_SCALES = (16, 32, 64, 128, 256)  # anchor side in pixels
    TRAIN_ROIS_PER_IMAGE = 1024
    ROI_POSITIVE_RATIO = 0.33
    STEPS_PER_EPOCH = 550 // (IMAGES_PER_GPU * GPU_COUNT)
    VALIDATION_STEPS = 50 // (IMAGES_PER_GPU * GPU_COUNT)
    MEAN_PIXEL = [43.53, 39.56, 48.22]
    LEARNING_RATE = 1e-3
    USE_MINI_MASK = True
    MAX_GT_INSTANCES = 500
@maksimovkonstantin first of all, thank you for sharing this interesting training schema. The purpose of this competition is to get your hands dirty and gain some experience, and I have to say what you shared did serve that purpose for me - thank you again!
@maksimovkonstantin @John1231983 you have shared different training schemas and parameters, and I wonder whether your configurations/schemas are reproducible. The reason I am asking is that after I got my best LB score, I tried to train again, either by continuing from the last epoch or starting from epoch 0, and I was never able to get similar performance again. This also happened with some other configurations I had. I also tried different things which in theory should improve performance, but in fact they just gave worse scores. However, I only tried each of those things once, so I wonder whether, if I trained multiple times, I would eventually get a better score. This makes me think that with such a complicated network and so many hyper-parameters, the results may not be very reproducible. If that is the case, instead of trying different parameters and training schemas just once, we should stick to the configuration we believe in and try it several times. What do you guys think?
@keven4ever I also have the same issue with reproducibility, but I hope that my last scheme will be more stable.
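For what it's worth, one thing that might reduce (but not eliminate, especially on GPU, where cuDNN is non-deterministic) run-to-run variance is fixing the random seeds before building the model and data generators; a sketch, assuming the TF 1.x API used by this repo:

import random
import numpy as np
import tensorflow as tf

# Fix seeds before constructing the model; GPU runs can still differ slightly.
random.seed(42)
np.random.seed(42)
tf.set_random_seed(42)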
One more thing I want to share: convert the images (color and gray) to the same space, e.g. gray space. Then, after obtaining the results at inference, you can consider post-processing, which gave me some gain. I think this challenge has many problems that deep learning may not handle by itself, i.e. different image spaces...
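A minimal sketch of what "convert everything to the same space" could mean, assuming grayscale as the common space (kept as 3 channels so the RGB input pipeline and MEAN_PIXEL handling stay unchanged):

import cv2
import numpy as np

def to_gray_space(image):
    # Collapse color information, then replicate to 3 channels.
    gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
    return np.stack([gray, gray, gray], axis=-1)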
@maksimovkonstantin: In your data_augmentation function, you augment the image data with random rotation, zoom... What about its masks? The same random values (scale, angle) must be applied to the masks for consistency.
@John1231983 it augments both the masks and the image; the function takes a list of images as input, where the first one is the image and the others are the masks.
@maksimovkonstantin: Great to hear. However, I used this function and it raises an error. This is my script:
image=dataset_train.load_image(0)
masks, class_ids = dataset_train.load_mask(0)
#Image shape of (256, 320, 3) and masks shape of (256, 320, 73)
input_aug=data_augmentation([image, masks])
This is the error:
input_aug=data_augmentation([image, masks])
File "augmentation_data.py", line 46, in data_augmentation
output_images[i] = cv2.warpAffine(input_image, M, (input_image.shape[1], input_image.shape[0]))
cv2.error: /io/opencv/modules/imgproc/src/imgwarp.cpp:1825: error: (-215) ifunc != 0 in function remap
This is my opencv-python version
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>> cv2.__version__
'3.3.1'
@John1231983 the list should contain images of shape (256, 320, 3); you should unpack the stacked masks into 73 separate mask images, each with 3 equal channels.
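In other words, a sketch based on the description above (reusing the dataset objects and data_augmentation function from earlier in this thread): build a plain Python list whose first element is the image and whose remaining elements are individual (H, W, 3) mask images, rather than stacking everything into one 4-D array:

import numpy as np

image = dataset_train.load_image(0)            # (H, W, 3)
masks, class_ids = dataset_train.load_mask(0)  # (H, W, N)

# One 3-channel image per instance mask, appended after the image.
inputs = [image]
for i in range(masks.shape[2]):
    m = masks[:, :, i].astype(np.uint8)
    inputs.append(np.stack([m, m, m], axis=-1))

augmented = data_augmentation(inputs)
aug_image, aug_masks = augmented[0], augmented[1:]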
@maksimovkonstantin: I have tried that, but it still errors. This is the shape of the masks after I converted them:
masks_rgb_all = []
for i in range(masks.shape[2]):
    mask = masks[:, :, i]
    masks_rgb = []
    for i in range(3):
        masks_rgb.append(mask)
    masks_rgb = np.stack(masks_rgb, axis=-1)
    masks_rgb_all.append(masks_rgb)
masks_rgb_all = np.stack(masks_rgb_all, axis=-1)
print(masks_rgb.shape, masks_rgb_all.shape)
input_aug = data_augmentation([image, masks_rgb_all])
(256, 320, 3) (256, 320, 3, 73) - and the error remains:
output_images[i] = cv2.warpAffine(input_image, M, (input_image.shape[1], input_image.shape[0]))
cv2.error: /io/opencv/modules/imgproc/src/imgwarp.cpp:1825: error: (-215) ifunc != 0 in function remap
@John1231983 @maksimovkonstantin I can confirm that the training schema starting with heads and then training all does improve the performance. You can see the performance in the figure below: the lowest red line is the run where I got my highest LB score (0.448), the upper red line is training only the heads, and the green line shows when I train all layers after epoch 84.
The difference from my best run is that this time I used more augmentation (flipping l/r, flipping u/d, rotating up to 360 degrees, brightness; I still have not applied zooming and cropping), but it seems the augmentation makes the model underfit a little.
Also, I only used SGD with lr 0.001. According to this post: https://shaoanlu.wordpress.com/2017/05/29/sgd-all-which-one-is-the-best-optimizer-dogs-vs-cats-toy-experiment/, SGD can usually find a better local optimum than adaptive optimizers like Adam.
@keven4ever I have very similar loss charts, but I can't get the mask loss as close to 0.1 as you do; I think the config is the key.
@maksimovkonstantin I am not sure the config is the key, since the only difference here is the data augmentation: the best-performing run used only flip l/r augmentation and only trained the heads, and the config is the same. So I am totally confused - in theory, both augmentation and training the entire network should improve the performance rather than reduce it.
@keven4ever: As far as I know, we are working at the pixel level, so scaling masks must be done carefully. In my experiments (I did not try augmentation), post-processing is the most important thing in this challenge.
@keven4ever and @maksimovkonstantin: After training on this dataset many times, I found the best ways to achieve 0.44+ are:
1) Train the heads first, then train all; the number of head-training epochs should be bigger than the all-training epochs.
2) flipud and fliplr augmentation are enough.
3) Use SGD with clipnorm; Adam is faster but, as @keven4ever mentioned, it has difficulty reaching a good local optimum (see the optimizer sketch below).
4) Converting inputs to gray, color, HSV... does not help improve performance - just train the network on all types together.
Do you agree with these points? What is your performance now, @keven4ever? I hope you can reproduce the LB score with the tips above.
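For point 3, a minimal sketch of what "SGD with clipnorm" means in Keras terms; the values are illustrative, and this repo builds its own optimizer inside model.compile(), so this is not a drop-in patch:

import keras

# Gradient norms are clipped to 5.0 per update; lr/momentum values are illustrative.
optimizer = keras.optimizers.SGD(lr=0.001, momentum=0.9, clipnorm=5.0)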
@John1231983 based on my experiments this looks correct, but some of these points don't really make sense to me; I suspect there is something special either in the Mask R-CNN implementation or in the dataset. For example, I am not sure about the last bullet - I have not tried post-processing like dilation or CRF. The only post-processing I did was to clean up overlapping masks, since otherwise there is a submission error.
@keven4ever: I think the baseline Mask R-CNN from this repo achieves around 0.4+ LB. The performance also depends on the learning strategy. What is your LB score using the COCO weights and the heads-then-all training?
For me, the dilation (post-processing) improved my score from 0.4 to 0.43 LB. It is still lower than the baseline of the PyTorch Mask R-CNN implementation (0.5+ LB).
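A minimal sketch of that kind of dilation post-processing, applied per predicted instance mask (the kernel size is an assumption and would need tuning):

import cv2
import numpy as np

def dilate_masks(masks, kernel_size=3):
    # masks: (H, W, N) array of predicted instance masks (boolean or uint8).
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    out = np.zeros(masks.shape, dtype=np.uint8)
    for i in range(masks.shape[2]):
        out[:, :, i] = cv2.dilate(masks[:, :, i].astype(np.uint8), kernel, iterations=1)
    return out

Note that dilation can make neighbouring instances overlap again, so the overlap cleanup mentioned above still has to run before building the submission.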