keven4ever opened this issue 6 years ago.
I'm thinking that one of the major differences is the weighted focal loss used by the PyTorch version of Mask R-CNN.
@paulcx: Thanks for the information. Could you tell me which losses have been replaced by the weighted focal loss? I want to modify the repo to check its effect.
# Losses
rpn_class_loss = KL.Lambda(lambda x: rpn_class_loss_graph(*x), name="rpn_class_loss")(
    [input_rpn_match, rpn_class_logits])
rpn_bbox_loss = KL.Lambda(lambda x: rpn_bbox_loss_graph(config, *x), name="rpn_bbox_loss")(
    [input_rpn_bbox, input_rpn_match, rpn_bbox])
class_loss = KL.Lambda(lambda x: mrcnn_class_loss_graph(*x), name="mrcnn_class_loss")(
    [target_class_ids, mrcnn_class_logits, active_class_ids])
bbox_loss = KL.Lambda(lambda x: mrcnn_bbox_loss_graph(*x), name="mrcnn_bbox_loss")(
    [target_bbox, target_class_ids, mrcnn_bbox])
mask_loss = KL.Lambda(lambda x: mrcnn_mask_loss_graph(*x), name="mrcnn_mask_loss")(
    [target_mask, target_class_ids, mrcnn_mask])
It's the RPN class loss. The PyTorch version also replaces smooth_l1 with weighted_smooth_l1.
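For anyone who wants to try this, below is a minimal NumPy sketch of a standard alpha-balanced focal loss for the RPN class loss. It assumes this repo's rpn_match convention (1 = positive, -1 = negative, 0 = neutral and ignored) and logits of shape [N, 2]; the function name is hypothetical. The exact weighting used by the PyTorch version is not described in this thread, so this is not a port of it. gamma=2 and alpha=0.25 are the defaults from the focal loss paper; with gamma=0 the loss reduces to alpha-weighted cross-entropy, which makes it easy to sanity-check against the existing loss.

```python
import numpy as np

def rpn_focal_loss(rpn_match, rpn_logits, gamma=2.0, alpha=0.25):
    """Alpha-balanced binary focal loss over RPN anchors (sketch).

    rpn_match: [N] ints, 1 = positive, -1 = negative, 0 = neutral (ignored).
    rpn_logits: [N, 2] array of (background, foreground) logits.
    """
    keep = rpn_match != 0
    labels = (rpn_match[keep] == 1).astype(np.float64)
    logits = rpn_logits[keep].astype(np.float64)
    # Numerically stable softmax over the two classes.
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    p_fg = probs[:, 1]
    # p_t is the probability the model assigns to the true class.
    p_t = np.where(labels == 1, p_fg, 1.0 - p_fg)
    alpha_t = np.where(labels == 1, alpha, 1.0 - alpha)
    # (1 - p_t) ** gamma down-weights easy, well-classified anchors.
    loss = -alpha_t * (1.0 - p_t) ** gamma * np.log(np.clip(p_t, 1e-7, 1.0))
    return loss.mean() if loss.size else 0.0
```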
@John1231983 Very interesting. Could you please elaborate a bit more on how you did the dilation? Do you only apply binary dilation to each predicted mask instance? What kind of kernel did you use? As I understand it, binary dilation just enlarges the foreground area (in this case, the mask), right? Why does this improve the performance? Also, how do you handle two overlapping masks; which one should be dilated? Thanks.
Here is my code. You can try it and let me know how much it improves your LB.
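from skimage.morphology import binary_dilation, disk

def refineMasks(mask):
    # Grow the mask by one pixel using a radius-1 disk (a 3x3 cross) as structuring element
    return binary_dilation(mask, disk(1))

# Run the refinement on every predicted instance mask (predicts is H x W x num_instances)
for i in range(predicts.shape[2]):
    predicts[:, :, i] = refineMasks(predicts[:, :, i])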
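The loop above dilates every mask independently, so neighbouring nuclei can grow into each other, which is the overlap case asked about above. One possible way to handle it, sketched below with a hypothetical dilate_masks helper, is to let each mask grow only into pixels that no other mask already claims. Two masks can still both take the same newly added background pixel, so a tie-break (for example by detection score) would still be needed. As the next comment shows, dilation does not help every model, so validate it on your own data.

```python
import numpy as np
from skimage.morphology import binary_dilation, disk

def dilate_masks(predicts, radius=1, avoid_overlap=True):
    """Dilate every instance mask in an H x W x N stack.

    With avoid_overlap=True a mask may only grow into pixels that were
    background in the original prediction, so it never eats into a neighbour.
    """
    masks = predicts.astype(bool)
    occupied = masks.any(axis=2)  # pixels claimed by any original mask
    out = np.empty_like(masks)
    for i in range(masks.shape[2]):
        grown = binary_dilation(masks[:, :, i], disk(radius))
        if avoid_overlap:
            # Keep the mask's own pixels plus newly added background pixels.
            grown &= ~occupied | masks[:, :, i]
        out[:, :, i] = grown
    return out
```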
@John1231983 Thanks for sharing the code! By the way, I can confirm that there is something we missed in the data augmentation. In my config, the result using left/right flips and rotation is definitely better than using other augmentations. I checked the code again (for example, the image shape in image_meta) but still could not find out why.
@John1231983 Hmm, I applied the dilation on top of my best model (LB score 0.448) and the result was 0.412. So it seems the same optimization doesn't help everyone, at least in this specific case. Thank you anyway!
I think so. It is very difficult to reproduce the result. For now, I think it is better to use focal loss as the PyTorch version does. The author of the PyTorch version reports a Mask R-CNN baseline of 0.5, which is far above our baseline.
Sure, keep us updated in case you get a boost. I will continue trying to figure out why the other data augmentations don't help.
One more thing: did you try other augmentations like flipud and rot90? You said that only fliplr provided the best performance.
@John1231983 Yep, I followed @maksimovkonstantin's code and also introduced brightness augmentation:
import random
import numpy as np
import cv2

# Random gamma-style brightness change: factor > 1 darkens, factor < 1 brightens
factor = 1.0 + abs(random.gauss(mu=0.0, sigma=brightness))
if random.randint(0, 1):
    factor = 1.0 / factor
table = np.array([((i / 255.0) ** factor) * 255 for i in np.arange(0, 256)]).astype(np.uint8)
output_images[0] = cv2.LUT(output_images[0], table)
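The lookup table implements a gamma curve, v -> 255 * (v / 255) ** factor. Below is a NumPy-only sketch of the same idea (the function names are hypothetical). It rounds instead of truncating when building the table, and it indexes the table with the uint8 image, which gives the same per-pixel mapping that cv2.LUT applies to 8-bit input. The masks need no change, since a per-pixel intensity change doesn't move any object.

```python
import numpy as np

def gamma_table(factor):
    """256-entry lookup table implementing v -> 255 * (v / 255) ** factor."""
    values = np.arange(256, dtype=np.float64) / 255.0
    return np.clip(np.rint(values ** factor * 255.0), 0, 255).astype(np.uint8)

def random_brightness(image, sigma=0.5, rng=None):
    """Randomly brighten or darken a uint8 image with a gamma curve."""
    rng = np.random.default_rng() if rng is None else rng
    factor = 1.0 + abs(rng.normal(0.0, sigma))
    if rng.integers(0, 2):
        factor = 1.0 / factor
    # Fancy indexing applies the table to every pixel (and every channel).
    return gamma_table(factor)[image]
```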
How do you use the function? I tried it, but it doesn't work with the masks input. My masks input is W x H x num_masks.
ok, here is my code:
import random

import cv2
import numpy as np


def data_augmentation(input_image, masks,
                      h_flip=True,
                      v_flip=True,
                      rotation=360,
                      zoom=1.5,
                      brightness=0.5,
                      crop=False):
    # input_image: H x W x C uint8 image; masks: H x W x num_masks array.
    # The same geometric transforms are applied to the image and to the masks.
    output_image = input_image.copy()
    output_masks = masks.copy()
    # random crop
    # if crop and random.randint(0, 1):
    #     h, w, c = output_images[0].shape
    #     upper_h, new_h, upper_w, new_w = locs_for_random_crop(h, w)
    #     output_images = [input_image[upper_h:upper_h + new_h, upper_w:upper_w + new_w, :] for input_image in output_images]
    # random flip (np.fliplr / np.flipud act on the first two axes,
    # so the whole H x W x num_masks stack is flipped like the image)
    if h_flip and random.randint(0, 1):
        output_image = np.fliplr(output_image)
        output_masks = np.fliplr(output_masks)
    if v_flip and random.randint(0, 1):
        output_image = np.flipud(output_image)
        output_masks = np.flipud(output_masks)
    # random brightness (gamma curve), applied to the image only
    factor = 1.0 + abs(random.gauss(mu=0.0, sigma=brightness))
    if random.randint(0, 1):
        factor = 1.0 / factor
    table = np.array([((i / 255.0) ** factor) * 255 for i in np.arange(0, 256)]).astype(np.uint8)
    output_image = cv2.LUT(np.ascontiguousarray(output_image), table)
    # random rotation by a multiple of 90 degrees
    # (integer division, because random.randint needs integer bounds)
    if rotation:
        rotate_times = random.randint(0, rotation // 90)
    else:
        rotate_times = 0
    for r in range(0, rotate_times):
        output_image = np.rot90(output_image)
        output_masks = np.rot90(output_masks)
    # if zoom:
    #     scale = random.randint(50, zoom * 100) / 100
    # else:
    #     scale = 1.0
    # # print(angle, scale)
    # if rotation or zoom:
    #     for i, input_image in enumerate(output_images):
    #         M = cv2.getRotationMatrix2D((input_image.shape[1] // 2, input_image.shape[0] // 2), angle, scale)
    #         # M = cv2.getRotationMatrix2D((input_image.shape[1] // 2, input_image.shape[0] // 2), 45, 1)
    #         output_images[i] = cv2.warpAffine(input_image, M, (input_image.shape[1], input_image.shape[0]))
    # # print('len of output %s' % len(output_images))
    return output_image, output_masks
You just call it as data_augmentation(original_image, original_masks).
Thanks, I will try it and let you know how it works in my case. I see why the code works for you: you have commented out the scale (zoom) case. I have not had success with scaling.
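One thing that makes scaling harder than flips and rot90 is that a zoom changes the image size and can push instances out of the frame, so empty masks (and their class ids) have to be dropped. A minimal sketch, assuming an H x W x C image and an H x W x N mask stack (the helper name is hypothetical): zoom both with nearest-neighbour sampling, centre-crop or zero-pad back to H x W, and return the indices of the masks that survive. Bilinear interpolation would look better for the image; nearest-neighbour keeps the sketch short and the masks binary.

```python
import numpy as np

def zoom_image_and_masks(image, masks, scale):
    """Zoom an H x W x C image and an H x W x N mask stack by `scale`,
    then centre-crop (scale > 1) or zero-pad (scale < 1) back to H x W.

    Returns (image, masks, keep); `keep` holds the indices of the masks that
    are still non-empty, so class_ids can be filtered the same way.
    """
    h, w = image.shape[:2]
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))
    # Nearest-neighbour source row/column for every destination pixel.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    zoomed_img = image[rows][:, cols]
    zoomed_masks = masks[rows][:, cols]

    out_img = np.zeros_like(image)
    out_masks = np.zeros_like(masks)
    ch, cw = min(h, new_h), min(w, new_w)
    # Source offsets are non-zero when zooming in, destination offsets
    # when zooming out; both keep the content centred.
    sy, sx = max(0, (new_h - h) // 2), max(0, (new_w - w) // 2)
    dy, dx = max(0, (h - new_h) // 2), max(0, (w - new_w) // 2)
    out_img[dy:dy + ch, dx:dx + cw] = zoomed_img[sy:sy + ch, sx:sx + cw]
    out_masks[dy:dy + ch, dx:dx + cw] = zoomed_masks[sy:sy + ch, sx:sx + cw]

    keep = np.flatnonzero(out_masks.any(axis=(0, 1)))
    return out_img, out_masks[:, :, keep], keep
```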
@keven4ever, @John1231983 , @maksimovkonstantin
Just wanted to say that this thread has been very useful in terms of my own training. Lots of good things learned from reading what you guys have tried/done!
@FruVirus You are welcome. If you are also in the DSB2018 competition, could you please share what score you got?
@keven4ever
I am not on the DSB competition and unfortunately, I can't share that many details on my current work =/
@FruVirus No problem! I also learned a lot from your tips, and this is a great community. Good luck with your work!
@keven4ever: Have you made any improvement on your LB? I have now moved to PyTorch, which trains faster and has more pretrained models. I will let you know if it improves the score. Right now I get 0.42 with Heng's PyTorch version.
Hello @John1231983, you can also try using Keras for data augmentation. Here are the docs: https://keras.io/preprocessing/image/
+1 to @FruVirus. This thread has been very useful and is worth reading. I just want to add a few things:
Have you checked this repo: https://github.com/aleju/imgaug . Maybe you can try more complex augmentations. However, remember to check that you can still see the target after processing. For example, smoothing with too high a value will make your target disappear; such augmentation will not help and will instead work against you by contaminating the data.
I haven't seen much discussion about image processing (unless I skipped those comments). You can see in https://www.kaggle.com/c/data-science-bowl-2018/discussion/48130#282959 that image processing also helps a lot in getting better results. As a suggestion, maybe you can try some image processing methods (some pre-processing before feeding the network, and some image quality enhancement techniques before inference). Searching for the best training parameters to create a more robust model is very important. However, I believe an enhanced input image will lead to better inference results.
I am not in the DSB competition, but I thought I could share a few of my thoughts with you. Perhaps one of these ideas could be useful for your work and lead to a further conversation about how to improve results using Mask RCNN in any kind of instance segmentation task.
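As one concrete example of the kind of pre-processing suggested above (not the method from the linked discussion), here is a NumPy sketch of a per-image percentile contrast stretch. The function name and the 1st/99th percentile defaults are assumptions to tune on validation data. The same transform should be applied at training and at inference time so both see the same intensity distribution.

```python
import numpy as np

def contrast_stretch(image, low_pct=1.0, high_pct=99.0):
    """Linearly map the [low_pct, high_pct] intensity percentiles of a uint8
    image to [0, 255]; values outside that range are clipped."""
    img = image.astype(np.float64)
    lo, hi = np.percentile(img, [low_pct, high_pct])
    if hi <= lo:  # flat image: nothing to stretch
        return image.copy()
    out = (img - lo) / (hi - lo) * 255.0
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```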
@Hatuw I tried Keras's image generator. The challenge is that for masks I can't use a vectorized approach; instead I have to loop over the masks one by one to augment them, which makes training quite slow. Have you found a better way?
Hi @keven4ever, you can use a vectorized implementation as in this kernel: https://www.kaggle.com/hexietufts/easy-to-use-keras-imagedatagenerator
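Another way to avoid the per-mask loop, assuming SciPy is available, is to treat the H x W x N mask stack as an N-channel image and transform only its first two axes. scipy.ndimage.rotate with axes=(0, 1) does this in one call per array, and order=0 (nearest neighbour) keeps the masks binary. This is a sketch with a hypothetical helper name, not the approach used in the linked kernel.

```python
import numpy as np
from scipy import ndimage

def rotate_image_and_masks(image, masks, angle):
    """Rotate an H x W x C image and an H x W x N mask stack by the same angle.

    Each array is rotated in the plane of its first two axes in a single call,
    so there is no Python loop over the N masks.
    """
    rot_img = ndimage.rotate(image, angle, axes=(0, 1), reshape=False,
                             order=1, mode="constant", cval=0)
    # Nearest-neighbour interpolation keeps the masks strictly 0/1.
    rot_masks = ndimage.rotate(masks.astype(np.uint8), angle, axes=(0, 1),
                               reshape=False, order=0, mode="constant", cval=0)
    return rot_img, rot_masks.astype(bool)
```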
@John1231983 Thank you for asking! I actually made some progress, now 0.46+. Some findings:
all network and use both random crop and flipping, the result is better in the end. So I think I should increase the size of the validation set and also pick the validation images manually.
@keven4ever: Good job. It is close to my LB. I suggest you increase your LB by using an external dataset; some datasets address a similar task to the challenge. I am using the dataset https://www.kaggle.com/voglinio/external-h-e-data-with-mask-annotations and it increases the LB by 0.03. Combined with mosaic images, I hope it can reach 0.48 LB as a baseline. Hope the tips help you. My score is now 0.473 with the PyTorch code, which I use because it trains faster.
@keven4ever
Sorry, I have been very busy these days.
I tried using the image generator in the load_image_gt function, but it slows training down. I think generating augmented images before training is better.
I haven't paid attention to this challenge for some days. If you have a proposal, you are welcome to contact me so we can discuss it together.
Thanks!
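Generating augmented images ahead of training can be as simple as storing the 8 flip/rot90 variants of every image, since these need no interpolation and keep image and masks aligned. A minimal NumPy sketch with hypothetical helper names; note that it multiplies memory and disk use by 8.

```python
import numpy as np

def dihedral_variants(image, masks):
    """Yield the 8 flip/rot90 variants of an image and its H x W x N masks.

    np.fliplr and np.rot90 act on the first two axes, so image and masks
    stay aligned, and nothing is interpolated, so masks stay binary.
    """
    for flip in (False, True):
        img = np.fliplr(image) if flip else image
        msk = np.fliplr(masks) if flip else masks
        for k in range(4):
            yield np.rot90(img, k), np.rot90(msk, k)

def pregenerate(dataset):
    """Expand a list of (image, masks) pairs 8x before training starts."""
    return [(np.ascontiguousarray(i), np.ascontiguousarray(m))
            for image, masks in dataset
            for i, m in dihedral_variants(image, masks)]
```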
Good discussion about image augmentation here. I just pushed an update to support imgaug augmentations out of the box, by passing an augmentation object to the train() function.
http://imgaug.readthedocs.io/en/latest/source/augmenters.html
Thanks @waleedka for this PR. I think we need one more condition in load_image_gt to skip cropped images that contain zero masks when the number of images per GPU is 1; otherwise the network is fed zero masks and the loss becomes NaN. To do this, I think we could add a while loop that checks that the number of masks is greater than 0 and, if not, tries cropping at another position. What do you think?
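The retry idea above might look like the sketch below (hypothetical helper name and signature, not code from this repo): crop at a random position, drop masks that became empty together with their class ids, and try another position if nothing is left. If every attempt fails it returns None, and the caller can fall back to resizing the whole image instead of cropping.

```python
import numpy as np

def crop_with_instances(image, masks, class_ids, height, width,
                        max_tries=10, rng=None):
    """Random crop that retries until at least one mask survives.

    image: H x W x C, masks: H x W x N (bool), class_ids: [N].
    Returns (image, masks, class_ids) for the crop, or None if no crop
    containing an instance was found within max_tries.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    if h < height or w < width:
        raise ValueError("crop is larger than the image")
    for _ in range(max_tries):
        # integers() excludes the upper bound, hence the + 1.
        y = int(rng.integers(0, h - height + 1))
        x = int(rng.integers(0, w - width + 1))
        crop_masks = masks[y:y + height, x:x + width]
        keep = crop_masks.any(axis=(0, 1))  # masks still visible in the crop
        if keep.any():
            return (image[y:y + height, x:x + width],
                    crop_masks[:, :, keep], class_ids[keep])
    return None
```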
@waleedka: It seems your train() function only supports fliplr. How about adding more options like scale and rotation?
augmentation = imgaug.augmenters.Fliplr(0.5)
Would it look like this?
import imgaug as ia
from imgaug import augmenters as iaa

# Wrap an augmenter so that it is applied to ~50% of the images
sometimes = lambda aug: iaa.Sometimes(0.5, aug)

augmentation = iaa.Sequential([
    iaa.Fliplr(0.5),  # horizontally flip 50% of the images
    iaa.Flipud(0.5),  # vertically flip 50% of the images
    sometimes(iaa.CropAndPad(
        percent=(-0.05, 0.1),
        pad_mode=ia.ALL,
        pad_cval=(0, 255)
    )),
    sometimes(iaa.Affine(
        scale={"x": (0.8, 1.2), "y": (0.8, 1.2)},  # scale images to 80-120% of their size, individually per axis
        translate_percent={"x": (-0.2, 0.2), "y": (-0.2, 0.2)},  # translate by -20 to +20 percent (per axis)
        rotate=(-45, 45),  # rotate by -45 to +45 degrees
        shear=(-16, 16),  # shear by -16 to +16 degrees
        order=[0, 1],  # use nearest neighbour or bilinear interpolation (fast)
        cval=(0, 255),  # if mode is constant, use a cval between 0 and 255
        mode=ia.ALL  # use any of scikit-image's warping modes
    )),
])
@John1231983 The train() function supports all the augmentations that imgaug offers, so yes, just pass that big augmentation sequence to train() and it should work.
The code applies the same augmentations to both images and masks, and it already knows that some augmentations apply only to images and not to masks (like changing color channels or adding Gaussian noise). That said, even augmentations that are safe for masks sometimes have options that make them unsafe, so always test your augmentations on both images and masks before training.
And, thanks for the tip about images with no masks. I'll look into it.
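Following the advice to test augmentations on both images and masks, one quick check (a sketch; the helper is hypothetical) is to build a synthetic image that is bright exactly where its mask is set, run it through your own augmentation function, and measure how often the bright pixels and the mask still agree. A score of 1.0 means perfect alignment. Interpolating transforms blur the border slightly, so expect a value a little below 1.0 for those; a clearly lower value points to a misaligned transform.

```python
import numpy as np

def mask_alignment_score(augment, size=64, seed=0):
    """Fraction of pixels where an augmented synthetic image and its
    augmented mask agree.

    augment(image, masks) -> (image, masks) is your own augmentation
    function; it must apply the same geometric transform to both arrays.
    """
    rng = np.random.default_rng(seed)
    masks = np.zeros((size, size, 1), dtype=bool)
    # An off-centre rectangle, so flips and rotations actually move it.
    y, x = (int(v) for v in rng.integers(0, size // 2, size=2))
    masks[y:y + size // 3, x:x + size // 5, 0] = True
    # Bright exactly where the mask is set, black elsewhere.
    image = np.repeat(masks.astype(np.uint8) * 255, 3, axis=2)
    aug_image, aug_masks = augment(image, masks)
    bright = aug_image.mean(axis=2) > 127
    return float((bright == aug_masks.any(axis=2)).mean())
```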
@John1231983 Hi, John. I have tested random_crop, and my score drops from 0.440 to 0.424. Here is my code. Is there something wrong?
height = 512
width = 512
# `and`, not `&`: `&` binds tighter than `>=`, which would turn this into a chained comparison
if image.shape[0] >= height and image.shape[1] >= width:
    if random.randint(0, 1):
        image, mask = randomCrop(image, mask, width, height)
My learning schedule is 50 epochs on all layers (LR 1e-4), then 25 epochs on all layers (LR 1e-5). Can you help me?
@waleedka: Thanks for your reply. I used the new PR and got this error:
Epoch 1/60
28/435 [>.............................] - ETA: 5:35 - loss: 4.6554 - rpn_class_loss: 0.2579 - rpn_bbox_loss: 1.9744 - mrcnn_class_loss: 0.0893 - mrcnn_bbox_loss: 1.7575 - mrcnn_mask_loss: 0.5764
Traceback (most recent call last):
File "train.py", line 72, in <module>
augmentation=augmentation)
File "/home/john/mask_rcnn/model.py", line 2300, in train
use_multiprocessing=True,
File "/home/john/anaconda3/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/home/john/anaconda3/lib/python3.6/site-packages/keras/engine/training.py", line 2192, in fit_generator
generator_output = next(output_generator)
File "/home/john/anaconda3/lib/python3.6/site-packages/keras/utils/data_utils.py", line 785, in get
raise StopIteration()
StopIteration
I had no error before using the new PR. How can I fix it? I found that the bug is somewhere in utils.py, which you updated.
@cccmdls: Your code is correct. But I would crop if the image is bigger than 512 and otherwise use the resize function. Let me know your LB with this. I am using mosaics.
@John1231983 There is a problem: if the image is bigger than 512 and random.randint(0,1) returns 0, the image is not cropped. How do you handle that: resize, or crop again? Currently I am training a model without the random.randint(0,1) check. I want to see what happens in this situation.
@John1231983 I couldn't reproduce the error you mentioned. I tested on the train_shapes notebook and used the big augmentation you listed above and it worked. You might want to track that issue in your code. If you confirm that it's indeed a bug, please provide more details.
@John1231983 Sadly, I only got 0.410: LR=1e-4, 50 epochs on all layers (LR) + 25 epochs on all layers (LR/10), using mosaics and the COCO pretrained model, tested on stage1_test. I don't know how to split the results predicted on mosaics_test back into the stage_test CSV file. Can you help me?
@waleedka: I think someone with the same error as me reported it to you in another thread; it may be similar.
@ccmdls: I did not test on the mosaic test set. I only trained on the mosaic training set and tested on the original images. First, I randomly crop 512x512 if the image is bigger than 512 (always, without the crop probability); otherwise I resize the image to 512x512. I trained for 60 epochs on heads and 40 epochs on all layers with learning rate 0.0001 using Adam. I don't know why others succeeded in training with SGD (I used SGD but got 0.44). Using the above suggestions, you may get 0.47 LB (no post-processing) to 0.49 LB (with post-processing).
@John1231983 Hi, John. Thanks for your advice, but I only got 0.380, 0.370, 0.383 and 0.377 without any post-processing. I can't reproduce your result, sorry. Here is my config file. Can you give me some advice?
LEARNING_RATE = 1e-4
USE_MINI_MASK = True
MINI_MASK_SHAPE = (56, 56)
STEPS_PER_EPOCH = 392
VALIDATION_STEPS = 44
IMAGE_MIN_DIM = 512
IMAGE_MAX_DIM = 512
RPN_ANCHOR_SCALES = (8, 16, 32, 64, 128)  # anchor side in pixels, maybe add a 256?
BACKBONE_STRIDES = [4, 8, 16, 32, 64]
RPN_TRAIN_ANCHORS_PER_IMAGE = 320  #320
POST_NMS_ROIS_TRAINING = 2000
POST_NMS_ROIS_INFERENCE = 2000
POOL_SIZE = 7
MASK_POOL_SIZE = 14
MASK_SHAPE = [28, 28]
TRAIN_ROIS_PER_IMAGE = 512
RPN_NMS_THRESHOLD = 0.7
MAX_GT_INSTANCES = 600 #512
DETECTION_MAX_INSTANCES = 600 #400
DETECTION_MIN_CONFIDENCE = 0.7 # may be smaller?
DETECTION_NMS_THRESHOLD = 0.3 # 0.3
MEAN_PIXEL = np.array([31.92144429,29.84380259,34.66032842]) #mosaics
WEIGHT_DECAY = 0.0001
and here is my code about random crop.
height = 512
width = 512
# `and`, not `&`: `&` binds tighter than `>`, which would turn this into a chained comparison
if image.shape[0] > height and image.shape[1] > width:
    image, mask = randomCrop(image, mask, width, height)
else:
    image, window, scale, padding = utils.resize_image(
        image,
        min_dim=config.IMAGE_MIN_DIM,
        max_dim=config.IMAGE_MAX_DIM,
        padding=config.IMAGE_PADDING)
    mask = utils.resize_mask(mask, scale, padding)
import random

def randomCrop(img, mask, width, height):
    # Random top-left corner such that the crop stays inside the image
    x = random.randint(0, img.shape[1] - width)
    y = random.randint(0, img.shape[0] - height)
    img = img[y:y + height, x:x + width]
    mask = mask[y:y + height, x:x + width]
    return img, mask
Now that the competition is over, does anyone know of a public GitHub repo with the best score using Mask RCNN?
The best result reported on Kaggle https://www.kaggle.com/c/data-science-bowl-2018/discussion/54089 for @waleedka was 0.476 (I know this was just a baseline). I wonder if someone got a higher score using matterport's Mask RCNN.
I am reading the top solution review https://www.kaggle.com/c/data-science-bowl-2018/discussion/54741 and they used Unet. However, I am interested to know who got the best result using the matterport's script. According to the article, the preprocessing of the masks, the correct augmentations and the 2nd level model played a crucial part in the good accuracy of their solution.
It would be interesting to reproduce their key steps, swap the U-Net for Mask RCNN, and compare the results under similar pre/post-processing, since they mentioned that they didn't try Mask RCNN for the competition.
@keven4ever @John1231983 Do you think using the same mask processing of the winner solution, mask rcnn could do as good as the winner?
Update: In this link, ZhengLi was reported as having the highest score using Mask RCNN, and his solution is here.
Why does nobody tune WEIGHT_DECAY? I think it's a hyperparameter that controls the strength of the L2 regularization.
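For reference, my reading of compile() in model.py is that WEIGHT_DECAY enters the loss as an L2 term over the trainable weights, each divided by its number of elements, with batch-norm gamma/beta skipped; please check this against your copy of the code. A NumPy sketch of that term (hypothetical helper, weights passed as a name -> array dict) shows how the penalty scales linearly with WEIGHT_DECAY.

```python
import numpy as np

def l2_weight_decay(weights, weight_decay=1e-4):
    """Sum over weights of weight_decay * sum(w ** 2) / w.size,
    skipping batch-norm gamma/beta (names containing 'gamma' or 'beta')."""
    total = 0.0
    for name, w in weights.items():
        if "gamma" in name or "beta" in name:
            continue
        w = np.asarray(w, dtype=np.float64)
        # Dividing by the element count makes the penalty a per-weight mean.
        total += weight_decay * np.sum(w ** 2) / w.size
    return total
```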
@paulcx Hi, can you point me to the PyTorch version? Thanks!
Can anyone tell me how to add code to print the training accuracy and validation accuracy in every epoch, so we can check whether the model is overfitting?
@lunasdejavu, you can check val_loss to know whether your model is overfitting.
@keven4ever @John1231983 @maksimovkonstantin Hello guys. Thank you so much for this discussion; it is the best discussion I've ever read on GitHub. I'm new to this field and I'm working on a project using the Matterport implementation of Mask RCNN. I understood almost every technique you mentioned here, but I'm confused about the training technique you used for this competition. For example, does training 20 epochs on heads and 80 epochs on all layers mean that we take the weights (model artifacts) produced by the first stage (heads) and continue training all layers from them? Thank you in advance.
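As far as I know, yes: within one session model.train() continues from the model's current weights, and its epochs argument is the total epoch count to reach, not the number of epochs for that call. So "20 on heads, then 80 on all" means a second train() call with epochs=100. The hypothetical helper below just turns per-stage epoch counts into those cumulative arguments.

```python
def staged_schedule(stages):
    """Turn (layers, epochs_in_stage, learning_rate) stages into keyword
    arguments for successive MaskRCNN.train() calls.

    Assumes train() treats `epochs` as the total epoch count to reach, so
    each stage continues where the previous one stopped.
    """
    plan, total = [], 0
    for layers, n_epochs, lr in stages:
        total += n_epochs
        plan.append({"layers": layers, "epochs": total, "learning_rate": lr})
    return plan

# Usage (model, dataset_train and dataset_val come from your own script):
# for kwargs in staged_schedule([("heads", 20, 1e-3), ("all", 80, 1e-4)]):
#     model.train(dataset_train, dataset_val, **kwargs)
```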
@keven4ever
With such a small dataset, it is unlikely that BN or dropout will help. Also, BN with dropout is probably not a good idea (see paper on BN) and I don't think you can apply dropout with the pre-trained ResNet weights since that model didn't train using dropout in the first place.
The model capacity of ResNet-101 might be too large for your dataset. While it's true that ResNet enables deeper networks to converge compared to their plain counterparts, there is still a limit on the number of layers a ResNet can use before test performance suffers. For example, Table 6 in the ResNet paper shows that the classification error on CIFAR-10 decreases with increasing depth up to ResNet-110, but ResNet-1202 actually performs worse than even ResNet-32; the paper attributes this to overfitting.
To prevent overfitting, you can try:
- Getting a larger dataset (but this is probably not feasible, otherwise you would've done this already)
- Stronger weight decay (i.e., L2 regularization)
- Lower model capacity (e.g., ResNet-50 or even ResNet-32)
- k-fold cross-validation
Hello @FruVirus, can you explain how to apply k-fold cross-validation with the matterport Mask RCNN repo? Thanks in advance.
Hello everyone, can someone provide the information on how to check whether the model is overfitting or not?
By checking whether your val_loss stops decreasing. You can apply early stopping to end training when the model starts to overfit (based on how many epochs val_loss has not decreased).
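The patience rule can be sketched in plain Python (hypothetical helper, operating on a list of per-epoch val_loss values such as Keras's history.history["val_loss"]). Because val_loss often fluctuates from epoch to epoch, the optional window applies a trailing moving average before the check. In Keras itself, keras.callbacks.EarlyStopping(monitor="val_loss", patience=...) implements the unsmoothed rule; if your version of this repo's train() has a custom_callbacks argument, it can be passed there.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0, window=1):
    """Return the 1-based epoch at which training would stop, or None.

    window > 1 replaces each value by the mean of the last `window` values,
    damping epoch-to-epoch noise before the patience check.
    """
    best = float("inf")
    wait = 0
    for epoch in range(len(val_losses)):
        recent = val_losses[max(0, epoch - window + 1):epoch + 1]
        value = sum(recent) / len(recent)
        if value < best - min_delta:
            best = value  # improvement: reset the patience counter
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch + 1
    return None
```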
Hi @Altimis, but what if val_loss is fluctuating? Why does val_loss fluctuate?
@rupa1118 This repo has an example of how to use k-fold cross-validation. You can use the built-in sklearn KFold methods.
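A minimal sketch of the sklearn route, assuming you split on image ids and then build one training and one validation dataset per fold with your own loader (the helper name is hypothetical). Each fold should start from the same initial weights (e.g. COCO), and the per-fold validation scores are then averaged.

```python
import numpy as np
from sklearn.model_selection import KFold

def kfold_image_ids(image_ids, n_splits=5, seed=42):
    """Split image ids into (train_ids, val_ids) pairs, one pair per fold."""
    image_ids = np.asarray(image_ids)
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return [(image_ids[tr], image_ids[va]) for tr, va in kf.split(image_ids)]

# For each fold: load a dataset from train_ids and one from val_ids with your
# own Dataset subclass, create a fresh model from the same starting weights,
# train it, and record its validation score.
```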
Just a simple question: every model I saw for custom data used a JSON file for labelling, but your augmentation code uses only mask images. Do you convert the mask images to JSON files after augmentation? @maksimovkonstantin @keven4ever @John1231983
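One common pattern (as far as I know, the repo's balloon sample works this way) is to keep the JSON as the source of truth and rasterize its polygons into masks when the dataset is loaded, so augmentation only ever sees mask arrays and nothing has to be converted back. A sketch using skimage.draw.polygon; the "all_points_x"/"all_points_y" keys assume a VIA-style JSON export, and the helper name is hypothetical.

```python
import numpy as np
from skimage.draw import polygon

def polygons_to_masks(polygons, height, width):
    """Rasterize polygon annotations into an H x W x N boolean mask stack.

    polygons: list of dicts with "all_points_x" and "all_points_y" lists.
    """
    masks = np.zeros((height, width, len(polygons)), dtype=bool)
    for i, p in enumerate(polygons):
        # shape= clips vertices that fall outside the image.
        rr, cc = polygon(p["all_points_y"], p["all_points_x"], shape=(height, width))
        masks[rr, cc, i] = True
    return masks
```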
Hello,
I only have a small training set of about 670 labelled images and would like to further improve the accuracy by training the entire backbone network instead of only the heads. However, after about 30 to 40 epochs, the network already suffers from overfitting. ResNet already uses batch norm, so I wonder if there is something else I can do to improve the situation. How about dropout? If I apply dropout, can I still load the pre-trained ResNet weights from COCO or ImageNet? Or is there some other technique? Thank you!