matterport / Mask_RCNN

Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

Training strategy #527

Open shikunyu8 opened 6 years ago

shikunyu8 commented 6 years ago

Hi,

First, thank you for providing this great implementation of Mask_RCNN. I have been using this repo for a while but can't figure out a proper training strategy, so I hope someone can offer suggestions.

Images: CBIS-DDSM (X-ray images), training: 981, validation: 250. Roughly one instance per image (highly class-imbalanced).

A sample image and corresponding mask: [image attachment]

My configuration:

    BACKBONE                     resnet101
    BATCH_SIZE                   2
    DETECTION_MIN_CONFIDENCE     0.7
    DETECTION_NMS_THRESHOLD      0.3
    IMAGE_MAX_DIM                512
    IMAGE_MIN_DIM                512
    IMAGE_RESIZE_MODE            square
    IMAGE_SHAPE                  [512 512 3]
    LEARNING_MOMENTUM            0.9
    LEARNING_RATE                0.001
    MEAN_PIXEL                   [53.129 53.129 53.129]
    ROI_POSITIVE_RATIO           0.33
    RPN_ANCHOR_RATIOS            [0.5, 1, 2]
    RPN_ANCHOR_SCALES            (16, 32, 64, 128, 256)
    RPN_NMS_THRESHOLD            0.7
    RPN_TRAIN_ANCHORS_PER_IMAGE  512
    TRAIN_ROIS_PER_IMAGE         320
    WEIGHT_DECAY                 0.0001

Augmentation:

    augmentation = iaa.SomeOf((0, 2), [
        iaa.Fliplr(0.5),
        iaa.Flipud(0.5),
    ])

I read many issues in this repo, and I summarized several strategies that could possibly help me.

  1. Use data augmentation.
  2. Init with COCO weights.
  3. Only train the classifier, without changing the COCO backbone weights (this is from Andrew Ng's online course: if you only have a few hundred images, training only the classifier is a good idea).
  4. Try to generate more positive proposals.

Based on these suggestions I ran some preliminary experiments, but I haven't gotten good predictions yet.
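For what it's worth, here is a minimal sketch of strategy 2 (COCO init), following the pattern in this repo's own samples; the config object is assumed to be an instance of your Config subclass:

    import os
    import mrcnn.model as modellib
    from mrcnn import utils

    COCO_WEIGHTS_PATH = "mask_rcnn_coco.h5"
    if not os.path.exists(COCO_WEIGHTS_PATH):
        utils.download_trained_weights(COCO_WEIGHTS_PATH)

    model = modellib.MaskRCNN(mode="training", config=config, model_dir="./logs")
    # Skip the head layers whose shapes depend on NUM_CLASSES.
    model.load_weights(COCO_WEIGHTS_PATH, by_name=True, exclude=[
        "mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"])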

shikunyu8 commented 6 years ago

I tested four scenarios (learning rate divided by 10 after 20 epochs):

  * orange: init with random weights, no augmentation, 20 epochs for heads and 40 for all layers.
  * red: init with COCO, with augmentation, 20 epochs for heads and 40 for all layers.
  * dark blue: init with COCO, no augmentation, 20 epochs for heads and 40 for all layers.
  * light blue: init with COCO, with augmentation, train only the heads (i.e. only the classifier) for 60 epochs.

[loss-curve screenshots]

No scenario brings val_loss below 1. Any suggestions?

StanlyHardy commented 6 years ago

@shikunyu8 Can you try to reduce the anchor scale to (4, 8, 16, 32, 64) and train again?
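Something like this, as a sketch (MammoConfig is a made-up name, and NUM_CLASSES is an assumption):

    from mrcnn.config import Config

    class MammoConfig(Config):  # hypothetical name for your config subclass
        NAME = "cbis_ddsm"
        NUM_CLASSES = 1 + 1  # background + lesion (assumption)
        RPN_ANCHOR_SCALES = (4, 8, 16, 32, 64)  # smaller anchors, as suggested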

shikunyu8 commented 6 years ago

@StanlyHardy That's worth trying; I will. Should I init with COCO weights? It seems that when I init with COCO, the validation loss increases.

patrick-llgc commented 6 years ago

Another thing I would suggest trying is to tune the relative weights among the 5 losses. I would also try doing more augmentation than just flipping images.
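The relative weights live in the config's LOSS_WEIGHTS dict; here is a sketch, where the 2.0 on the mask loss is purely illustrative:

    from mrcnn.config import Config

    class TunedConfig(Config):  # hypothetical
        NAME = "tuned"
        # All five losses default to 1.0; raise one to prioritize it.
        LOSS_WEIGHTS = {
            "rpn_class_loss": 1.0,
            "rpn_bbox_loss": 1.0,
            "mrcnn_class_loss": 1.0,
            "mrcnn_bbox_loss": 1.0,
            "mrcnn_mask_loss": 2.0,
        }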

shikunyu8 commented 6 years ago

@patrick-12sigma I tried anchor scales (4, 8, 16, 32, 64) and (8, 16, 32, 64, 128), and both performed worse than (16, 32, 64, 128, 256). I tried

 augmentation = iaa.SomeOf((0, 3), [
        iaa.Fliplr(0.5),
        iaa.Flipud(0.5),
        iaa.OneOf([iaa.Affine(rotate=90),
                   iaa.Affine(rotate=180),
                   iaa.Affine(rotate=270)]),
        iaa.Multiply((0.8, 1.5)),
        iaa.GaussianBlur(sigma=(0.0, 5.0))
    ])

but I didn't see much difference in AP. I think the author of this implementation uses the same losses as the original Mask R-CNN paper, so perhaps they work fine as they are. In any case, I don't know how to tune them.

shikunyu8 commented 6 years ago

The images are around 5000×3000; I don't know whether resizing and mini-masks will cause a severe accuracy loss.

patrick-llgc commented 6 years ago

@shikunyu8 There is the configuration that controls the relative weights of different losses here.

You raise a good point about the impact of resizing the original image. It really depends on the statistical distribution of the sizes of the objects you want to detect. I'd do a quick stats plot to determine whether the objects to be detected are overwhelmingly small, or plot the sizes of the false negatives (missed GT) and see if they are all small objects.
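A quick sketch of such a stats plot, assuming dataset is a prepared mrcnn.utils.Dataset subclass instance:

    import numpy as np
    import matplotlib.pyplot as plt
    from mrcnn import utils

    # Collect sqrt(area) of every GT box, in pixels of the source images.
    sizes = []
    for image_id in dataset.image_ids:
        mask, _ = dataset.load_mask(image_id)
        boxes = utils.extract_bboxes(mask)  # (N, 4) as y1, x1, y2, x2
        for y1, x1, y2, x2 in boxes:
            sizes.append(np.sqrt((y2 - y1) * (x2 - x1)))

    plt.hist(sizes, bins=50)
    plt.xlabel("sqrt(bbox area) [px]")
    plt.ylabel("count")
    plt.title("GT object size distribution")
    plt.show()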

shikunyu8 commented 6 years ago

@patrick-12sigma I did that statistical analysis and also tried all plausible anchor scales, but the AP is not as good as expected (the best is about 0.75). I think it may be due to overfitting (I got 0.95 on the training set). The Mask R-CNN paper says: "To reduce overfitting, as this training set is smaller, we train using image scales randomly sampled from [640, 800] pixels; inference is on a single scale of 800 pixels."

In this implementation all images are resized to the same size, so I will try to emulate multi-scale training with data augmentation, like this:

augmentation = iaa.SomeOf((0, 3), [
    iaa.Fliplr(0.5),
    iaa.Flipud(0.5),
    iaa.OneOf([iaa.Affine(rotate=90),
               iaa.Affine(rotate=180),
               iaa.Affine(rotate=270)],
             ),
    iaa.Affine(scale={"x": (0.8, 1.2), "y": (0.8, 1.2)}),
    iaa.Multiply((0.8, 1.5)),
    iaa.GaussianBlur(sigma=(0.0, 5.0))
])

Thank you.

Omar-Aboelsoud commented 6 years ago

@shikunyu8 How can I reduce the number of images (batch_size × steps) trained per epoch?

patrickcgray commented 6 years ago

Hi @shikunyu8 I'm curious if you found any strategy to be particularly effective now that you've had some time to experiment?

shikunyu8 commented 6 years ago

@patrickcgray Hi, I did a lot of experiments and got roughly 85% recall. It's not great, but I think it is close to the optimal parameter setting for my dataset. Several strategies are worth noting:

  1. Init with COCO and cut off early. I observed a huge increase in recall when using COCO pre-trained weights, so definitely init with them. Since my dataset is small, training for too many iterations overfits the training set, makes the model less generalizable, and the validation loss increases after a certain epoch. In my experiments I picked the weights with the best validation performance.

  2. Select proper anchor scales. This affects model performance dramatically. You can use inspect_data.ipynb to check your data, or just try several levels of scales.

  3. Tune WEIGHT_DECAY. This sets the L2 strength and is a good way of preventing overfitting. Try 0.01, 0.005 and 0.001 first, then refine (see the sweep sketch just below).

  4. ResNet-50 was worse. One could argue that ResNet-50 is less complex than ResNet-101 and is therefore worth trying when facing an overfitting problem. But in my experiments its performance was much worse than ResNet-101's. (I initialized ResNet-50 with ImageNet weights, not COCO, which may be the true reason for the worse performance. I can't verify this, since I don't have ResNet-50 weights trained on COCO.)

  5. Training only the classifiers can be a good idea. If your dataset is small, you can train just the heads instead of changing the features of the pre-trained backbone, or freeze some layers and train only part of the ResNet, e.g. '4+'. One example schedule: heads (20 epochs) -> resnet 4+ (40 epochs, LR/10) -> all (60 epochs, LR/10), where LR/10 means the learning rate is divided by 10, which helps the algorithm converge.

Hope this helps.
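A minimal sketch of the WEIGHT_DECAY sweep from point 3; MammoConfig, the weights path, the datasets and the 20-epoch budget are all placeholders:

    import mrcnn.model as modellib

    # Coarse sweep: short heads-only runs, compare val_loss curves in TensorBoard.
    for decay in (0.01, 0.005, 0.001):
        config = MammoConfig()  # hypothetical config class
        config.WEIGHT_DECAY = decay
        model = modellib.MaskRCNN(mode="training", config=config,
                                  model_dir="./logs/wd_%g" % decay)
        model.load_weights("mask_rcnn_coco.h5", by_name=True, exclude=[
            "mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"])
        model.train(dataset_train, dataset_val,
                    learning_rate=config.LEARNING_RATE,
                    epochs=20, layers='heads')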

patrickcgray commented 6 years ago

Hi @shikunyu8, thanks so much for all the info! I've altered the anchor scales, and you're right, the inspect_data.ipynb helps a lot. I'm now training with different WEIGHT_DECAY values and will report back on how they help! I've also changed the loss weights to 1,2,2,2,5 to prioritize the MRCNN mask loss, and lowered TRAIN_ROIS_PER_IMAGE to 32 because I don't have many objects per image. Very curious how this will impact the loss. It is difficult to test only one variable at a time because a full training run takes me ~36 hours. Your step 5 has also been very helpful.

Again, thanks for the insight!

javierfs commented 5 years ago

[quoting @shikunyu8's statistical-analysis comment and augmentation snippet above]

How many times do you augment the data? I mean, I don't get how many images are created by using this code.

patrickcgray commented 5 years ago

Hi @javierfs, I did some more comprehensive augmentation, and I'm now doing only slightly worse on my training set than on my validation set, with a total of 265 training images. My code was:

augmentation = iaa.Sometimes(.667, iaa.Sequential([
    iaa.Fliplr(0.5), # horizontal flips
    iaa.Crop(percent=(0, 0.1)), # random crops
    # Small gaussian blur with random sigma between 0 and 0.25.
    # But we only blur about 50% of all images.
    iaa.Sometimes(0.5,
        iaa.GaussianBlur(sigma=(0, 0.25))
    ),
    # Strengthen or weaken the contrast in each image.
    iaa.ContrastNormalization((0.75, 1.5)),
    # Add gaussian noise.
    # For 50% of all images, we sample the noise once per pixel.
    # For the other 50% of all images, we sample the noise per pixel AND
    # channel. This can change the color (not only brightness) of the
    # pixels.
    iaa.AdditiveGaussianNoise(loc=0, scale=(0.0, 0.05*255)),
    # Make some images brighter and some darker.
    # In 20% of all cases, we sample the multiplier once per channel,
    # which can end up changing the color of the images.
    iaa.Multiply((0.8, 1.2)),
    # Apply affine transformations to each image.
    # Scale/zoom them, translate/move them, rotate them and shear them.
    iaa.Affine(
        scale={"x": (0.8, 1.2), "y": (0.8, 1.2)},
        #translate_percent={"x": (-0.2, 0.2), "y": (-0.2, 0.2)},
        rotate=(-180, 180),
        #shear=(-8, 8)
    )
], random_order=True)) # apply augmenters in random order

190665688 commented 5 years ago

Hello, I want to know how to train only the classifiers. What's the command line? @shikunyu8 @patrickcgray


borislav-milkov commented 5 years ago

@190665688 Look at the color splash example. You can train just the top level of the network using the COCO pre-trained weights:

    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE,
                epochs=20,
                augmentation=augmentation,
                layers='heads')

Notice the layers='heads'.

ChauncyFr commented 5 years ago

[quoting @patrickcgray's reply above about anchor scales, WEIGHT_DECAY and loss weights]

Hello, I am very interested in the inspect_data.ipynb you mentioned. How do you set anchor scales with this notebook? Can you describe the procedure in detail? Thank you!

harshgrovr commented 5 years ago

@ChauncyFr did you find the solution?

banafsh89 commented 5 years ago

Dear @shikunyu8, thanks for your comments on how to improve the network. I don't understand how to do your step 5. Would you please explain how I should change the code to train certain layers for a certain number of epochs with a certain learning rate? Maybe @patrickcgray can also help, as I see he used your 5th step and found it helpful.

banafsh89 commented 5 years ago

[quoting @patrickcgray's reply and @ChauncyFr's question above]

You can use that notebook to inspect the sizes of your objects' bounding boxes and then decide how to choose your scales. You must change the scales in the main code you run for training (look at the config file for the default scales).
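A small sketch of that inspection, assuming sizes is the list of sqrt-areas collected as in the histogram snippet earlier in the thread:

    import numpy as np

    # Percentiles of GT box size guide the choice of RPN_ANCHOR_SCALES.
    print(np.percentile(sizes, [5, 25, 50, 75, 95]))
    # e.g. if most boxes fall between ~20 and ~200 px, (16, 32, 64, 128, 256)
    # brackets them; if they cluster below ~50 px, shift down a level.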

banafsh89 commented 5 years ago

[quoting my earlier question about step 5]

I found the answer in issue #168. I'm putting it here in case it helps others. You can train in different stages using this example:

print("Train network heads")
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=40,
            augmentation=augmentation,
            layers='heads')
# Finetune layers from ResNet stage 4 and up
print("Fine tune Resnet stage 4 and up")
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=120,
            layers='4+')

print("Train all layers")
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE/10,
            epochs=300,
            augmentation=augmentation,
            layers='all')

harshgrovr commented 5 years ago

@banafsh89 what is your batch size? and other configuration?

banafsh89 commented 5 years ago

@banafsh89 what is your batch size? and other configuration?

I don't have the desired result yet! My accuracy is 83%, but I am aiming for 95%. My batch size is 15, the backbone is resnet101, and I have only 900 images for training+validation. The rest of my configuration is the same as the nucleus.py configuration in the samples.
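For context, a sketch of that setup, assuming the nucleus sample config as the base; the class name is made up:

    from samples.nucleus.nucleus import NucleusConfig

    class MyConfig(NucleusConfig):  # hypothetical
        BACKBONE = "resnet101"   # nucleus sample defaults to resnet50
        IMAGES_PER_GPU = 15      # batch size 15 on a single GPU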

yhc1994 commented 4 years ago

[quoting @patrickcgray's comprehensive augmentation snippet above]

Will the augmentation be applied just to the images, or to both the images and the mask annotations? For example, with iaa.Fliplr I just flip the images, but what about the mask annotations saved in the JSON file?

banafsh89 commented 4 years ago

[quoting @patrickcgray's augmentation snippet and @yhc1994's question above]

It will be applied to the masks automatically too. No worries.
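For context: the data generator applies only geometric augmenters to the masks; model.py keeps a whitelist for this, so pixel-level augmenters like Multiply or GaussianBlur touch only the image:

    # From mrcnn/model.py: augmenters considered safe to apply to masks.
    MASK_AUGMENTERS = ["Sequential", "SomeOf", "OneOf", "Sometimes",
                       "Fliplr", "Flipud", "CropAndPad",
                       "Affine", "PiecewiseAffine"]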

yhc1994 commented 4 years ago

[quoting the augmentation snippet, @yhc1994's question and @banafsh89's answer above]

Hi @banafsh89, thanks for your reply. Another question: do I need to change STEPS_PER_EPOCH after adding augmentation? My current augmentation code is:

    augmentation = iaa.OneOf([
        iaa.Fliplr(0.5),
        iaa.Flipud(0.5),
        iaa.Affine(rotate=90),
        iaa.GaussianBlur(sigma=(0.0, 5.0))
    ])

My understanding is that the augmentation generates one augmented image (randomly choosing Fliplr, Flipud, Affine or GaussianBlur) for each training image, so with 200 original training images the total would be 200 original + 200 augmented = 400. Do I need to double my STEPS_PER_EPOCH?

banafsh89 commented 4 years ago

STEPS_PER_EPOCH

No, you don't need to change it; set it based on the size of your training dataset. The augmentation is applied on the fly in the data generator, so it doesn't add images to your dataset.
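A common way to set it, sketched for a prepared dataset (BATCH_SIZE is IMAGES_PER_GPU × GPU_COUNT):

    # One pass over the training set per Keras epoch.
    config.STEPS_PER_EPOCH = len(dataset_train.image_ids) // config.BATCH_SIZE
    config.VALIDATION_STEPS = max(
        1, len(dataset_val.image_ids) // config.BATCH_SIZE)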

umar98 commented 4 years ago

Apologies, I'm very new to all this stuff and just experimenting. Where should I put the augmentation code: in model.py or config.py?

Altimis commented 4 years ago

Apologies, I'm very new to all this stuff and just experimenting. Where should I put the augmentation code: in model.py or config.py?

You need to set it in your training script, like this :

    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE,
                epochs=int(n_epochs),
                layers=layers,
                augmentation = imgaug.augmenters.Sequential([ 
                imgaug.augmenters.Affine(rotate=(-45, 45))]),
                class_weight = class_weights
               )

Altimis commented 4 years ago

[quoting @shikunyu8's five training strategies above]

Hey, thank you so much for this generous training strategy. I will definitely use it, but I need some explanation:

  1. Why do we need three different training stages: first the heads, then ResNet stage 4 and up, and finally all layers?
  2. Are you saying I need to save the weights from the first stage and use them for the second, and so on?
  3. How can we choose the right data augmentation for our data?
  4. When you say that training too many iterations will overfit, do you mean steps_per_epoch by "iterations"? If so, can't we use steps_per_epoch = training set size // batch_size as a general rule?

Thank you in advance.

kazzastic commented 4 years ago

Could anyone please tell me what TRAIN_ROIS_PER_IMAGE does?

gizemtanriver commented 4 years ago

@kazzastic I believe it is the number of regions of interest that the RPN proposes for every image. If you have a lot of objects in your images, you should keep it high. You can check out config.py in the mrcnn folder.

kazzastic commented 4 years ago

[quoting @gizemtanriver's answer above]

But I thought it was the job of RPN_NMS_THRESHOLD to increase or decrease the number of proposals generated during training.

Or does RPN_NMS_THRESHOLD filter the proposals, and then TRAIN_ROIS_PER_IMAGE decides how many are fed to the mask head?

gizemtanriver commented 4 years ago

[quoting the exchange above]

Yes, you are right: TRAIN_ROIS_PER_IMAGE is how many ROI proposals are fed to the classifier and mask heads during training. RPN_NMS_THRESHOLD determines which proposals are kept during RPN non-max suppression, so you can increase RPN_NMS_THRESHOLD to increase the number of proposals.
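In config terms, with the defaults noted in the comments; the values here only illustrate the direction of the change:

    from mrcnn.config import Config

    class CrowdedConfig(Config):  # hypothetical, for images with many instances
        NAME = "crowded"
        RPN_NMS_THRESHOLD = 0.9     # default 0.7; higher keeps more proposals
        TRAIN_ROIS_PER_IMAGE = 512  # default 200; more ROIs reach the heads
        MAX_GT_INSTANCES = 200      # default 100; raise if images have more GT objects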

kazzastic commented 4 years ago

[quoting the exchange above]

So do you think it is always good to have a large TRAIN_ROIS_PER_IMAGE?

gizemtanriver commented 4 years ago

I had tried reducing it to 100 before, but it worsened the loss. Training takes a bit longer, but I would rather keep the value high.

alprn42 commented 3 years ago

[quoting @shikunyu8's four-scenario loss comparison above]

Hello, how did you plot those graphs for Mask R-CNN? Which code did you use to get the val_loss and training-loss values?

kthkpc commented 3 years ago

[quoting @banafsh89's staged-training example from issue #168 above]

I've tried implementing this, but whenever the first training stage finishes, it reloads the configuration and then exits. Did anyone have the same problem?

BishwaBS commented 3 years ago

[quoting @umar98's question and @Altimis's training-script answer above]

@Altimis How do we assign class weights? The sample sizes across classes are imbalanced in my case, and I would like to set class weights.

Altimis commented 3 years ago

@DigitalPlantScience

Here is an example:

    # Raw per-class sample counts (example values).
    CLASS_WEIGHTS = {
        0: 2500,
        1: 200,
        2: 4500,
    }

    def compute_class_weights(class_counts=CLASS_WEIGHTS):
        """Return class weights inversely proportional to class frequency."""
        max_count = max(class_counts.values())
        weights = {cls: float(max_count / count)
                   for cls, count in class_counts.items()}
        return dict(sorted(weights.items()))

    class_weights = compute_class_weights(CLASS_WEIGHTS)

    # Note: class_weight is not an argument of the stock model.train();
    # it needs a modified train() (see the TypeError reported below).
    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE,
                epochs=int(n_epochs),
                layers=layers,
                augmentation=imgaug.augmenters.Sequential([
                    imgaug.augmenters.Affine(rotate=(-45, 45))]),
                class_weight=class_weights)

BishwaBS commented 3 years ago

@Altimis Thanks for the technique to compute class weights. I didn't know model.train had a class_weight argument to pass the values to. Just to make sure: do we need to add/modify code in model.py to make use of the class_weight argument, or is it already available?

ankitVP77 commented 3 years ago

[quoting the TRAIN_ROIS_PER_IMAGE / RPN_NMS_THRESHOLD exchange above]

Will it help to increase RPN_NMS_THRESHOLD if I have a lot of instances of an object in a single image? What other parameters should I change to train better?

BishwaBS commented 3 years ago

class_weight

@Altimis I tried adding the class_weight argument to model.train, but it throws the following error. Any idea?

    TypeError                                 Traceback (most recent call last)
    <ipython-input> in <module>()
          9             # custom_callbacks=[checkpoint],
         10             layers='heads',
    ---> 11             class_weight=CLASS_WEIGHTS)
         12             # custom_callbacks=[mean_average_precision_callback])
         13 end_train = time.time()

    TypeError: train() got an unexpected keyword argument 'class_weight'

neelsen1994 commented 3 years ago

[quoting @banafsh89's staged-training example and @kthkpc's report that training exits after the first stage]

Yeah, I get exactly the same problem. I don't understand how @shikunyu8 did this. Can anybody explain it?

sohinimallick commented 3 years ago

[quoting @shikunyu8's five training strategies above]

@shikunyu8 I tried to implement no. 5, i.e. an LR decay at every stage, but the loss seems to go up rather than down at the start of every stage. For example, if the first stage ends at 1.6, the second starts at 3.4 and then decreases from there.

sohinimallick commented 3 years ago

[quoting the staged-training example and the discussion of training exiting after the first stage]

@neelsen1994 I faced this problem when I was setting the number of epochs wrong. The epochs argument is the absolute epoch to train up to, not a per-stage count: if you want to train the first 20 epochs with heads and the next 20 with 4+, the first stage should have epochs=20 and the second epochs=40 (not 20). It has to be higher than in the previous stage. Perhaps that is what is happening?
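In other words, epochs is the cumulative epoch index a stage trains up to, so a valid schedule looks like this (the counts are examples):

    # epochs is cumulative: stage 2 resumes at epoch 21 and stops at 40.
    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE,
                epochs=20, layers='heads')      # epochs 1-20
    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE,
                epochs=40, layers='4+')         # epochs 21-40
    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE / 10,
                epochs=60, layers='all')        # epochs 41-60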

ankitVP77 commented 3 years ago

[quoting @shikunyu8's five strategies and @sohinimallick's loss-jump question above]

@sohinimallick Yes, this is common and happened to me too. I'm not 100% sure, but it probably happens because at each stage more of the network participates in training (1st stage only the FPN and heads, 2nd stage ResNet stage 4 and up, 3rd stage the full 101 layers). The newly unfrozen portion of the network is not yet fine-tuned, so the loss jumps at the start of each stage. I'm not certain about this; do tell me if you find a better explanation.

Gulshan-gaur commented 3 years ago

Hi, I am facing a training issue: I have only 1100 images with 9 classes. I trained all layers with COCO weights, but the accuracy is not as good as I want. I think this is because of the limited data. I tried augmentation while training but got the same results.

xxxming730 commented 2 years ago

[quoting the five strategies and the loss-jump discussion above]

Hello, have you found the problem?