shikunyu8 opened this issue 6 years ago
I tested four scenarios (learning rate divided by 10 after 20 epochs):
orange: init with random weights, no augmentation, 20 epochs for heads and 40 for all layers.
red: init with coco, with augmentation, 20 epochs for heads and 40 for all layers.
dark blue: init with coco, no augmentation, 20 epochs for heads and 40 for all layers.
light blue: init with coco, with augmentation, train only the head layers (should be only training the classifier) for 60 epochs.
No scenario gets val_loss below 1. Any suggestions?
@shikunyu8 Can you try to reduce the anchor scale to (4, 8, 16, 32, 64) and train again?
@StanlyHardy That's worth trying, I will do it. Should I init with coco weights? It seems that if I init with coco, the validation loss increases.
Another thing I would suggest trying is to tune the relative weights among the 5 losses. I would also try doing more augmentation than just flipping images.
@patrick-12sigma I tried anchor scales (4, 8, 16, 32, 64) and (8, 16, 32, 64, 128), and both performed worse than (16, 32, 64, 128, 256). I tried:
import imgaug.augmenters as iaa

# Apply 0 to 3 of the following augmenters to each image.
augmentation = iaa.SomeOf((0, 3), [
    iaa.Fliplr(0.5),
    iaa.Flipud(0.5),
    iaa.OneOf([iaa.Affine(rotate=90),
               iaa.Affine(rotate=180),
               iaa.Affine(rotate=270)]),
    iaa.Multiply((0.8, 1.5)),
    iaa.GaussianBlur(sigma=(0.0, 5.0))
])
but I didn't see much difference in terms of AP. I think the author of this implementation uses the same losses as the original Mask R-CNN paper, so perhaps they work fine as they are; at least I don't know how to tune them.
My images are around 5000×3000 pixels, and I don't know whether image resizing and mini-masks will cause a severe accuracy loss.
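For reference, the resizing and mini-mask behaviour is controlled by a few attributes in config.py. A minimal sketch of the relevant overrides (the class name is hypothetical and the values are illustrative, not recommendations):

from mrcnn.config import Config

class LargeImageConfig(Config):
    NAME = "large_images"  # hypothetical
    # Images are resized so the short side is >= IMAGE_MIN_DIM and the
    # long side is <= IMAGE_MAX_DIM; "square" mode pads to a square.
    IMAGE_RESIZE_MODE = "square"
    IMAGE_MIN_DIM = 800
    IMAGE_MAX_DIM = 1024
    # Mini-masks store each instance mask at a fixed small resolution to
    # save memory, trading away some mask accuracy on large objects.
    USE_MINI_MASK = True
    MINI_MASK_SHAPE = (56, 56)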
@shikunyu8 There is a configuration entry that controls the relative weights of the different losses here.
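For reference, that configuration is the LOSS_WEIGHTS dictionary in config.py; a minimal sketch of overriding it (the values shown are the defaults, and the class name is hypothetical):

class TunedLossConfig(Config):
    # Weights applied to each of the five losses; raise a term's weight
    # to make the optimizer prioritize that loss.
    LOSS_WEIGHTS = {
        "rpn_class_loss": 1.,
        "rpn_bbox_loss": 1.,
        "mrcnn_class_loss": 1.,
        "mrcnn_bbox_loss": 1.,
        "mrcnn_mask_loss": 1.
    }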
You mentioned a good point about the impact of resizing the original image. It really depends on the statistical distribution of the sizes of the objects you would like to detect. I'd do some quick stats plots to determine whether the objects to be detected are overwhelmingly small, or plot the sizes of the false negatives (missed GT) and see if they are all small objects.
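A minimal sketch of such a stats plot, assuming a prepared mrcnn Dataset instance named dataset and using utils.extract_bboxes from this repo:

import numpy as np
import matplotlib.pyplot as plt
from mrcnn import utils

# Collect the scale (sqrt of box area) of every GT instance in the dataset.
scales = []
for image_id in dataset.image_ids:
    masks, class_ids = dataset.load_mask(image_id)   # masks: [H, W, N]
    boxes = utils.extract_bboxes(masks)              # [N, (y1, x1, y2, x2)]
    for y1, x1, y2, x2 in boxes:
        scales.append(np.sqrt((y2 - y1) * (x2 - x1)))

plt.hist(scales, bins=50)
plt.xlabel("object scale in pixels (sqrt of box area)")
plt.ylabel("count")
plt.show()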
@patrick-12sigma I did that statistical analysis and also tried all plausible anchor scales, but the AP is not as good as expected (the best is about 0.75). I think it may be over-fitting (I got 0.95 on the training set). In the Mask R-CNN paper they say: "To reduce overfitting, as this training set is smaller, we train using image scales randomly sampled from [640, 800] pixels; inference is on a single scale of 800 pixels."
In this implementation all images end up the same size, so I will try to approximate that with data augmentation, like this:
augmentation = iaa.SomeOf((0, 3), [
    iaa.Fliplr(0.5),
    iaa.Flipud(0.5),
    iaa.OneOf([iaa.Affine(rotate=90),
               iaa.Affine(rotate=180),
               iaa.Affine(rotate=270)]),
    # Random scale jitter to roughly mimic multi-scale training.
    iaa.Affine(scale={"x": (0.8, 1.2), "y": (0.8, 1.2)}),
    iaa.Multiply((0.8, 1.5)),
    iaa.GaussianBlur(sigma=(0.0, 5.0))
])
Thank you.
@shikunyu8 How can I reduce the number of images trained per epoch (steps per epoch × batch size)?
Hi @shikunyu8 I'm curious if you found any strategy to be particularly effective now that you've had some time to experiment?
@patrickcgray Hi, I did a lot of experiments and got roughly 85% recall. It's not great, but I guess that is close to the optimal parameter setting for my dataset. Several strategies worth noting (a config sketch follows this list):
1. Init with coco and cut off early. I observed a huge increase in recall when using the coco pre-trained weights, so absolutely init with them. Since my dataset is small, training for too many iterations overfits the training set and makes the model less generalizable, and the validation loss increases after a certain epoch. In my experiments I picked the checkpoint with the best validation performance.
2. Select proper anchor scales. This affects model performance dramatically. You can use inspect_data.ipynb to check your data, or just try several levels of scales.
3. Tune WEIGHT_DECAY. This controls the L2 regularization strength and is a good way of preventing overfitting. Try 0.01, 0.005, and 0.001 first, then refine.
4. ResNet-50 was worse for me. One could argue that ResNet-50 is less complex than ResNet-101 and therefore worth trying when facing an overfitting problem, but in my experiments its performance was much worse than ResNet-101's. (I initialized ResNet-50 with ImageNet weights, not coco, which may be the true reason for the worse performance. I can't verify that, since I don't have ResNet-50 weights trained on coco.)
5. Training only the classifiers can be a good idea. If your dataset is small, you can train just the heads instead of changing the features in the pre-trained weights. Or you can freeze some layers and train only part of the ResNet, e.g. layers='4+'. One example schedule: heads (20 epochs) -> 4+ (40 epochs)/10 -> all (60 epochs)(/10), where /10 means the learning rate is divided by 10; this helps the optimizer converge.
Hope this helps.
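A sketch of how strategies 2-4 map onto a Config subclass (the class name is hypothetical, and the values are starting points to tune, not recommendations):

from mrcnn.config import Config

class MyTunedConfig(Config):
    NAME = "my_dataset"  # hypothetical
    # Strategy 2: anchor scales matched to your object sizes.
    RPN_ANCHOR_SCALES = (16, 32, 64, 128, 256)
    # Strategy 3: L2 regularization strength.
    WEIGHT_DECAY = 0.001
    # Strategy 4: resnet101 worked better than resnet50 in these experiments.
    BACKBONE = "resnet101"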
Hi @shikunyu8 thanks so much for all the info! I've altered the anchor scales, and you're right, the inspect_data.ipynb helps a lot. I'm now training with different WEIGHT_DECAY values and will report back on how they help! I've also changed the loss weights to 1, 2, 2, 2, 5 to prioritize the MRCNN mask loss, and lowered TRAIN_ROIS_PER_IMAGE to 32 because I don't have many objects per image. Very curious how this will impact the loss. It is difficult to test only one variable at a time because a full training run takes me ~36 hours. Your step 5 has also been very helpful.
Again, thanks for the insight!
[quotes @shikunyu8's reply and augmentation snippet above]
How many times do you augment the data? I mean, I don't understand how many images this code creates.
Hi @javierfs I did some more comprehensive augmentation, and I'm only doing slightly worse on my training set than on my validation set, with a total of 265 training images. My code was:
import imgaug.augmenters as iaa

# Apply the whole pipeline to roughly 2/3 of the images.
augmentation = iaa.Sometimes(0.667, iaa.Sequential([
    iaa.Fliplr(0.5),             # horizontal flips
    iaa.Crop(percent=(0, 0.1)),  # random crops
    # Small gaussian blur with random sigma between 0 and 0.25,
    # but only blur about 50% of all images.
    iaa.Sometimes(0.5,
        iaa.GaussianBlur(sigma=(0, 0.25))
    ),
    # Strengthen or weaken the contrast in each image.
    iaa.ContrastNormalization((0.75, 1.5)),
    # Add gaussian noise.
    iaa.AdditiveGaussianNoise(loc=0, scale=(0.0, 0.05*255)),
    # Make some images brighter and some darker.
    iaa.Multiply((0.8, 1.2)),
    # Apply affine transformations: scale/zoom and rotate.
    iaa.Affine(
        scale={"x": (0.8, 1.2), "y": (0.8, 1.2)},
        # translate_percent={"x": (-0.2, 0.2), "y": (-0.2, 0.2)},
        rotate=(-180, 180),
        # shear=(-8, 8)
    )
], random_order=True))  # apply augmenters in random order
Hello, I want to know how to train only the classifiers. What's the command line? @shikunyu8 @patrickcgray
@190665688 Look at the color splash example. You can train just the top layers of the network and use the coco pre-trained weights:
model.train(dataset_train, dataset_val, learning_rate=config.LEARNING_RATE, epochs=20, augmentation=augmentation, layers='heads')
Notice the layers='heads' argument.
[quotes @patrickcgray's reply above]
Hello, I am very interested in the inspect_data.ipynb you mentioned. How do you set anchor scales through this notebook? Could you describe the procedure in detail? Thank you!
@ChauncyFr Did you find a solution?
Dear @shikunyu8, thanks for your comments on how to improve the network. I don't understand how to do your step 5. Would you please explain how I should change the code to train certain layers for a certain number of epochs with a certain learning rate? Or maybe @patrickcgray can also help, as I see he used your 5th step and said it was helpful for him.
[quotes @ChauncyFr's question about inspect_data.ipynb above]
With that notebook you can inspect the bounding-box sizes of your objects and then decide how to choose your scales. You then change the scales in the config your training code uses (see config.py for the default scales).
[quotes the step-5 question above]
I found the answer in issue #168. I put it here in case it helps others. You can train in different stages using this example:
print("Train network heads")
model.train(dataset_train, dataset_val,
learning_rate=config.LEARNING_RATE,
epochs=40,
augmentation=augmentation,
layers='heads')
# Finetune layers from ResNet stage 4 and up
print("Fine tune Resnet stage 4 and up")
model.train(dataset_train, dataset_val,
learning_rate=config.LEARNING_RATE,
epochs=120,
layers='4+')
print("Train all layers")
model.train(dataset_train, dataset_val,
learning_rate=config.LEARNING_RATE/10,
epochs=300,
augmentation=augmentation,
layers='all')`
@banafsh89 What is your batch size? And the rest of your configuration?
I don't have the desired result yet! My accuracy is 83%, but I am aiming for 95%. My batch size is 15, my backbone is resnet101, and I have only 900 images for training+validation. The rest of my configuration is the same as the nucleus.py configuration in the samples.
[quotes @patrickcgray's augmentation reply above]
Does the augmentation apply just to the images, or to both the images and the mask annotations? For example, with iaa.Fliplr I flip the images, but what about the mask annotations saved in the JSON file?
[quotes @patrickcgray's augmentation reply and the question about masks above]
It is applied to the masks automatically too. No worries.
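A note worth verifying against model.py (this is my reading of the code, not something stated above): this implementation applies only a whitelist of geometry-changing augmenters (MASK_AUGMENTERS, e.g. Fliplr, Flipud, CropAndPad, Affine, PiecewiseAffine, plus the Sequential/SomeOf/OneOf/Sometimes wrappers) to the masks; pixel-level augmenters such as GaussianBlur or Multiply change only the image, which is the behaviour you want.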
[quotes the augmentation reply, the question about masks, and @banafsh89's answer above]
Hi @banafsh89, thanks for your reply. Another question: do I need to change STEPS_PER_EPOCH after adding the augmentation? For example, my current augmentation code is:
augmentation = iaa.OneOf([
    iaa.Fliplr(0.5),
    iaa.Flipud(0.5),
    iaa.Affine(rotate=90),
    iaa.GaussianBlur(sigma=(0.0, 5.0))
])
My understanding is that the augmentation generates an augmented image (randomly choosing among fliplr, flipud, affine, and GaussianBlur) for each training image, so if there are 200 original training images, the total is 200 original + 200 augmented = 400 images.
So do I need to double my STEPS_PER_EPOCH?
No, you don't need to change it. The augmentation is applied on the fly during training rather than creating new images, so set STEPS_PER_EPOCH based on your training dataset size.
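A common convention (a sketch, using the standard attribute names from this repo's config.py) is one pass over the training set per epoch:

# BATCH_SIZE is computed by Config as IMAGES_PER_GPU * GPU_COUNT.
config.STEPS_PER_EPOCH = len(dataset_train.image_ids) // config.BATCH_SIZE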
Apologies, I'm very new to all this and just experimenting. Where should I put the augmentation code? I mean, in model.py or config.py?
You need to set it in your training script, like this:

import imgaug

model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=int(n_epochs),
            layers=layers,
            augmentation=imgaug.augmenters.Sequential([
                imgaug.augmenters.Affine(rotate=(-45, 45))]),
            # Note: class_weight is a custom addition by this commenter;
            # the stock model.train() does not accept it (see the
            # discussion further down this thread).
            class_weight=class_weights
            )
[quotes @shikunyu8's five training strategies above]
Hey, thank you so much for this generous training strategy. I'm certainly going to use it, but I need some explanation. Here are my questions:
1. Why do we need to do three different trainings, the first on the heads, the second on ResNet stage 4+ layers, and the last on all layers?
2. Are you saying that I need to save the weights from the first training and use them for the second training, and so on?
3. How can we choose the right data augmentation method for our data?
4. When you say that training for too many iterations will overfit the training data, do you mean steps_per_epoch by "iterations"? If so, can we still use steps_per_epoch = training set size // batch_size as a general rule?
Please, I need answers to these questions. Thank you in advance.
Could anyone please tell me what TRAIN_ROIS_PER_IMAGE does?
@kazzastic I believe it is the number of regions of interest that the RPN proposes for every image. If you have a lot of objects in your images, you should keep it high. You can check out config.py in the model folder.
But I thought it was the job of RPN_NMS_THRESHOLD to increase or decrease the number of proposals generated during training.
Or is it that RPN_NMS_THRESHOLD controls how many proposals are generated, and then TRAIN_ROIS_PER_IMAGE decides how many are fed to the mask head?
[quotes the TRAIN_ROIS_PER_IMAGE question and answer above]
Yes, you are right: TRAIN_ROIS_PER_IMAGE is how many ROI proposals you feed to the mask head. RPN_NMS_THRESHOLD determines which proposals you keep during RPN training based on non-max suppression, so you can increase RPN_NMS_THRESHOLD to increase the number of proposals.
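A sketch of the two knobs as Config overrides (the class name is hypothetical and the values are illustrative):

class DenseObjectsConfig(Config):
    # A higher NMS threshold keeps more overlapping RPN proposals.
    RPN_NMS_THRESHOLD = 0.8   # default is 0.7
    # Number of ROIs sampled per image to train the heads; keep it high
    # when images contain many instances.
    TRAIN_ROIS_PER_IMAGE = 200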
[quotes the TRAIN_ROIS_PER_IMAGE / RPN_NMS_THRESHOLD exchange above]
So do you think it is always good to have a large value of TRAIN_ROIS_PER_IMAGE?
I had tried reducing it to 100 before, but it worsened the loss. Training takes a bit longer, but I would rather keep the value high.
[quotes the four-scenario loss experiment from the top of the thread]
Hello, how did you plot those graphs for Mask R-CNN? Which code did you use to get the val_loss and training loss values for the graphs?
[quotes the staged-training example from issue #168 above]
I've tried implementing this, but whenever it finishes the first stage of training, it reloads the configuration and then exits. Did anyone have the same problem?
[quotes the training-script augmentation example above]
@Altimis How do we assign class weights? The class sample sizes are imbalanced in my case, and I would like to set class weights.
@DigitalPlantScience
Here is an example:

# Raw occurrence counts per class id (example values).
CLASS_WEIGHTS = {
    0: 2500,
    1: 200,
    2: 4500
}

def compute_class_weights(class_counts=CLASS_WEIGHTS):
    """Return per-class weights of max count / class count, so the
    rarest class gets the largest weight."""
    max_count = max(class_counts.values())
    weights = {c: float(max_count / n) for c, n in class_counts.items()}
    return dict(sorted(weights.items()))

class_weights = compute_class_weights(CLASS_WEIGHTS)

model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=int(n_epochs),
            layers=layers,
            augmentation=imgaug.augmenters.Sequential([
                imgaug.augmenters.Affine(rotate=(-45, 45))]),
            class_weight=class_weights
            )
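As a worked example with the counts above: the max count is 4500, so the computed weights are {0: 4500/2500 = 1.8, 1: 4500/200 = 22.5, 2: 4500/4500 = 1.0}; the rarest class is weighted most heavily.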
@Altimis Thanks for the technique to compute class weights. I didn't know model.train has a class_weight argument to pass the values to. Just to make sure: do we need to add/modify code in model.py to make use of the class_weight argument, or is it already available?
[quotes the TRAIN_ROIS_PER_IMAGE / RPN_NMS_THRESHOLD exchange above]
Will it help to increase RPN_NMS_THRESHOLD if I have a lot of instances of an object in a single image? What other parameters should I change to train better?
@Altimis I tried adding the class_weight argument in model.train, but it throws the following error. Any idea?
TypeError Traceback (most recent call last)
[quotes the staged-training example and the report of training stopping after the first stage]
Yeah, I get exactly the same problem. I didn't understand how @shikunyu8 did this. Can anybody explain it?
[quotes @shikunyu8's five training strategies above]
@shikunyu8 I tried to implement no. 5, i.e. an LR decay at every stage; however, the loss seems to go up rather than down at the start of every stage. For example, if the first stage ends at 1.6, the second starts at 3.4 and then decreases from there.
[quotes the staged-training example and the reports of training stopping after the first stage]
@nielsen1994 I faced this problem when I set the number of epochs wrong. For example, if I want to train the first 20 epochs with the heads and the next 20 with 4+, the first training stage should have epochs=20 and the second epochs=40 (not 20). It has to be higher than in the previous stage. Perhaps that is what is happening?
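In other words (a sketch; the epoch counts are illustrative):

# `epochs` is the cumulative target epoch, not a per-stage count; each
# call resumes from the epoch where the previous call stopped.
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=20, layers='heads')   # runs epochs 1-20
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=40, layers='4+')      # runs epochs 21-40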
[quotes the five strategies and @sohinimallick's question about the loss jumping at each stage]
@sohinimallick Yes, this is common and happened to me too. I am not 100% sure, but it probably happens because at each stage more of the network participates in training (1st stage only the FPN and heads, 2nd stage ResNet 4+, 3rd stage the full 101 layers), which causes the loss to jump at the start of each stage. The portion of the network newly added in each stage is not yet fine-tuned, and the rise in error probably comes from there. I am not sure about this; do tell me if you find a better explanation.
Training strategy: Hi, I am facing a training issue. I have only 1100 images with 9 classes, and I train all layers starting from the coco weights, but the accuracy is not as good as I want. I think it is because of too little data. I tried augmentation during training, but I got the same results again.
[quotes the five strategies and the discussion of the loss jump between stages]
Hello, have you found the problem?
Hi,
First, I want to say thank you for providing this great implementation of Mask R-CNN. I have been using this repo for a while but can't figure out a proper training strategy, so I hope someone can give me some suggestions.
Dataset: CBIS-DDSM (X-ray images), training: 981, validation: 250. Roughly one instance per image (highly class-imbalanced).
A sample image and corresponding mask: [image omitted]
My configuration:
BACKBONE resnet101
BATCH_SIZE 2
DETECTION_MIN_CONFIDENCE 0.7
DETECTION_NMS_THRESHOLD 0.3
IMAGE_MAX_DIM 512
IMAGE_MIN_DIM 512
IMAGE_RESIZE_MODE square
IMAGE_SHAPE [512 512 3]
LEARNING_MOMENTUM 0.9
LEARNING_RATE 0.001
MEAN_PIXEL [53.129 53.129 53.129]
ROI_POSITIVE_RATIO 0.33
RPN_ANCHOR_RATIOS [0.5, 1, 2]
RPN_ANCHOR_SCALES (16, 32, 64, 128, 256)
RPN_NMS_THRESHOLD 0.7
RPN_TRAIN_ANCHORS_PER_IMAGE 512
TRAIN_ROIS_PER_IMAGE 320
WEIGHT_DECAY 0.0001
Augmentation:
augmentation = iaa.SomeOf((0, 2), [
    iaa.Fliplr(0.5),
    iaa.Flipud(0.5),
])
I read many issues in this repo and summarized several strategies that could possibly help me. Based on these suggestions I did some preliminary experiments, but I haven't gotten good predictions yet.