vladpaunescu opened this issue 6 years ago
I noticed you use ResNet101 encoder and download ResNet50 weights. How do you apply the weights from ResNet50 to ResNet101?
Yes, this was discussed in another thread as well. I used ResNet50 weights because these were the ones available from Keras. And since we're providing COCO trained weights, I figured it's better to start with the COCO weights anyway for most cases. If I get some free time, I might try to find ResNet101 weights and use them instead. It might improve the performance a bit. If you're interested in implementing and submitting a pull request, I'd be happy to review it.
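For concreteness, the reuse works through Keras' by-name weight loading: only layers whose names match entries in the ResNet50 weight file are initialized from it, while the extra ResNet101 blocks keep their random initialization. A minimal sketch, assuming the repo's usual model construction and load_weights wrapper; the config object and weight path are placeholders:

import mrcnn.model as modellib  # or "import model as modellib" in older versions of the repo

# Build the model (config is a placeholder for your Config subclass instance).
model = modellib.MaskRCNN(mode="training", config=config, model_dir="./logs")

# by_name=True matches layers by name, so a deeper backbone can still pick up
# the pretrained weights for the layers it shares with ResNet50; any layer
# without a matching name keeps its random initialization.
model.load_weights("/path/to/pretrained_weights.h5", by_name=True)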
I'm trying to reproduce the results from the repository using ResNet101
I think you'll need to use a bigger batch size, hence the 8-GPU training. At some point I had the model training on 1 GPU and then switched to 8 GPUs and continued training, and I noticed a good improvement from that switch. Also, I think I trained it longer than the example schedule above. I don't remember the details, but that's why I put a note stating that this schedule is just an example.
If you'd rather only train on 1 GPU, an alternative is to batch the updates from every 8 steps and average them before applying them to the weights. But that would require changes to the Keras optimizer. Not too complex, but not too simple either.
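For illustration, here is a minimal sketch of that "virtual batch size" idea, written as a plain tf.keras training loop rather than the repo's Keras fit_generator pipeline; the tiny model, loss, and random data are placeholders just to keep the example self-contained:

import tensorflow as tf

ACCUM_STEPS = 8  # simulate an 8x larger batch on a single GPU

# Placeholder model and loss, only so the example runs end to end.
model = tf.keras.Sequential([tf.keras.layers.Dense(10, activation="relu"),
                             tf.keras.layers.Dense(1)])
model.build((None, 4))
optimizer = tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.9)
loss_fn = tf.keras.losses.MeanSquaredError()

# One gradient accumulator per trainable variable, initialized to zero.
accum = [tf.Variable(tf.zeros_like(v), trainable=False)
         for v in model.trainable_variables]

def train_step(x, y, step):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    # Accumulate the running average of the gradients.
    for a, g in zip(accum, grads):
        a.assign_add(g / ACCUM_STEPS)
    # Apply the averaged gradients once every ACCUM_STEPS mini-batches,
    # then reset the accumulators.
    if (step + 1) % ACCUM_STEPS == 0:
        optimizer.apply_gradients(zip(accum, model.trainable_variables))
        for a in accum:
            a.assign(tf.zeros_like(a))
    return loss

for step in range(32):
    x = tf.random.normal((2, 4))  # 2 images per step, as in the config below
    y = tf.random.normal((2, 1))
    train_step(x, y, step)

The effective batch size becomes IMAGES_PER_GPU x ACCUM_STEPS, at the cost of ACCUM_STEPS forward/backward passes per weight update.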
Why do you first train the heads (freeze the encoder) and then fine tune the encoder?
When you start, the backbone has good weights trained on ImageNet, but the heads have random weights. If you train all layers, you end up updating the backbone weights using gradients computed from the random weights in the heads, which causes unnecessary changes to the backbone weights. Training only the heads ensures that we don't touch the good backbone weights until the heads have had a bit of time to settle.
Another approach to handle this situation is to do a warm-up phase: you train all layers, but with a much smaller learning rate (say, 1/100 of the original), and then, after things settle, you switch back to your original learning rate.
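Expressed with the training API used later in this thread, such a warm-up could look roughly like this (the epoch counts are only illustrative):

# Warm-up: all layers, but at 1/100 of the base learning rate
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE / 100,
            epochs=5,
            layers='all')

# Once things settle, continue with the original learning rate
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=40,
            layers='all')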
Do you do any data augmentation, besides the random horizontal flips from the load_image_gt method?
No
when only the heads are training with learning rate/100, the loss seems to jump up
Hmm, hard to guess. Based on your graphs, it looks like the training loss is still decreasing, but the validation loss goes up. That usually suggests overfitting, but there could also be something else going on.
I already adapted your code for loading 101-layer ResNet weights and left it training over the holidays. When I get some acceptable results, I will make a pull request. Maybe you can retrain on 8 GPUs then.
Concerning larger batch sizes on limited VRAM, maybe we can revive the following Keras issue? https://github.com/keras-team/keras/issues/5244
Hello again! I trained with ResNet101 in order to reproduce official results.
My training protocol is the default, as given in the example:
# Training - Stage 1
print("Training network heads")
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=40,
            layers='heads')

# Training - Stage 2
# Finetune layers from ResNet stage 4 and up
print("Fine tune Resnet stage 4 and up")
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=120,
            layers='4+')

# Training - Stage 3
# Fine tune all layers
print("Fine tune all layers")
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE / 10,
            epochs=160,
            layers='all')
Other hyperparameters are the defaults. GPU count is 1 and images per GPU is 2:
# NUMBER OF GPUs to use. For CPU training, use 1
GPU_COUNT = 1

# Number of images to train with on each GPU. A 12GB GPU can typically
# handle 2 images of 1024x1024px.
# Adjust based on your GPU memory and image sizes. Use the highest
# number that your GPU can handle for best performance.
IMAGES_PER_GPU = 2

# Number of training steps per epoch
# This doesn't need to match the size of the training set. TensorBoard
# updates are saved at the end of each epoch, so setting this to a
# smaller number means getting more frequent TensorBoard updates.
# Validation stats are also calculated at each epoch end and they
# might take a while, so don't set this too small to avoid spending
# a lot of time on validation stats.
STEPS_PER_EPOCH = 1000

# Learning rate and momentum
# The Mask RCNN paper uses lr=0.02, but on TensorFlow it causes
# weights to explode. Likely due to differences in optimizer
# implementation.
LEARNING_RATE = 0.001
LEARNING_MOMENTUM = 0.9
Unfortunately, the results are well below the official release. My best checkpoint is at epoch 150 (actually epoch 151, since counting starts from 0):
Evaluate annotation type *bbox*
DONE (t=2.54s).
Accumulating evaluation results...
DONE (t=0.76s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.224
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.426
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.211
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.114
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.270
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.346
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.209
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.300
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.306
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.141
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.345
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.456
Besides that, I had a bug when evaluating the Inception-ResNet-V2 model (it was trained with ImageNet mean subtraction, but evaluated without it). After the fix, the best accuracy is:
Epoch 217:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.287
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.454
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.313
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.153
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.335
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.426
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.256
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.343
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.350
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.176
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.389
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.509
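For reference, the fix for that evaluation bug boils down to applying the same per-channel mean subtraction at training and evaluation time. A minimal sketch; the helper names and mean values mirror the repo's MEAN_PIXEL / mold_image convention as I remember it, so treat them as assumptions rather than the exact code:

import numpy as np

# Assumed ImageNet-style RGB means (the role config.MEAN_PIXEL plays in the repo).
MEAN_PIXEL = np.array([123.7, 116.8, 103.9])

def mold_image(image):
    """Subtract the per-channel mean; must be applied identically at train and eval."""
    return image.astype(np.float32) - MEAN_PIXEL

def unmold_image(molded):
    """Invert the preprocessing, e.g. for visualization."""
    return (molded + MEAN_PIXEL).astype(np.uint8)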
Even though Inception-ResNet-V2 outperforms ResNet101, both of them are well below the reported results. That might be caused by a number of factors.
Please, if you have better results, or any ideas on how to improve these experiments (especially with the Inception-ResNet-V2 backbone), post them here.
@waleedka Thank you for your detailed explanation, and for reopening the Keras issue on virtual batch size.
Vlad
@vladpaunescu Out of curiosity, are you using Python 3 or 2.7?
@vladpaunescu, your results seem quite good given that you only trained on 1 GPU, no? You trained for 160 epochs × 1000 steps per epoch × 2 images per GPU × 1 GPU = 320,000 images, whereas the official results are obtained with 160,000 steps × 2 images per GPU × 8 GPUs = 2,560,000 images.
Correct me if I'm wrong.
By the way, I'm really interested in a ResNet101 implementation like the one you did.
@jmtatsch: Have you completed training with the official ResNet-101 pretrained model? Could you share your results?
Did you use the model linked below? https://github.com/tensorflow/models/blob/master/research/slim/nets/inception_resnet_v2.py
@vladpaunescu Can you tell me how to plot mrcnn_bbox_loss, mrcnn_class_loss, and rpn_bbox_loss the way you did?
@vladpaunescu: I think your shapes are wrong. Taking ResNet50 as an example, it should be C1 = C2 = /2, C3 = /4, C4 = /8, and C5 = /16. Am I right?
@matiqul Go to the directory where Mask_RCNN is and run the command
tensorboard --logdir=logs
then open a browser at http://localhost:6006 and you should get all the plots you need from TensorBoard.
@AloshkaD Thanks, it works!
@vladpaunescu
Hi, I'm interested in your work. I also want to change the backbone CNN structure from ResNet to Inception-ResNet, similar to what you did. Could you share more details on how to do that? Many thanks.
Kind Regards
Wei
@vladpaunescu @enoceanwei Hi, I hope you succeeded at this task. I'm trying to do the same thing; could you help me with some advice?
Hi,
Thank you for making Mask RCNN public on GitHub. It is really amazing work. I tried to replace the ResNet-101 encoder with the Inception-ResNet-V2 encoder from Keras. Unfortunately, I didn't get better results.
These are the endpoints I use to build the feature pyramid. They correspond to the different scales.
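(The concrete endpoint names are omitted above; below is only a simplified sketch of how four endpoints C2..C5 at strides 4/8/16/32 would typically be merged into a top-down feature pyramid. The layer wiring and the 256-channel width are illustrative, not the repo's exact code. One practical check when swapping backbones: each endpoint's spatial size must be exactly half of the previous one, otherwise the Add after upsampling fails, and Inception-style stems with 'valid' padding can break that alignment.)

import keras.layers as KL

def build_fpn(C2, C3, C4, C5, channels=256):
    """Merge backbone endpoints into P2..P5 feature maps of equal depth."""
    # Lateral 1x1 convs bring every endpoint to the same channel count,
    # then each level is upsampled and added to the next finer one.
    P5 = KL.Conv2D(channels, (1, 1))(C5)
    P4 = KL.Add()([KL.UpSampling2D(size=(2, 2))(P5),
                   KL.Conv2D(channels, (1, 1))(C4)])
    P3 = KL.Add()([KL.UpSampling2D(size=(2, 2))(P4),
                   KL.Conv2D(channels, (1, 1))(C3)])
    P2 = KL.Add()([KL.UpSampling2D(size=(2, 2))(P3),
                   KL.Conv2D(channels, (1, 1))(C2)])
    # 3x3 convs on the merged maps to reduce upsampling artifacts.
    P2 = KL.Conv2D(channels, (3, 3), padding="same")(P2)
    P3 = KL.Conv2D(channels, (3, 3), padding="same")(P3)
    P4 = KL.Conv2D(channels, (3, 3), padding="same")(P4)
    P5 = KL.Conv2D(channels, (3, 3), padding="same")(P5)
    return P2, P3, P4, P5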
I'm training InceptionResNet-V2 on the COCO dataset (train + valminusminival) with one GPU, one image per GPU, and 2000 steps per epoch. The initial learning rate is 0.006, and the training strategy is first end-to-end, then fine-tuning the heads. Unfortunately, the loss doesn't decrease the way it does for the provided model: train loss is 0.84 and val loss 0.2, and the bbox results are lower. The provided model has a train loss of 0.7 when fine-tuning.
I have some questions:
1. I noticed you use a ResNet101 encoder and download ResNet50 weights. How do you apply the weights from ResNet50 to ResNet101?
2. I'm trying to reproduce the results from the repository using ResNet101. I currently have the training strategy set to the default (as in the given example). Is there anything I need to add to reproduce the results? I'm downloading ImageNet ResNet50 weights for the ResNet101 encoder.
3. Do you do any data augmentation, besides the random horizontal flips from the load_image_gt method?

I'm pasting here the loss from TensorBoard when using InceptionResNetV2. At the last stage of training, when only the heads are training with learning rate/100, the loss seems to jump up instead of decreasing. Maybe it's because I used the same learning rate as in the previous stage. For all other learning stages, when the learning rate is decreased 10 times, the loss decreases.
Thank you, Vlad