keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

How to do transfer learning with InceptionV3/ResNet50 #10554

Closed chaohuang closed 3 years ago

chaohuang commented 6 years ago

According to the Keras documentation, transfer learning is done in 2 steps:

  1. Train only the newly added top layers (which were randomly initialized) by freezing all convolutional InceptionV3/ResNet50 layers (a rough sketch of this step follows the list).

  2. After the top layers are well trained, start fine-tuning the convolutional layers of InceptionV3/ResNet50 by unfreezing those layers.
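
A minimal sketch of step 1, assuming an InceptionV3 base with ImageNet weights and a softmax head (num_classes is a placeholder for the number of target classes); this roughly mirrors the fine-tuning example in the Keras applications documentation:

from keras.applications.inception_v3 import InceptionV3
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

num_classes = 10  # placeholder: number of target classes

# load the pre-trained base and add a new, randomly initialized head
base_model = InceptionV3(weights='imagenet', include_top=False)
x = GlobalAveragePooling2D()(base_model.output)
x = Dense(1024, activation='relu')(x)
predictions = Dense(num_classes, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)

# step 1: freeze the convolutional base so only the new head is trained
for layer in base_model.layers:
    layer.trainable = False

model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
# model.fit(...) now updates only the newly added top layers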

That's all fine with VGG nets, but due to the use of batch normalization layers, the above procedure doesn't work for InceptionV3/ResNet50, as described in issue #9214 (I don't know why the Keras documentation provides an example that doesn't work!)

@fchollet mentioned a possible workaround here:

But this solution (assuming it works) seems to cover only training the newly added top layers (step 1 above); how to fine-tune the convolutional layers in InceptionV3/ResNet50 (step 2 above) is still unclear to me.

AZweifels commented 6 years ago

@chaohuang What do you expect from step 1 if you continue to train the full network afterwards?

You may omit step 1 and train the full network by unfreezing all layers:

from keras.applications.inception_v3 import InceptionV3
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

# load the pre-trained base (InceptionV3 here) without its classification head
base_model = InceptionV3(weights='imagenet', include_top=False)

# make all layers trainable
for layer in base_model.layers:
    layer.trainable = True

# add your head on top (num_classes is the number of target classes)
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(num_classes, activation='softmax')(x)
model = Model(base_model.input, predictions)

Don't forget to compile your model!
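
For example (just a sketch; the optimizer, loss, and metrics depend on your task, here assuming a multi-class softmax head):

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])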

chaohuang commented 6 years ago

@AZweifels The reason for step 1 is the same as in the Keras documentation, where the newly added top layers are trained before training the whole network.

Although I'm not 100% sure about the rationale, my guess is that the weights in the top layers are randomly initialized, while the weights in the base model (convolutional layers) are already pre-trained, so we train the top layers first so that their weights are "pre-trained" as well (or at least no longer random) before training the full network.

In other words, the network is supposed to perform better when training of the whole network starts from all "pre-trained" weights rather than from a mixture of pre-trained and random weights.

rahulkulhalli commented 6 years ago

Any follow-up on this? I'd like to know the rationale behind the two-phase training as well!

I'm trying to implement transfer learning on a binary-class image dataset with well over 10k images, but InceptionV3 overfits badly while VGG-19 performs perfectly. Here is what I did:

1. Loaded the Inception model
2. Loaded the pretrained weights
3. Added bottleneck layers (Dense + BN + Activation + Dropout + Output)
4. Froze the base layers of the model
5. Trained the bottleneck layers for 5 epochs
6. Unfroze the last two Inception blocks
7. Re-compiled and re-trained with SGD and a small learning rate (a rough sketch of steps 6-7 is below)
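
For reference, steps 6 and 7 might look roughly like this (a sketch, assuming a model built on an InceptionV3 base as in the snippets above; layer index 249 is the cut-off for the top two Inception blocks used in the Keras applications fine-tuning example):

from keras.optimizers import SGD

# steps 6-7: freeze everything below the top two Inception blocks, unfreeze the rest
for layer in model.layers[:249]:
    layer.trainable = False
for layer in model.layers[249:]:
    layer.trainable = True

# re-compile so the new trainable flags take effect, then continue training with a small learning rate
# (loss shown for a binary task with a single sigmoid output; adjust to match your head)
model.compile(optimizer=SGD(lr=1e-4, momentum=0.9), loss='binary_crossentropy')
# model.fit(...) continues training on the same data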

gkarampatsakis commented 6 years ago

I've been facing the same problem (issue #10214) and it has been driving me nuts. Apparently there is a fix (PR #9965), but it is not "official" because it was never merged into master. The fix resolved my problem, but it is only available for Keras 2.1.6, not 2.2.0.

trikiomar712 commented 4 years ago

I saw some code that uses InceptionV3 as a pre-trained model, but I don't know exactly what to put in the selected_layer variable.

This is the link to the code: https://towardsdatascience.com/creating-a-movie-recommender-using-convolutional-neural-networks-be93e66464a7

Is there anyone who can help me with it?