keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

How to do transfer learning with InceptionV3/ResNet50 #10554

Closed chaohuang closed 3 years ago

chaohuang commented 6 years ago

According to the Keras documentation, transfer learning is done in 2 steps:

  1. Train only the newly added top layers (which were randomly initialized) by freezing all convolutional InceptionV3/ResNet50 layers (a rough sketch of this step follows the list).

  2. After the top layers are well trained, start fine-tuning the convolutional layers of InceptionV3/ResNet50 by unfreezing those layers.
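
A minimal sketch of step 1, assuming an InceptionV3 base with ImageNet weights and a softmax head (num_classes is a placeholder for the number of target classes); this roughly mirrors the fine-tuning example in the Keras applications documentation:

from keras.applications.inception_v3 import InceptionV3
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

num_classes = 10  # placeholder: number of target classes

# load the pre-trained base and add a new, randomly initialized head
base_model = InceptionV3(weights='imagenet', include_top=False)
x = GlobalAveragePooling2D()(base_model.output)
x = Dense(1024, activation='relu')(x)
predictions = Dense(num_classes, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)

# step 1: freeze the convolutional base so only the new head is trained
for layer in base_model.layers:
    layer.trainable = False

model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
# model.fit(...) now updates only the newly added top layers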

That's all fine with VGG nets, but due to the use of batch normalization layers, the above procedure doesn't work for InceptionV3/ResNet50, as described in issue #9214 (I don't know why the Keras documentation provides an example that doesn't work!)

@fchollet mentioned a possible workaround here:

But this solution (assuming it works) seems to cover only training the newly added top layers (step 1 above); how to fine-tune the convolutional layers in InceptionV3/ResNet50 (step 2 above) is still unclear to me.

AZweifels commented 6 years ago

@chaohuang What do you expect from step 1 if you continue to train the full network afterwards?

You may omit step 1 and train the full network by unfreezing all layers:

from keras.applications.inception_v3 import InceptionV3
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

# load the pre-trained base (InceptionV3 here) without its classification head
base_model = InceptionV3(weights='imagenet', include_top=False)

# make all layers trainable
for layer in base_model.layers:
    layer.trainable = True

# add your head on top (num_classes is the number of target classes)
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(num_classes, activation='softmax')(x)
model = Model(base_model.input, predictions)

Don't forget to compile your model!
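
For example (just a sketch; the optimizer, loss, and metrics depend on your task, here assuming a multi-class softmax head):

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])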

chaohuang commented 6 years ago

@AZweifels The reason for step 1 is the same as in the Keras documentation, where the newly added top layers are trained before training the whole network.

Although I'm not 100% sure about the rationale, my guess is that the weights in the top layers are randomly initialized, while the weights in the base model (convolutional layers) are already pre-trained, so we train the top layers first so that their weights are "pre-trained" as well (or at least no longer random) before training the full network.

In other words, the network is supposed to perform better when training of the whole network starts from all "pre-trained" weights rather than from a mixture of pre-trained and random weights.

rahulkulhalli commented 6 years ago

Any follow-up on this? I'd like to know the rationale behind the two-phase training as well!

I'm trying to implement transfer learning on a binary-class image dataset with well over 10k images, but InceptionV3 overfits badly while VGG-19 performs perfectly. Here is what I did:

1. Loaded the Inception model
2. Loaded the pretrained weights
3. Added bottleneck layers (Dense + BN + Activation + Dropout + Output)
4. Froze the base layers of the model
5. Trained the bottleneck layers for 5 epochs
6. Unfroze the last two Inception blocks
7. Re-compiled and re-trained with SGD and a small learning rate (a rough sketch of steps 6-7 is below)
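
For reference, steps 6 and 7 might look roughly like this (a sketch, assuming a model built on an InceptionV3 base as in the snippets above; layer index 249 is the cut-off for the top two Inception blocks used in the Keras applications fine-tuning example):

from keras.optimizers import SGD

# steps 6-7: freeze everything below the top two Inception blocks, unfreeze the rest
for layer in model.layers[:249]:
    layer.trainable = False
for layer in model.layers[249:]:
    layer.trainable = True

# re-compile so the new trainable flags take effect, then continue training with a small learning rate
# (loss shown for a binary task with a single sigmoid output; adjust to match your head)
model.compile(optimizer=SGD(lr=1e-4, momentum=0.9), loss='binary_crossentropy')
# model.fit(...) continues training on the same data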

gkarampatsakis commented 6 years ago

I've been facing the same problem (issue #10214) and it has been driving me nuts. Apparently there is a fix (PR #9965), but it is not "official" because it was never merged into master. The fix resolved my problem, but it is only available for Keras 2.1.6, not 2.2.0.

trikiomar712 commented 4 years ago

I saw some code that uses InceptionV3 as a pre-trained model, but I don't know exactly what to put in the selected_layer variable.

This is the link to the code: https://towardsdatascience.com/creating-a-movie-recommender-using-convolutional-neural-networks-be93e66464a7

Is there anyone who can help me with it?