
Finetune using Keras #871

Closed manmax31 closed 7 years ago

manmax31 commented 8 years ago

I have got the Keras VGG-16 model from https://gist.github.com/baraldilorenzo/07d7802847aaad0a35d3. For fine-tuning in Caffe, we change the final layer's output to the desired number of classes and relearn the weights of the final layer. How do I do the same in Keras? E.g. instead of 1000 classes, I now want 47 classes.

lemuriandezapada commented 8 years ago

IMO, if it's a Sequential model, the layers and params are just lists. Simply pop() the last activation and layer (and the corresponding sets of params), add the new ones on top, and recompile. Worked for me.

manmax31 commented 8 years ago

@lemuriandezapada how do you pop the last layer?

lemuriandezapada commented 8 years ago

You need to figure out what to pop: weights, activations, and the corresponding sets of params. Your model has a model.layers and a model.params member, so you can just model.layers.pop() and model.params.pop() a couple of times to remove the last entries. Add the new ones, recompile, and you're set.

lemuriandezapada commented 8 years ago

You can also just make a new model with the same structure up to the last layer, and set the same layers' weights from your other model with the set_weights()/get_weights() functions.
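
A minimal sketch of that approach, using a toy stand-in architecture rather than the full VGG-16; build_model() here is a hypothetical helper that builds the same Sequential model except for the size of the final classifier:

from keras.models import Sequential
from keras.layers import Dense

def build_model(nb_classes):
    # toy stand-in for the real architecture; only the final layer's size differs
    model = Sequential()
    model.add(Dense(64, activation='relu', input_dim=100))
    model.add(Dense(nb_classes, activation='softmax'))
    return model

old_model = build_model(1000)
# old_model.load_weights('vgg16_weights.h5')  # pretrained weights would be loaded here

new_model = build_model(47)
# copy weights into every layer except the final classifier,
# which keeps its fresh random initialisation
for old_layer, new_layer in zip(old_model.layers[:-1], new_model.layers[:-1]):
    new_layer.set_weights(old_layer.get_weights())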

manmax31 commented 8 years ago

Following this link and your advice, I did the following, keeping everything else the same:

...
model.add(Dense(4096, 1000, activation='softmax'))
if weights_path:
    model.load_weights(weights_path)

model.layers.pop()
model.params.pop()
model.add(Dense(4096, 47, activation='softmax'))
return model

I get the following error:

theano.gradient.DisconnectedInputError: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: <CudaNdarrayType(float32, matrix)>

What am I doing wrong?

lemuriandezapada commented 8 years ago

If you list model.layers you'll see the last ones are one Dense and one Activation. Also, params has one matrix and one vector (the weight matrix and the bias).

So, two pops from each. Also, I think the API changed and now you only pass layers the output size; they infer the input size by themselves.

dbonadiman commented 8 years ago

If I'm not wrong, it is not good to relearn the whole network; you should figure out a way to fix all the layers but the last one. Otherwise you will propagate the error due to the random initialisation of the new last layer back into the other layers. There was a pull request for a flag to freeze layers some days ago; otherwise you need to remove the params from all the layers but the last one, something like this:

# clear the params of every layer except the last, so only the last one is updated
for i in range(len(model.layers) - 1):
    model.layers[i].params = []

My implementation is ugly and probably doesn't work out of the box; I never tried it directly, but I think you get the point.
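
For reference, a sketch of the same freezing idea using the trainable flag from that pull request (available in later Keras versions), assuming model is the Sequential VGG-16 built above:

# freeze everything except the final classifier so only it is updated during training
for layer in model.layers[:-1]:
    layer.trainable = False
model.layers[-1].trainable = True
# recompile so the change to trainable takes effect
model.compile(loss='categorical_crossentropy', optimizer='sgd')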

lemuriandezapada commented 8 years ago

If you actually want to be thorough, you can just set the output weight matrix to oldmatrix[:,item1,item2,item3....item47] so you keep the weights for the classes you already learned. Or make one with 48 outputs, where the 48th averages all the other classes you no longer care about.

Of course dbonadiman is right: this would relearn the entire network. Instead, you can also just pop the last layers, recompile, compute the output of the next-to-last layer (or alternatively make a Theano function that just extracts it), and use that as input to your new zero-hidden-layer classifier.
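
A sketch of that feature-extraction route using the Keras backend function API (available in later versions), assuming model is the pretrained VGG-16 and X_train/Y_train are the new 47-class dataset:

from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense

# extract the activations feeding the final softmax
# (the second 4096-d fc layer; dropout is disabled in test phase)
get_features = K.function([model.layers[0].input, K.learning_phase()],
                          [model.layers[-2].output])
features = get_features([X_train, 0])[0]  # 0 = test phase

# train a small zero-hidden-layer classifier on the extracted features
clf = Sequential()
clf.add(Dense(47, activation='softmax', input_dim=features.shape[1]))
clf.compile(loss='categorical_crossentropy', optimizer='sgd')
clf.fit(features, Y_train, batch_size=32, nb_epoch=10)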

fchollet commented 8 years ago

If I'm not wrong, it is not good to relearn the whole network; you should figure out a way to fix all the layers but the last one.

Depends. If you have enough data, you'll generally get better results by retraining the entire network, with a very low learning rate. You want to move to the state of the parameter space (all network weights) that's optimal for your new dataset, under the assumption that you are already pretty close to that state, and therefore that you should be moving very slowly, at a small spatial scale.
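
A sketch of that full fine-tune, assuming model has already had its final layer replaced for the new number of classes and X_train/Y_train are the new data (the learning rate value is just an illustrative assumption):

from keras.optimizers import SGD

# very low learning rate so the pretrained weights move only slightly
sgd = SGD(lr=1e-4, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd)
model.fit(X_train, Y_train, batch_size=32, nb_epoch=10)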

lireagan commented 8 years ago

@fchollet Can I use Keras to set the learning rate of the last layer higher than for the other layers?

jerpint commented 8 years ago

Hi, I need to do what you are describing but I am not sure how. I am using the VGG-16 pretrained net for Keras.

I would like to go from outputting 1000 labels to 15, but would like to use the pretrained weights of the given model. What would be the best way to do so? I am a beginner, so the more details the better :)

Essentially, I have this:

...
model.add(Dropout(0.5))
model.add(Dense(1000, activation='softmax'))
model.load_weights('vgg16_weights.h5')

but I need to figure out how to change the last layer's output from 1000 to 15. Obviously, just replacing 1000 with 15 generates an error, since the pretrained 1000-class weights no longer match the layer's shape.

Thanks,

J

grahamannett commented 8 years ago

@jerpint Load the entire model with weights, pop the last 2 layers, add Dropout + Dense(15), and compile.
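
A sketch of that recipe, assuming model is the VGG-16 Sequential model built as in the gist (later Keras versions also provide model.pop(), which updates the model's internal bookkeeping as well):

from keras.layers import Dense, Dropout

model.load_weights('vgg16_weights.h5')  # pretrained 1000-class weights

# remove the final Dense(1000) softmax and the Dropout before it
model.layers.pop()
model.layers.pop()

# add a fresh head for the 15 new classes
model.add(Dropout(0.5))
model.add(Dense(15, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='sgd')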

joelthchao commented 8 years ago

@jerpint @grahamannett In my case, I modified the load_weights function in engine/topology.py with a try/except to prevent the error when trying to load weights into layers that don't exist (your last layer). Then you can directly modify the network as:

model.add(Dense(4096, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(15, activation='softmax'))
model.load_weights('vgg16_weights.h5')

The advantage is that you don't need to waste memory loading weights and then removing them.

grahamannett commented 8 years ago

@joelthchao Hmm, are you referring to this? https://github.com/fchollet/keras/blob/master/keras/engine/topology.py#L2280

Seems like you will still be loading the weights, but it's probably a more elegant solution.

joelthchao commented 8 years ago

@grahamannett Yes,

for k in range(nb_layers):
    try:
        g = f['layer_{}'.format(k)]
        weights = [g['param_{}'.format(p)] for p in range(g.attrs['nb_params'])]
        flattened_layers[k].set_weights(weights)
    except Exception:
        # the replaced final layer has a different shape, so skip it
        print('Skip loading weights for layer_{}'.format(k))

Well, you're right. The weights from the .h5 file will still be loaded into memory, but at least we save the insert and pop operations.

jerpint commented 8 years ago

@joelthchao Loading the weights is not a problem. I have got the model running; what I am not sure about now is how to fix (freeze) the weights. I feel like if I run my code as it is now, it will just update the entire network and lose the pretrained values. Also, will freezing the weights decrease my run time?

Here is my code:

# input: 224x224 images with 3 channels -> (3, 224, 224) tensors.
# this applies 32 convolution filters of size 3x3 each.
model = Sequential()
model.add(ZeroPadding2D((1,1),input_shape=(3,224,224)))
model.add(Convolution2D(64, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(64, 3, 3, activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(128, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(128, 3, 3, activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(Flatten())
model.add(Dense(4096, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(4096, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1000, activation='softmax'))

# load the weights
model.load_weights('vgg16_weights.h5')

# pop last layer, insert my own
model.layers.pop()
model.add(Dropout(0.5))
model.add(Dense(len(Y_train[1])))
model.add(Activation('softmax'))

sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd)
model.fit(X_train, Y_train, batch_size=32, nb_epoch=1, show_accuracy=True)

jerpint commented 8 years ago

When running this, my loss is decreasing quite heavily with time. Granted, it is only for 1 epoch, but I would think it should converge rather quickly? I have 1500 training images over 15 labels (100 images per category).

joelthchao commented 8 years ago

@jerpint If you want to freeze the weights of the model, you may want to check issue #622. In my opinion, the loss decreasing heavily is quite normal, since your new Dense layer is randomly initialised. In addition, 1500 images is a relatively small number for VGG16; you should do cross-validation to prevent overfitting.

grahamannett commented 8 years ago

I think when I previously tried fine-tuning, I did something like

for l in model.layers:
    l.trainable = False

and then added other layers. FWIW, I don't think it gave very good results in the small experiments I did.

ternaus commented 8 years ago

As I understand it, to change the last layer you can just pop it after loading the weights and replace it with the layer you want. But let's say I want to change the input layer? How can I do that?

For example, if I wanted smaller images for faster prototyping?

kretes commented 7 years ago

This is something I struggled with as well. After looking at the Keras code, I used:

from keras.applications.vgg19 import VGG19
from keras.layers import Input, Flatten, Dense

base_model = VGG19(include_top=False, weights='imagenet', input_tensor=Input((128, 128, 3)))
for layer in base_model.layers:
    layer.trainable = False  # freeze the pretrained convolutional base
x = base_model.output
x = Flatten(name='flatten')(x)
x = Dense(16, activation='relu', name='fc1')(x)
x = Dense(1, activation='softmax', name='predictions')(x)  # note: for a single output unit, 'sigmoid' is usually the right activation

to get the VGG19 convolutional layers with pretrained weights but with a custom input and output layer to train.

zafarali commented 7 years ago

@kretes' code worked for me; however, to use this as a model one must then do the following:

model = Model(input=base_model.input, output=x)
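
For completeness, a sketch of compiling and training it; X_train and y_train are hypothetical, Model comes from keras.models, and only the new head gets updated because the VGG19 base was frozen above:

model.compile(optimizer='rmsprop',  # assumes a single sigmoid output, as noted above
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=32, nb_epoch=10, validation_split=0.2)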