Closed manmax31 closed 7 years ago
IMO, if it's a sequential model, the layers and params are just in lists. Simply pop() the last layer, its activation function, and the corresponding sets of params, add the new ones on top, and recompile. Worked for me.
@lemuriandezapada how do you pop the last layer?
You need to figure out what to pop: weights, activations, and the corresponding sets of params. Your model will have a model.layers and a model.params member, so you can just call model.layers.pop() and model.params.pop() a couple of times to remove the last entries. Add the new ones, recompile, and you're set.
You can also just make a new model with the same structure up to the last layer, and copy the weights of the shared layers from your other model with the get_weights() and set_weights() functions.
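For reference, here is a minimal sketch of the "new model, copy the weights" idea. It deliberately avoids Keras: `DenseStandIn` is a hypothetical stand-in class that only mimics the get_weights()/set_weights() pair mentioned above, and the layer sizes are made up for illustration.

```python
import numpy as np

class DenseStandIn:
    """Hypothetical stand-in for a dense layer (NOT the real Keras class)."""

    def __init__(self, n_in, n_out, seed):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.01, size=(n_in, n_out))
        self.b = np.zeros(n_out)

    def get_weights(self):
        return [self.W.copy(), self.b.copy()]

    def set_weights(self, weights):
        W, b = weights
        if W.shape != self.W.shape or b.shape != self.b.shape:
            raise ValueError("weight shapes do not match this layer")
        self.W, self.b = W.copy(), b.copy()

# "Pretrained" stack ending in a 1000-way output layer.
old_layers = [DenseStandIn(8, 16, seed=0),
              DenseStandIn(16, 16, seed=1),
              DenseStandIn(16, 1000, seed=2)]

# Same structure up to the last layer, but with a fresh 47-way output.
new_layers = [DenseStandIn(8, 16, seed=3),
              DenseStandIn(16, 16, seed=4),
              DenseStandIn(16, 47, seed=5)]

# Copy every layer except the last one, which keeps its random initialization.
for old, new in zip(old_layers[:-1], new_layers[:-1]):
    new.set_weights(old.get_weights())
```

With real Keras models the same loop would presumably run over `old_model.layers[:-1]` and `new_model.layers[:-1]`, provided both models were built with identical layers up to that point.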
Following this link and your advice, I did the following while keeping everything else the same
...
model.add(Dense(4096, 1000, activation='softmax'))
if weights_path:
    model.load_weights(weights_path)
model.layers.pop()
model.params.pop()
model.add(Dense(4096, 47, activation='softmax'))
return model
I get the following error:
theano.gradient.DisconnectedInputError: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: <CudaNdarrayType(float32, matrix)>
What am I doing wrong?
If you list model.layers, you'll see the last ones are one Dense and one Activation. Also, the params list has one matrix and one vector, so that's 2 pops for each. I also think the API changed and now you only pass layers the output size and they infer the input size by themselves.
If I'm not wrong, it is not good to relearn the whole network; you should figure out a way to fix all the layers but the last one. Otherwise you will propagate the error due to the random initialisation of the replaced layer to the other layers. There was a pull request for a flag to freeze layers some days ago; otherwise you need to remove the params from all the layers but the last one, something like this:
# clear the trainable params of every layer except the last one
for i in range(len(model.layers) - 1):
    model.layers[i].params = []
My implementation is ugly and probably doesn't work out of the box; I never tried it directly, but I think you get the point.
If you actually want to be thorough, you can just set the weight matrix for the output to oldmatrix[:, [item1, item2, item3, ..., item47]] so you keep the weights for the classes you already learned. Or make one with 48 outputs, where the 48th averages all the other classes you no longer care about.
Of course bdonadiman is right, this would relearn the entire network. Instead, you can also just pop the last layers, recompile, compute the output of the next-to-last layer (or alternatively make a Theano function that just extracts it), and use that as input to your new zero-hidden-layer classifier.
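A small NumPy sketch of the column-selection idea, assuming the output weight matrix has shape (input_dim, n_classes). The sizes and the `keep` indices are hypothetical, and averaging the dropped columns is only one possible reading of the "48th output" suggestion.

```python
import numpy as np

rng = np.random.default_rng(0)
old_W = rng.normal(size=(64, 1000))   # stand-in for the pretrained output weights
old_b = rng.normal(size=1000)         # stand-in for the pretrained output biases

# Hypothetical indices of the old classes to keep (47 of them in the thread).
keep = np.array([3, 17, 42])

# Keep only the columns (and biases) of the retained classes.
new_W = old_W[:, keep]
new_b = old_b[keep]

# Optional extra "other" output: the average of all dropped classes.
drop = np.setdiff1d(np.arange(old_W.shape[1]), keep)
other_W = old_W[:, drop].mean(axis=1, keepdims=True)
W_with_other = np.concatenate([new_W, other_W], axis=1)
b_with_other = np.append(new_b, old_b[drop].mean())
```

The resulting arrays could then be passed to the new output layer's set_weights(), as long as their shapes match that layer.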
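To illustrate the "extract features, then train a classifier with no hidden layer" route, here is a minimal NumPy sketch. The `features` array is synthetic and only stands in for next-to-last-layer outputs; `train_softmax_classifier` is a hypothetical helper, not a Keras or Theano function.

```python
import numpy as np

def train_softmax_classifier(features, labels, n_classes, lr=0.1, epochs=200):
    """Fit a linear softmax classifier (no hidden layer) by batch gradient descent."""
    n, d = features.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[labels]                     # one-hot targets
    for _ in range(epochs):
        logits = features @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - Y) / n                        # d(cross-entropy)/d(logits)
        W -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

# Synthetic stand-in for extracted features: two well-separated clusters.
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(-2.0, 0.5, size=(50, 4)),
                      rng.normal(2.0, 0.5, size=(50, 4))])
labels = np.array([0] * 50 + [1] * 50)

W, b = train_softmax_classifier(features, labels, n_classes=2)
accuracy = ((features @ W + b).argmax(axis=1) == labels).mean()
```

Because the pretrained layers are only used to compute `features` once, none of their weights change in this setup.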
> If i'm not wrong it is not good to relearn the whole network, you should figure out a way to fix all the layers but the last one.
Depends. If you have enough data, you'll generally get better results by retraining the entire network, with a very low learning rate. You want to move to the point in parameter space (all network weights) that's optimal for your new dataset, under the assumption that you are already pretty close to it, and therefore that you should be moving very slowly, in small steps.
@fchollet Can I use Keras to set a higher learning rate for the last layer than for the other layers?
Hi, I need to do what you are describing but I am not sure how. I am using the VGG-16 pretrained net for Keras.
I would like to go from outputting 1000 labels to 15, but would like to use the pretrained weights of the given model. What would be the best solution to do so? I am a beginner, so the more details the better :)
Essentially, I have this :
...
model.add(Dropout(0.5))
model.add(Dense(1000, activation='softmax'))
model.load_weights('vgg16_weights.h5')
but I need to figure out how to change the last layer's output from 1000 to 15. Obviously, just replacing 1000 with 15 will generate an error.
Thanks,
J
@jerpint load the entire model with weights, pop last 2 layers, add dropout + Dense(15) and compile.
@jerpint @grahamannett
In my case, I modified the load_weights function in engine/topology.py to prevent errors when trying to load weights into non-existent layers (your last layer) with try and except. Then you can directly modify the network as:
model.add(Dense(4096, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(15, activation='softmax'))
model.load_weights('vgg16_weights.h5')
The advantage is that you don't waste memory loading weights only to remove them.
@joelthchao hmm are you referring to this? https://github.com/fchollet/keras/blob/master/keras/engine/topology.py#L2280
It seems like you will still be loading the weights, but it's probably a more elegant solution.
@grahamannett Yes,
for k in range(nb_layers):
    try:
        g = f['layer_{}'.format(k)]
        weights = [g['param_{}'.format(p)] for p in range(g.attrs['nb_params'])]
        flattened_layers[k].set_weights(weights)
    except Exception as e:
        # report the exception instead of g, which is unbound if the lookup itself failed
        print('Skip loading weight for layer_{} ({})'.format(k, e))
Well, you're right. The weights from the .h5 file will be loaded into memory, but at least we save the insert and pop operations.
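The same "skip what does not fit" idea can be sketched without touching the Keras source. The helper below, `load_matching_weights`, is hypothetical, and the nested-list layout is an assumption made for illustration; it is not the Keras HDF5 format.

```python
import numpy as np

def load_matching_weights(layer_weights, saved_weights):
    """Copy saved arrays into layers by position, skipping shape mismatches.

    Both arguments are lists with one entry per layer, each entry being a
    list of NumPy arrays. Returns the indices of the layers that were skipped.
    """
    skipped = []
    for k, current in enumerate(layer_weights):
        saved = saved_weights[k] if k < len(saved_weights) else None
        if saved is None or [w.shape for w in saved] != [w.shape for w in current]:
            skipped.append(k)          # leave this layer's initialization untouched
            continue
        layer_weights[k] = [w.copy() for w in saved]
    return skipped

# Saved model ends in a 1000-way layer; the new model ends in a 15-way layer.
saved = [[np.ones((8, 16)), np.ones(16)], [np.ones((16, 1000)), np.ones(1000)]]
model = [[np.zeros((8, 16)), np.zeros(16)], [np.zeros((16, 15)), np.zeros(15)]]

skipped = load_matching_weights(model, saved)
```

Checking shapes explicitly is a design choice here: unlike a bare try/except, it cannot hide unrelated errors. Later Keras releases added `by_name` and `skip_mismatch` arguments to `load_weights` for a similar purpose, so it is worth checking the documentation of your version before patching anything.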
@joelthchao, loading the weights is not a problem. I have got the model running; what I am not sure about now is how to fix the weights. I feel like if I run my code as it is now, it will just update the entire network and lose the pretrained values. Also, will fixing the weights decrease my run time?
Here is my code:
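from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras.optimizers import SGD

# input: 224x224 images with 3 channels -> (3, 224, 224) tensors.
# the first block applies 64 convolution filters of size 3x3 each.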
model = Sequential()
model.add(ZeroPadding2D((1,1),input_shape=(3,224,224)))
model.add(Convolution2D(64, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(64, 3, 3, activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(128, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(128, 3, 3, activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))
model.add(Flatten())
model.add(Dense(4096, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(4096, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1000, activation='softmax'))
# load the weights
model.load_weights('vgg16_weights.h5')
# pop last layer, insert my own
model.layers.pop()
model.add(Dropout(0.5))
model.add(Dense(len(Y_train[1])))
model.add(Activation('softmax'))
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd)
model.fit(X_train, Y_train, batch_size=32, nb_epoch=1, show_accuracy=True)
When running this, my loss decreases quite heavily over time. Granted, it is only for 1 epoch, but I would think it should converge rather quickly? I have 1500 training images over 15 labels (100 images per category).
@jerpint If you want to freeze the weights of the model, you may want to check issue #622. In my opinion, it is quite normal for the loss to decrease heavily, since your new Dense layer is randomly initialized. In addition, 1500 is a relatively small number of images for VGG16, so you should use cross-validation to guard against overfitting.
I think when I had previously tried to use fine tuning, I did something like
for l in model.layers:
    l.trainable = False
and then added other layers. FWIW, from the small experiments I did, I don't think it gives very good results.
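What the trainable flag amounts to can be shown with a framework-free sketch: a gradient step that simply leaves frozen parameters alone. `sgd_step` and the toy arrays are hypothetical and are not how Keras implements freezing internally.

```python
import numpy as np

def sgd_step(params, grads, trainable, lr=0.01):
    """Apply one SGD update, leaving parameters marked non-trainable untouched."""
    return [p - lr * g if t else p
            for p, g, t in zip(params, grads, trainable)]

params = [np.ones((3, 3)), np.ones((3, 2))]            # [pretrained layer, new output layer]
grads = [np.full((3, 3), 0.5), np.full((3, 2), 0.5)]   # made-up gradients
trainable = [False, True]                              # freeze the pretrained layer

updated = sgd_step(params, grads, trainable)
```

In Keras itself, the flag has to be set before calling compile(), since the set of trainable weights is fixed at compile time.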
As I understand it, to change the last layer, after loading the weights you can just pop it and replace it with the layer that you want. But let's say I want to change the input layer? How can I do that?
For example, if I wanted smaller images for faster prototyping?
This is something I struggled with as well. After looking at the Keras code, I used:
from keras.applications.vgg19 import VGG19
from keras.layers import Input, Flatten, Dense
base_model = VGG19(include_top=False, weights='imagenet', input_tensor=Input((128,128,3)))
for layer in base_model.layers:
    layer.trainable = False
x = base_model.output
x = Flatten(name='flatten')(x)
x = Dense(16, activation='relu', name='fc1')(x)
x = Dense(1, activation='sigmoid', name='predictions')(x)  # softmax over a single unit would always output 1.0
to get a VGG19 CNN with ready convolution weights but with custom input and output layers to train.
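A quick arithmetic sketch of why the fully connected top has to be dropped (include_top=False) when the input size changes: the convolutions are padded and keep the spatial size, while each of the five 2x2, stride-2 max-pooling layers halves it, so the flattened feature count depends on the input side. `vgg_flatten_size` is a hypothetical helper written for this illustration.

```python
def vgg_flatten_size(side, n_pools=5, channels=512):
    """Feature count after the VGG conv stack for a square side x side input."""
    for _ in range(n_pools):
        side //= 2          # each 2x2, stride-2 max pool halves the side (floor)
    return side * side * channels

size_224 = vgg_flatten_size(224)   # 7 * 7 * 512
size_128 = vgg_flatten_size(128)   # 4 * 4 * 512
```

The first pretrained Dense layer expects 25088 inputs (224x224 images), whereas 128x128 images yield 8192, so its weight matrix cannot be reused and a new top has to be trained.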
@kretes's code worked for me; however, to use this as a model, one must then do the following:
from keras.models import Model
model = Model(input=base_model.input, output=x)
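A note on single-unit output layers such as the 'predictions' layer in the snippet above: softmax normalizes across the units of a layer, so with only one unit it returns 1.0 for every input, whereas a sigmoid gives a usable probability. The NumPy sketch below uses made-up logits to show the difference.

```python
import numpy as np

def softmax(z):
    """Softmax over the last axis, shifted for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([[-3.0], [0.0], [5.0]])   # one output unit, three samples

softmax_out = softmax(logits)   # identical for every sample
sigmoid_out = sigmoid(logits)   # varies with the logit
```

For more than two classes, the usual alternative is a Dense layer with one unit per class and a softmax activation.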
I have got the Keras VGG-16 model from https://gist.github.com/baraldilorenzo/07d7802847aaad0a35d3. For fine-tuning in Caffe, we change the final layer's output to the desired number of classes and relearn the weights of the final layer. How do I do the same in Keras? For example, instead of 1000 classes, I now want 47 classes.