rcmalli / keras-vggface

VGGFace implementation with Keras Framework
MIT License

How do I fine-tune the last FC layer? #2

Closed: Patriciasr92 closed this issue 7 years ago

Patriciasr92 commented 7 years ago

First of all thank you for your efforts.

I am trying to use the VGGFace model to build a facial expression recognition system. To do this, I want to train only the last fully connected (FC) layer (the layer before the softmax) on a dataset of 1576 images (8 classes * 197 pictures per class).

I tried different approaches:

1) Separating the model into two submodels, the convolutional part and the FC part, but then I can't merge them back together (see the merge sketch below).
2) Freezing the conv layers (trainable=False) and training the FC layers, but I get a dimension error.
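(For reference, chaining two functional sub-models back into a single model in Keras 1.x looks roughly like this sketch; conv_part and fc_part are hypothetical names for the two pieces.)

    from keras.models import Model
    from keras.layers import Input

    # A Keras Model is callable on a tensor, just like a layer, so the two
    # sub-models can be chained and wrapped into one new Model.
    img = Input(shape=(3, 224, 224))
    features = conv_part(img)        # conv_part: a Model ending at pool5
    predictions = fc_part(features)  # fc_part: a Model over the new FC head
    merged = Model(input=img, output=predictions)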

This is what I have: http://stackoverflow.com/questions/40692495/keras-error-with-training-dimension-is-not-what-is-expected

I'd really like to use your model (I've already got it implemented, but the training stage doesn't seem to work). If you can explain how to do the fine-tuning and then merge the models, I will be very grateful.

Thank you very much.

rcmalli commented 7 years ago

Follow these steps for the easiest way to fine-tune only the last layer:

1) Open your vggface.py source code and find the layer you want to customize.
2) Change the parameters and the name of that layer.
3) Find the model.load_weights(path) call in the source code for your backend and dimension ordering, and change it to model.load_weights(path, by_name=True).
4) Freeze the other layers, as you mentioned.
5) Start training.
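In code, steps 2-4 come down to something like this sketch (Keras 1.x, as used elsewhere in this thread; fc8_n and the weight-file name are illustrative, and model is assumed to be built from your modified vggface.py):

    # Step 2 (inside vggface.py): rename and resize the last layer, e.g.
    #   fc8_n = Convolution2D(8, 1, 1, name='fc8_n')(fc7_drop)  # was 'fc8'

    # Step 3: by_name=True loads weights only into layers whose names match
    # the file, so the renamed 'fc8_n' keeps its random initialization.
    model.load_weights('vggface_weights_tensorflow.h5', by_name=True)

    # Step 4: freeze every layer except the new one before compiling.
    for layer in model.layers:
        layer.trainable = (layer.name == 'fc8_n')

    model.compile(optimizer='sgd', loss='categorical_crossentropy',
                  metrics=['accuracy'])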

I hope this solution will help you.

Patriciasr92 commented 7 years ago

First, thank you for your answer.

I've tried the steps you suggested, but I get the following error:

File "freeze_4.py", line 118, in finetune model.fit(train_data,label, nb_epoch=nb_epoch, batch_size=64) File "/imatge/psereno/workspace/venv-tfg/local/lib/python2.7/site-packages/keras/engine/training.py", line 1057, in fit batch_size=batch_size) File "/imatge/psereno/workspace/venv-tfg/local/lib/python2.7/site-packages/keras/engine/training.py", line 984, in _standardize_user_data exception_prefix='model input') File "/imatge/psereno/workspace/venv-tfg/local/lib/python2.7/site-packages/keras/engine/training.py", line 100, in standardize_input_data str(array.shape)) Exception: Error when checking model input: expected input_2 to have 4 dimensions, but got array with shape (1576, 8) Exception in thread Thread-20: Traceback (most recent call last): File "/usr/lib/python2.7/threading.py", line 810, in bootstrap_inner self.run() File "/usr/lib/python2.7/threading.py", line 763, in run self.__target(_self.args, *_self.__kwargs) File "/imatge/psereno/workspace/venv-tfg/local/lib/python2.7/site-packages/keras/engine/training.py", line 435, in data_generator_task generator_output = next(generator) File "/imatge/psereno/workspace/venv-tfg/local/lib/python2.7/site-packages/keras/preprocessing/image.py", line 604, in next x = img_to_array(img, dim_ordering=self.dim_ordering) TypeError: 'NoneType' object is not callable

Here is the code I am using:

    from keras.models import Model
    from keras.layers import Input, Convolution2D, ZeroPadding2D, MaxPooling2D, Flatten, Dropout, Activation
    from keras.optimizers import SGD
    import numpy as np
    from PIL import Image
    from keras.preprocessing.image import ImageDataGenerator

'''I am using this configuration in ~/.keras/keras.json:
{
    "image_dim_ordering": "th",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}
'''

def finetune(weights_path=None):

    img_width, img_height = 224, 224
    img = Input(shape=(3, img_height, img_width))
    datagen = ImageDataGenerator(rescale=1.)
    generator = datagen.flow_from_directory(train_data_dir,
                                        target_size=(img_width, img_height),
                                        batch_size=32,
                                        class_mode=None,
                                        shuffle=False)

    pad1_1 = ZeroPadding2D(padding=(1, 1), trainable=False, name='in_train', input_shape=(3,224,224))(img)
    conv1_1 = Convolution2D(64, 3, 3, activation='relu', name='conv1_1', trainable=False)(pad1_1)
    pad1_2 = ZeroPadding2D(padding=(1, 1), trainable=False)(conv1_1)
    conv1_2 = Convolution2D(64, 3, 3, activation='relu', name='conv1_2', trainable=False)(pad1_2)
    pool1 = MaxPooling2D((2, 2), strides=(2, 2), trainable=False)(conv1_2)

    pad2_1 = ZeroPadding2D((1, 1), trainable=False)(pool1)
    conv2_1 = Convolution2D(128, 3, 3, activation='relu', name='conv2_1', trainable=False)(pad2_1)
    pad2_2 = ZeroPadding2D((1, 1), trainable=False)(conv2_1)
    conv2_2 = Convolution2D(128, 3, 3, activation='relu', name='conv2_2', trainable=False)(pad2_2)
    pool2 = MaxPooling2D((2, 2), strides=(2, 2), trainable=False)(conv2_2)

    pad3_1 = ZeroPadding2D((1, 1), trainable=False)(pool2)
    conv3_1 = Convolution2D(256, 3, 3, activation='relu', name='conv3_1', trainable=False)(pad3_1)
    pad3_2 = ZeroPadding2D((1, 1), trainable=False)(conv3_1)
    conv3_2 = Convolution2D(256, 3, 3, activation='relu', name='conv3_2', trainable=False)(pad3_2)
    pad3_3 = ZeroPadding2D((1, 1), trainable=False)(conv3_2)
    conv3_3 = Convolution2D(256, 3, 3, activation='relu', name='conv3_3', trainable=False)(pad3_3)
    pool3 = MaxPooling2D((2, 2), strides=(2, 2), trainable=False)(conv3_3)

    pad4_1 = ZeroPadding2D((1, 1), trainable=False)(pool3)
    conv4_1 = Convolution2D(512, 3, 3, activation='relu', name='conv4_1', trainable=False)(pad4_1)
    pad4_2 = ZeroPadding2D((1, 1), trainable=False)(conv4_1)
    conv4_2 = Convolution2D(512, 3, 3, activation='relu', name='conv4_2', trainable=False)(pad4_2)
    pad4_3 = ZeroPadding2D((1, 1), trainable=False)(conv4_2)
    conv4_3 = Convolution2D(512, 3, 3, activation='relu', name='conv4_3', trainable=False)(pad4_3)
    pool4 = MaxPooling2D((2, 2), strides=(2, 2), trainable=False)(conv4_3)

    pad5_1 = ZeroPadding2D((1, 1), trainable=False)(pool4)
    conv5_1 = Convolution2D(512, 3, 3, activation='relu', name='conv5_1', trainable=False)(pad5_1)
    pad5_2 = ZeroPadding2D((1, 1), trainable=False)(conv5_1)
    conv5_2 = Convolution2D(512, 3, 3, activation='relu', name='conv5_2', trainable=False)(pad5_2)
    pad5_3 = ZeroPadding2D((1, 1), trainable=False)(conv5_2)
    conv5_3 = Convolution2D(512, 3, 3, activation='relu', name='conv5_3', trainable=False)(pad5_3)
    pool5 = MaxPooling2D((2, 2), strides=(2, 2), trainable=False)(conv5_3)

    fc6 = Convolution2D(4096, 7, 7, activation='relu', name='fc6', trainable=False)(pool5)
    fc6_drop = Dropout(0.5)(fc6)
    # The new FC head below (fc7_n, fc8_n) is the part being trained
    fc7_n = Convolution2D(4096, 1, 1, activation='relu', name='fc7_n', trainable=True)(fc6_drop)
    fc7_drop_n = Dropout(0.5)(fc7_n)
    fc8_n = Convolution2D(8, 1, 1, name='fc8_n', trainable=True)(fc7_drop_n)
    flat_n = Flatten(name='flat_n')(fc8_n)
    out_n = Activation('softmax')(flat_n)

    model = Model(input=img, output=out_n)
    model.summary()

    if weights_path:
        model.load_weights(weights_path, by_name=True)

    bottleneck_features_train = model.predict_generator(generator, nb_train_samples)
    np.save(open('features.npy', 'w'), bottleneck_features_train)
    train_data = np.load(open('features.npy'))
    print(train_data.shape)  #I've got (1576,8)

    train_labels = np.array(
    [0] * (nb_train_samples / 8) + [1] * (nb_train_samples / 8) + [2] * (nb_train_samples / 8) + [3] * (
        nb_train_samples / 8) + [4] * (nb_train_samples / 8) + [5] * (nb_train_samples / 8) + [6] * (
        nb_train_samples / 8) + [7] * (nb_train_samples / 8))

    lbl1 = np.array([[1, 0, 0, 0, 0, 0, 0, 0], ] * 197)
    lbl2 = np.array([[0, 1, 0, 0, 0, 0, 0, 0], ] * 197)
    lbl3 = np.array([[0, 0, 1, 0, 0, 0, 0, 0], ] * 197)
    lbl4 = np.array([[0, 0, 0, 1, 0, 0, 0, 0], ] * 197)
    lbl5 = np.array([[0, 0, 0, 0, 1, 0, 0, 0], ] * 197)
    lbl6 = np.array([[0, 0, 0, 0, 0, 1, 0, 0], ] * 197)
    lbl7 = np.array([[0, 0, 0, 0, 0, 0, 1, 0], ] * 197)
    lbl8 = np.array([[0, 0, 0, 0, 0, 0, 0, 1], ] * 197)
    label = np.concatenate([lbl1, lbl2, lbl3, lbl4, lbl5, lbl6, lbl7, lbl8])
    '''Label formats:
       train_labels (integer class ids) --> loss='sparse_categorical_crossentropy'
       label (one-hot rows)             --> loss='categorical_crossentropy'
    '''

    sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(optimizer=sgd, loss='categorical_crossentropy', metrics=['accuracy'])

    model.fit(train_data,label, nb_epoch=nb_epoch, batch_size=64)
    print('Model Trained')

    # We save the weights so we can load them into the model later
    model.save_weights(pesos_entrenados)  # always save your weights after training or during training
    return model

if __name__ == "__main__":
    im = Image.open('A.J._Buckley.jpg')
    im = im.resize((224, 224))
    im = np.array(im).astype(np.float32)
    im = im.transpose((2, 0, 1))
    im = np.expand_dims(im, axis=0)

    # For the training stage
    img_width, img_height = 224, 224
    img = Input(shape=(3, img_height, img_width))
    train_data_dir = 'merge/train'
    pesos_entrenados='Modelo_Reentrenado.h5'
    # validation_data_dir = 'data/validation'
    nb_train_samples = 1576  # 197 per class and we have 8 classes (8 emotions)
    nb_validation_samples = 0
    nb_epoch = 20

    model=finetune('vggface_weights_tensorflow.h5')  #Construction of the model
    #model.summary()

    out = model.predict(im)
    print(out[0][0])

If you can help me, I'll be really grateful.

anjith2006 commented 7 years ago

Hi, can you explain the preprocessing (face detection and/or alignment) needed before extracting features from the VGGFace network?

hbredin commented 7 years ago

I am the author of pyannote.video and would like to get rid of the OpenFace Lua dependency and switch to keras-vggface.

The last thing preventing me from switching to keras-vggface is an explanation of how to do face detection & alignment. Did you use the OpenFace implementation?

rcmalli commented 7 years ago

Sorry for the late responses.

@Patriciasr92 Your input shape is not correct: the network expects a 4D input (batch, channels, height, width), but you are feeding it a 2D array. predict_generator returns the model's output features, of shape (1576, 8), and your script then passes those back into model.fit as the model input.
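One way to avoid the error (a sketch only, reusing train_data_dir, sgd, and the sample count from your code above) is to train the full model directly on images with fit_generator, so the input stays 4D:

    # class_mode='categorical' makes the generator yield batches of
    # (images, one-hot labels), so no separate label array is needed.
    train_gen = ImageDataGenerator(rescale=1.).flow_from_directory(
        train_data_dir,
        target_size=(224, 224),
        batch_size=32,
        class_mode='categorical')

    model.compile(optimizer=sgd, loss='categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit_generator(train_gen, samples_per_epoch=1576, nb_epoch=20)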

@anjith2006, @hbredin This repository is just a weight conversion from Caffe to Keras. All face preprocessing steps should follow the original paper. Implementations of the preprocessing steps are welcome.
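For reference, a typical VGGFace-style preprocessing step after the face has been detected and cropped looks roughly like the sketch below (not part of this repository; the mean BGR values are the ones published with the original Caffe model, and the channels-first layout matches the 'th' ordering used above):

    import numpy as np
    from PIL import Image

    def preprocess_face(face_img):
        # face_img: a PIL image already cropped to the face region
        im = face_img.resize((224, 224))
        im = np.array(im, dtype=np.float32)   # RGB, height x width x channels
        im = im[:, :, ::-1]                   # RGB -> BGR (Caffe convention)
        im -= np.array([93.5940, 104.7624, 129.1863], dtype=np.float32)
        im = im.transpose((2, 0, 1))          # HWC -> CHW ('th' ordering)
        return np.expand_dims(im, axis=0)     # add the batch dimension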