keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

how can I use center loss in keras? #6929

Closed SpenserCai closed 6 years ago

SpenserCai commented 7 years ago

how can I use center loss in keras?

wangxianliang commented 7 years ago

I just implemented the loss; maybe you can give it a try.

#!/usr/bin/python
#_*_ coding:utf8 _*_
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from keras import backend as K
import tensorflow as tf

def _center_loss_func(features, labels, alpha, num_classes):
    feature_dim = features.get_shape()[1]
    # Each output layer uses one independent center: scope/centers
    centers = K.zeros([num_classes, feature_dim])
    labels = K.reshape(labels, [-1])
    labels = tf.to_int32(labels)
    centers_batch = tf.gather(centers, labels)
    diff = (1 - alpha) * (centers_batch - features)
    centers = tf.scatter_sub(centers, labels, diff)
    loss = tf.reduce_mean(K.square(features - centers_batch))
    return loss

def get_center_loss(alpha, num_classes):
    """Center loss based on the paper "A Discriminative 
       Feature Learning Approach for Deep Face Recognition"
       (http://ydwen.github.io/papers/WenECCV16.pdf)
    """
    @functools.wraps(_center_loss_func)
    def center_loss(y_true, y_pred):
        return _center_loss_func(y_pred, y_true, alpha, num_classes)
    return center_loss

usage:

center_loss = get_center_loss(0.5, num_classes)
model.compile(optimizer='sgd', loss = center_loss)
...
sun9700 commented 7 years ago

@wangxianliang K.zeros([num_classes, feature_dim]) always sets the centers to zeros:

def _center_loss_func(features, labels, alpha, num_classes):
    feature_dim = features.get_shape()[1]
    # Each output layer uses one independent center: scope/centers
    centers = K.zeros([num_classes, feature_dim])
    centers = centers + 1
    loss = tf.reduce_mean(centers)
    return loss

always outputs 1.

Even centers = tf.get_variable('centersl', [num_classes, feature_dim], dtype=tf.float32, initializer=tf.constant_initializer(0), trainable=False) with tf.get_variable_scope().reuse_variables() does not work.

@fchollet How does Keras save a reused variable so that I can update it myself according to the inputs and model predictions? I have tried layers (like the OCR example), loss functions, regularizers, and callbacks.

wangxianliang commented 7 years ago
def _center_loss_func(features, labels, alpha, num_classes,
                      centers, feature_dim):
    assert feature_dim == features.get_shape()[1]    
    labels = K.reshape(labels, [-1])
    labels = tf.to_int32(labels)
    centers_batch = tf.gather(centers, labels)
    diff = (1 - alpha) * (centers_batch - features)
    centers = tf.scatter_sub(centers, labels, diff)
    loss = tf.reduce_mean(K.square(features - centers_batch))
    return loss

def get_center_loss(alpha, num_classes, feature_dim):
    """Center loss based on the paper "A Discriminative 
       Feature Learning Approach for Deep Face Recognition"
       (http://ydwen.github.io/papers/WenECCV16.pdf)
    """    
    # Each output layer uses one independent center: scope/centers
    centers = K.zeros([num_classes, feature_dim])
    @functools.wraps(_center_loss_func)
    def center_loss(y_true, y_pred):
        return _center_loss_func(y_pred, y_true, alpha, 
                                 num_classes, centers, feature_dim)
    return center_loss
sun9700 commented 7 years ago

@wangxianliang Does it update the centers during testing?

wyxpku commented 7 years ago

The implementation above calculates the center loss on the model's output layer, but how can one compute the center loss on a feature layer?

JihoonJ commented 7 years ago

@wyxpku If you want both the center loss and the cross-entropy loss, you can add the feature layer as an output of the model.

# the 'features' layer should be defined in the network
model = Model(X, [y, features], name=name)

In this case you have to supply a target for each output, so I cloned the labels:

def clone_y_generator(generator):
    # output: train_gen_X, [train_gen_Y, train_gen_Y]
    while True:
        data = next(generator)
        x = data[0]
        y = [data[1], data[1]]
        yield x, y

usage:

train_gener = train_gen.flow_from_directory(train_dir, ...     )
self.model.fit_generator(clone_y_generator(train_gener), ...)
wyxpku commented 7 years ago

Thank you for your answer @JihoonJ

stale[bot] commented 7 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

ChristopherLu commented 7 years ago

@JihoonJ @wyxpku

Just to confirm: if I put the center loss on the feature layer, will the centers get updated at every step?

kutoga commented 7 years ago

No, it doesn't seem to work. I did some tests and this center-loss implementation doesn't update the centers.

The following op should be executed after every update step: centers = tf.scatter_sub(centers, labels, diff), e.g. as in this TensorFlow implementation of the center loss: https://github.com/EncodeTS/TensorFlow_Center_Loss/blob/master/mnist_sample_code/mnist_with_center_loss.ipynb

It is done like this (see the section "Optimizer"):

with tf.control_dependencies([centers_update_op]):
    train_op = optimizer.minimize(total_loss, global_step=global_step)

My question therefore is: Can this be done somehow with keras?

Thank you very much

fabiocapsouza commented 7 years ago

I've tested the center loss implementation given by @wangxianliang with MNIST and it works to some extent, since the results are quite different from those of a model that uses only cross-entropy loss. The plot below shows that the different classes are indeed clustered around their corresponding centers in the 2D space.

(plot: 2D LeNet++ features under center loss, clustered by class)

However, I'm facing problems with this implementation when trying to resume training from a saved checkpoint, because I can't retrieve the values of the centers after training, which are needed to instantiate a center loss function with non-zero centers. Any ideas on how to get their values after training and save them?

Another possibility would be to move the centers variable to a layer or an optimizer, but I have no clue how to do it.
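
Since the centers here are just a Keras backend variable, one option is to read and restore them with K.get_value / K.set_value. A minimal sketch, assuming a hypothetical variant of get_center_loss that exposes the variable (this is not code from this thread):

import numpy as np
from keras import backend as K

def get_center_loss(alpha, num_classes, feature_dim, initial_centers=None):
    # Hypothetical variant: accepts saved centers and exposes the variable.
    if initial_centers is None:
        initial_centers = np.zeros((num_classes, feature_dim), dtype='float32')
    centers = K.variable(initial_centers)
    def center_loss(y_true, y_pred):
        return _center_loss_func(y_pred, y_true, alpha,
                                 num_classes, centers, feature_dim)
    return center_loss, centers

# After training, save the learned centers alongside the checkpoint:
#   np.save('centers.npy', K.get_value(centers))
# When resuming, rebuild the loss from the saved values:
#   center_loss, centers = get_center_loss(0.5, num_classes, feature_dim,
#                                          initial_centers=np.load('centers.npy'))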

artkorenev commented 7 years ago

@kutoga this can be solved if you define the loss not as a function but as your own layer, subclassing keras.engine.topology.Layer. In that case, when you call tf.scatter_sub (or K.update_sub, if you prefer the more Keras-style approach), you obtain an op that can be placed in the computational graph via self.add_update in your layer implementation. This will update your centers. Note, however, that your loss is now actually a Layer, so you need to provide an additional dummy loss function that simply passes the value from the center loss layer forward, so that the gradients are calculated properly for the whole network.
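
A minimal sketch of that layer-based approach (Keras 2 / TF1-era API; the CenterLossLayer name and wiring are illustrative, not code from this thread):

import tensorflow as tf
from keras import backend as K
from keras.engine.topology import Layer

class CenterLossLayer(Layer):
    def __init__(self, num_classes, feature_dim, alpha=0.5, **kwargs):
        super(CenterLossLayer, self).__init__(**kwargs)
        self.num_classes = num_classes
        self.feature_dim = feature_dim
        self.alpha = alpha

    def build(self, input_shape):
        # Non-trainable weight, so the centers are saved with the model.
        self.centers = self.add_weight(name='centers',
                                       shape=(self.num_classes, self.feature_dim),
                                       initializer='zeros',
                                       trainable=False)
        super(CenterLossLayer, self).build(input_shape)

    def call(self, inputs):
        features, one_hot_labels = inputs
        labels = K.argmax(one_hot_labels, axis=1)
        centers_batch = K.gather(self.centers, labels)
        diff = (1 - self.alpha) * (centers_batch - features)
        # Register the centers update so it runs on every training step.
        self.add_update(tf.scatter_sub(self.centers, labels, diff), inputs)
        # Per-sample squared distance to the (pre-update) centers.
        return K.sum(K.square(features - centers_batch), axis=1, keepdims=True)

    def compute_output_shape(self, input_shape):
        return (input_shape[0][0], 1)

The model would then output this layer (with the one-hot labels fed as an extra model input), trained with a dummy loss that just passes the value through, e.g. lambda y_true, y_pred: y_pred.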

kjanjua26 commented 6 years ago

@fabiocapsouza, could you please share the exact code of how you got this working?

fabiocapsouza commented 6 years ago

@kjanjua26 unfortunately I can't share the exact code because it belongs to the company I worked for at the time. But I used the functions that @wangxianliang gave in the 3rd post of this thread.

kjanjua26 commented 6 years ago

@fabiocapsouza, I used the exact same function, but the issue is that it gives a shape error for the first code he wrote: (128, 10) vs (1280, 10). I am not sure how to resolve that.

djzurawski commented 6 years ago

@kjanjua26 I had the same problem. I found that if you're using categorical labels (one-hot encoding), changing the labels = K.reshape(labels, [-1]) line to labels = tf.argmax(labels, axis=1) fixes it.
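
This also explains the (128, 10) vs (1280, 10) error above: with one-hot targets, y_true has shape (batch, num_classes), so K.reshape(labels, [-1]) produces batch * num_classes entries instead of one index per sample. A tiny NumPy illustration (shapes assumed for the example):

import numpy as np

one_hot = np.eye(10)[[3, 7]]       # a (2, 10) one-hot batch
print(one_hot.reshape(-1).shape)   # (20,) -- 2 * 10 entries, wrong for gather
print(one_hot.argmax(axis=1))      # [3 7] -- one class index per sample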

wt-huang commented 6 years ago

Closing as this is resolved

wangjue-wzq commented 5 years ago

@kjanjua26 I had the same problem. I found that if you're using categorical labels (one-hot encoding), changing the labels = K.reshape(labels, [-1]) line to labels = tf.argmax(labels, axis=1) fixes it.

Thank you very much! But I don't know why?

adriaciurana commented 5 years ago

@kjanjua26 I had the same problem. I found that if you're using categorical labels (one-hot encoding), changing the labels = K.reshape(labels, [-1]) line to labels = tf.argmax(labels, axis=1) fixes it.

Thank you very much! But I don't know why?

Because labels comes from the "Y" target you're using. In this case, since it is a classification task, it is encoded as one-hot. What _center_loss_func really receives is:

  • features: the layer before the softmax => X dimensions.
  • labels: the one-hot encoding of your target; if you have 100 classes, it is a vector 0 ... 1 ... 0 with the 1 in the position of that sample's class. For that reason, taking the position of the maximum value (the 1) tells you which class it is.
  • alpha: controls the speed at which the centroids are updated.
  • num_classes: the number of classes you use (equivalent to labels.get_shape()[1]).

The problem that I find with this kind of implementation is that during the validation split the centers will also move. I don't know if there is any flag/control to know whether the loss function is in the validation step.

wangjue-wzq commented 5 years ago

(Quoting @adriaciurana's reply above.)

I did this for image scene classification, but the model shows no performance boost compared with the softmax loss alone. Looking forward to your reply!

adriaciurana commented 5 years ago

It could be for several reasons. I have done the following: I added a line of code to make sure that the center loss is always computed against the current centers (if I don't do that, I have problems with the l2 norm).

def _center_loss_func(features, labels, alpha, num_classes, centers, feature_dim):
    assert feature_dim == features.get_shape()[1]    
    labels = K.argmax(labels, axis=1)
    labels = tf.to_int32(labels)
    centers_batch = K.gather(centers, labels)
    diff = (1 - alpha) * (centers_batch - features)
    centers = tf.scatter_sub(centers, labels, diff)
    centers_batch = K.gather(centers, labels)
    loss = K.mean(K.square(features - centers_batch))
    return loss

def get_center_loss(alpha, num_classes, feature_dim):
    """Center loss based on the paper "A Discriminative 
       Feature Learning Approach for Deep Face Recognition"
       (http://ydwen.github.io/papers/WenECCV16.pdf)
    """    
    # Each output layer uses one independent center: scope/centers
    centers = K.zeros([num_classes, feature_dim], dtype='float32')
    @functools.wraps(_center_loss_func)
    def center_loss(y_true, y_pred):
        return _center_loss_func(y_pred, y_true, alpha, num_classes, centers, feature_dim)
    return center_loss

It may be due to several reasons; perhaps you have to tune how much weight it carries relative to the softmax loss. That can be done with loss_weights (0.01 in my case).

For my part, I have added an l2 norm on the features so that they are always on the same scale. With l2-normalized features, the Euclidean distance and the cosine distance are also proportional (https://stats.stackexchange.com/questions/146221/is-cosine-similarity-identical-to-l2-normalized-euclidean-distance).
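
A minimal sketch of that l2 normalization (assuming a functional-API model where x is the feature tensor; the layer name is illustrative):

from keras.layers import Lambda
from keras import backend as K

# Rescale each feature vector to unit length so features and centers
# stay on the same scale.
features = Lambda(lambda t: K.l2_normalize(t, axis=1), name='l2_features')(x)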

wangjue-wzq commented 5 years ago

(Quoting @adriaciurana's reply above.)

I may have a problem creating the VGG16 model. Computing the center loss requires the output of the fully connected feature layer. My code is written like this, but I don't know where the error is. Can you help me?

model = VGG16(input_tensor=image_input, include_top=False,weights='imagenet')
model.summary()
last_layer = model.layers[-1].output
x = Flatten(name='flatten')(last_layer)
x = Dense(4096, activation='relu', name='fc1')(x)
x = Dense(4096, activation='relu', name='fc2')(x)
x = Dense(num_classes,activation = 'softmax',name='predication')(x)
custom_vgg_model = Model(inputs = image_input, outputs = x)
custom_vgg_model.summary()

for layer in custom_vgg_model.layers[:-1]:
    layer.trainable = False

custom_vgg_model.layers[3].trainable

sgd = optimizers.SGD(lr=learn_Rate,decay=decay_Rate,momentum=0.9,nesterov=True)
total_loss = center_loss(alpha=0.5,lambda_c=0.01,num_classes=num_classes)
custom_vgg_model.compile(loss="categorical_crossentropy",optimizer= sgd,metrics=['accuracy'])
adriaciurana commented 5 years ago

(Quoting the previous exchange.)

You must include the center_loss among the losses, otherwise it will have no effect. You must also specify which layer holds the "feature vectors" you are interested in. It would be something similar to this (for my part, I prefer to always add an l2 norm on the features to control their scale):

model = VGG16(input_tensor=image_input, include_top=False,weights='imagenet')
model.summary()
last_layer = model.layers[-1].output
x = Flatten(name='flatten')(last_layer)
x = Dense(4096, activation='relu', name='fc1')(x)
x = Dense(4096, activation='relu', name='fc2')(x)
features = x
x = Dense(num_classes,activation = 'softmax',name='predication')(x)
custom_vgg_model = Model(inputs = image_input, outputs = [x, features])
custom_vgg_model.summary()

for layer in custom_vgg_model.layers[:-1]:
    layer.trainable = False

custom_vgg_model.layers[3].trainable

sgd = optimizers.SGD(lr=learn_Rate,decay=decay_Rate,momentum=0.9,nesterov=True)
total_loss = center_loss(alpha=0.5,lambda_c=0.01,num_classes=num_classes)
custom_vgg_model.compile(loss={'predication': "categorical_crossentropy", 'fc2': total_loss},loss_weights={'fc2': 1, 'predication': 1},optimizer= sgd,metrics={'predication': 'accuracy'})

You should experiment a little with the weights of the two losses to control how strongly the clusters are compressed.

When you run training you will see that the model has two outputs; don't worry, the second one is only there to drive the center loss. The solution is to send the replicated Y, that is: if you use fit, call fit(X, [Y, Y]); if you use a generator, yield Xdata, [Ydata, Ydata].

I hope this helps.

wangjue-wzq commented 5 years ago

(Quoting @adriaciurana's reply above.)

I have a new problem when calling model.fit(); the error is: ValueError: The model expects 2 target arrays, but only received one array. Found: array with shape (25200, 45). Hope to receive your reply.

model = VGG16(input_tensor=image_input, include_top=True,weights='imagenet')
last_layer = model.get_layer('fc2').output
feature = last_layer
out = Dense(num_classes,activation = 'softmax',name='predictions')(last_layer)
custom_vgg_model = Model(inputs = image_input, outputs = [out,feature])
for layer in custom_vgg_model.layers[:-3]:
    layer.trainable = False 
custom_vgg_model.layers[3].trainable 
sgd = optimizers.SGD(lr=lr,decay=decay,momentum=0.9,nesterov=True)
custom_vgg_model.compile(loss={'predictions': "categorical_crossentropy", 'fc2': total_loss},loss_weights={'fc2': 1, 'predictions': 1},optimizer= sgd,metrics={'predictions': 'accuracy'})
hist = custom_vgg_model.fit(x = X_train, y = y_train, batch_size=batch_Sizes, epochs=epoch_Times, verbose=1, validation_data=(X_test, y_test))

# losses.py
def get_center_loss(labels, features, alpha, lambda_c, num_classes):
    len_features = features.get_shape()[1]
    try:
        with tf.variable_scope('v_center', reuse=True):
            centers = tf.get_variable('centers', [num_classes, len_features], dtype=tf.float32,
                                      initializer=tf.constant_initializer(0), trainable=False)
    except:
        with tf.variable_scope('v_center', reuse=False):
            centers = tf.get_variable('centers', [num_classes, len_features], dtype=tf.float32,
                                      initializer=tf.constant_initializer(0), trainable=False)
    labels = tf.argmax(labels, axis=1)
    labels = tf.to_int64(labels)
    centers_batch = tf.gather(centers, labels)  # centers of this batch's classes
    center_loss = tf.reduce_mean(tf.square(features - centers_batch))
    diff = (1 - alpha) * (centers_batch - features)
    centers_update_op = tf.scatter_sub(centers, labels, diff)  # diff is used to get updated centers.
    with tf.control_dependencies([centers_update_op]):
        # combo_loss = value_factor * center_loss + new_factor * git_loss
        combo_loss = lambda_c * center_loss
    return combo_loss

def total_loss(y_true, y_pred):
    center_loss = get_center_loss(y_true, y_pred, alpha=0.5, lambda_c=0.01, num_classes=45)
    return center_loss
adriaciurana commented 5 years ago

hist = custom_vgg_model.fit(x = X_train, y = y_train, batch_size=batch_Sizes, epochs=epoch_Times, verbose=1, validation_data=(X_test, y_test))

When you apply the center loss, what you are really doing is forcing the network to have 2 targets (with this method).

  1. The first target of the network refers to the softmax (prediction) and the classification problem.

  2. The second target is your features (fc2) and the image retrieval problem that you really want to solve.

When you call .fit, you must send two "Y"s, because one feeds target 1 and the other target 2.

What happens is that the center_loss needs to know which class each sample belongs to (so that only that centroid is corrected). For this reason you have to duplicate y_train and y_test:

hist = custom_vgg_model.fit(x=X_train, y=[y_train, y_train], batch_size=batch_Sizes, epochs=epoch_Times, verbose=1, validation_data=(X_test, [y_test, y_test]))

Once you want to generate the final model, you only have to do:

final_model = Model(inputs=custom_vgg_model.inputs, outputs=custom_vgg_model.get_layer('fc2').output)
wangjue-wzq commented 5 years ago

(Quoting @adriaciurana's reply above.)

I did what you said, but there is a new problem:

ValueError: X (images tensor) and y (labels) should have the same length. Found: X.shape = (6300, 224, 224, 3), y.shape = (2, 6300, 45)

In addition, the fc2 layer's output features and y_train do not match: y_train holds the labels, while the center-loss objective is to be near the class centers, which are updated inside losses.py.


(losses.py as posted above.)

adriaciurana commented 5 years ago

Try sending the "Y" in fit like this: y = {'fc2': y_train, 'predictions': y_train}, and the same for validation but with y_test.

Is it possible that you are calling final_model.fit(...)? That would not be correct: you should train with custom_vgg_model.fit(...), and then, once training is done, you can convert it so that it has a single output.

Can you provide the code that you currently have?

Try using the following center loss to make sure the error is somewhere else.

# Center Loss
def _center_loss_func(features, labels, alpha, num_classes, centers, feature_dim):
    assert feature_dim == features.get_shape()[1]    
    labels = K.argmax(labels, axis=1)
    labels = tf.to_int32(labels)
    centers_batch = K.gather(centers, labels)
    diff = (1 - alpha) * (centers_batch - features)
    centers = tf.scatter_sub(centers, labels, diff)
    centers_batch = K.gather(centers, labels)
    loss = K.mean(K.square(features - centers_batch))
    return loss

def get_center_loss(alpha, num_classes, feature_dim):
    """Center loss based on the paper "A Discriminative 
       Feature Learning Approach for Deep Face Recognition"
       (http://ydwen.github.io/papers/WenECCV16.pdf)
    """    
    # Each output layer uses one independent center: scope/centers
    centers = K.zeros([num_classes, feature_dim], dtype='float32')
    @functools.wraps(_center_loss_func)
    def center_loss(y_true, y_pred):
        return _center_loss_func(y_pred, y_true, alpha, num_classes, centers, feature_dim)
    return center_loss
wangjue-wzq commented 5 years ago

(Quoting @adriaciurana's reply above.)

All the code is here. There are two questions:

1. In custom_vgg_model.fit(y = {'fc2':y_train,'predictions':y_train}), the 'fc2': y_train entry gives this error:

ValueError: Error when checking target: expected fc2 to have shape (None, 4096) but got array with shape (6300, 45)

y_train holds the labels. If I instead do custom_vgg_model.fit(y = {'fc2':dummy1,'predictions':y_train}), the model trains successfully, where dummy1 has the same shape as the 'fc2' output (the features): dummy1 = np.zeros((y_train.shape[0],4096)). But it doesn't improve the accuracy of the model, so this must be the wrong approach.

2. It is also wrong to use ImageDataGenerator.flow(x = X_train, y = {'fc2':dummy1,'predictions':y_train}, batch_size=batch_Sizes), so I can't augment my data.

The code:

image_input = Input(shape=(224, 224, 3))
model = VGG16(input_tensor=image_input, include_top=True,weights='imagenet')
model.summary()
last_layer = model.get_layer('fc2').output
feature = last_layer
out = Dense(num_classes,activation = 'softmax',name='predictions')(last_layer)
custom_vgg_model = Model(inputs = image_input, outputs = [out,feature])
custom_vgg_model.summary()
for layer in custom_vgg_model.layers[:-3]:
    layer.trainable = False
custom_vgg_model.layers[3].trainable    
sgd = optimizers.SGD(lr=learn_Rate,decay=decay_Rate,momentum=0.9,nesterov=True)
center_loss = lossclass.get_center_loss(alpha=0.5, num_classes=45,feature_dim = 4096)
custom_vgg_model.compile(loss={'predictions': "categorical_crossentropy", 'fc2': center_loss},
                         loss_weights={'fc2': 1, 'predictions': 1},optimizer= sgd,
                                      metrics={'predictions': 'accuracy'})
t=time.time()
dummy1 = np.zeros((y_train.shape[0],4096))
dummy2 = np.zeros((y_test.shape[0],4096))
if not data_Augmentation:
    hist = custom_vgg_model.fit(x = X_train,y = {'fc2':y_train,'predictions':y_train},batch_size=batch_Sizes,
                                epochs=epoch_Times, verbose=1,validation_data=(X_test, {'fc2':y_test,'predictions':y_test}))
else:
    datagen = ImageDataGenerator(
            featurewise_center=False,
            samplewise_center=False,
            featurewise_std_normalization=False,
            samplewise_std_normalization=False,
            zca_whitening=False,
            rotation_range=20,
            width_shift_range=0.2,
            height_shift_range=0.2,
            horizontal_flip=True,
            vertical_flip=True,
            rescale=None,
            preprocessing_function=None,
            data_format=None)
    print('x_train.shape[0]:{:d}'.format(X_train.shape[0]))
    hist = custom_vgg_model.fit_generator(datagen.flow(x = X_train, y = {'fc2':dummy1,'predictions':y_train}, batch_size=batch_Sizes),
                                          steps_per_epoch=X_train.shape[0]/batch_Sizes,epochs=epoch_Times,
                                                                       verbose=1, validation_data=(X_test, {'fc2':y_test,'predictions':y_test}))
# lossclass.py
def _center_loss_func(labels,features, alpha, num_classes, centers, feature_dim):
    assert feature_dim == features.get_shape()[1]    
    labels = K.argmax(labels, axis=1)
    labels = tf.to_int32(labels)
    centers_batch = K.gather(centers, labels)
    diff = (1 - alpha) * (centers_batch - features)
    centers = tf.scatter_sub(centers, labels, diff)
    centers_batch = K.gather(centers, labels)
    loss = K.mean(K.square(features - centers_batch))
    return loss

def get_center_loss(alpha, num_classes, feature_dim):
    """Center loss based on the paper "A Discriminative 
       Feature Learning Approach for Deep Face Recognition"
       (http://ydwen.github.io/papers/WenECCV16.pdf)
    """    
    # Each output layer uses one independent center: scope/centers
    centers = K.zeros([num_classes, feature_dim], dtype='float32')
    @functools.wraps(_center_loss_func)
    def center_loss(y_true, y_pred):
        return _center_loss_func(y_true, y_pred, alpha, num_classes, centers, feature_dim)
    return center_loss
Alireza-Akhavan commented 5 years ago


My center loss is getting NaN. How should I fix it?