shamangary / Keras-MNIST-center-loss-with-visualization

An implementation of MNIST center-loss training and visualization

Validation accuracy does not improve? #3

Closed alyato closed 6 years ago

alyato commented 6 years ago

hi @shamangary I trained on my own dataset. The results on the test set are very poor, only 14%, and I noticed that dense_2_acc does not improve, peaking at 19%. If I don't use center_loss, the accuracy on the test set is 50%.

Loading the dataset:

# training and validation sets
train, trainlabel = xrp.loadtrain()
train = densenet.preprocess_input(train)
index = [i for i in range(len(train))]
random.shuffle(index)
train = train[index]
trainlabel = trainlabel[index]
(x_train, x_test) = (train[0:11000], train[11000:])
(y_train, y_test) = (trainlabel[0:11000], trainlabel[11000:])
y_train_value = y_train
y_test_value = y_test
y_train = np_utils.to_categorical(y_train, nb_classes)
y_test = np_utils.to_categorical(y_test, nb_classes)

# test set
X_test, Y_test = xrp.loadtest()
Z_test_value = Y_test
Y_test = np_utils.to_categorical(Y_test, nb_classes)
X_test = densenet.preprocess_input(X_test)

The network structure is as follows:

base_model = VGG16(weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = BatchNormalization(axis=1)(x)
ip1 = Dense(1024, activation='relu')(x)
predictions = Dense(8, activation='softmax')(ip1)
model = Model(inputs=base_model.input, outputs=predictions)

for layer in base_model.layers:
    layer.trainable = True

sgd = SGD(lr=0.0001, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

This is the center loss:

isCenterloss = True

if isCenterloss:
    lambda_c = 0.2
    input_target = Input(shape=(1,))
    centers = Embedding(8, 1024)(input_target)
    l2_loss = Lambda(lambda x: K.sum(K.square(x[0] - x[1][:, 0]), 1, keepdims=True),
                     name='l2_loss')([ip1, centers])
    model_centerloss = Model(inputs=[base_model.input, input_target],
                             outputs=[predictions, l2_loss])
    sgd_1 = SGD(lr=0.00001, decay=1e-6, momentum=0.9, nesterov=True)
    model_centerloss.compile(optimizer=sgd_1,
                             loss=['categorical_crossentropy', lambda y_true, y_pred: y_pred],
                             loss_weights=[1, lambda_c],
                             metrics=['accuracy'])

if isCenterloss:
    random_y_train = np.random.rand(x_train.shape[0], 1)
    random_y_test = np.random.rand(x_test.shape[0], 1)
    model_centerloss.fit([x_train, y_train_value], [y_train, random_y_train],
                         batch_size=batch_size, epochs=nb_epoch, verbose=1,
                         validation_data=([x_test, y_test_value], [y_test, random_y_test]))
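
For context, the code above follows the Embedding-based center-loss pattern from TYY_mnist.py. Below is a minimal self-contained sketch of that pattern on random data; the input shape, feature dimension, and hyper-parameters are illustrative stand-ins, not the questioner's actual values:

import numpy as np
from keras.layers import Input, Dense, Embedding, Lambda
from keras.models import Model
from keras import backend as K

nb_classes, feat_dim, lambda_c = 8, 16, 0.2

inputs = Input(shape=(32,))                          # stand-in for the CNN output
ip1 = Dense(feat_dim, activation='relu')(inputs)     # the feature space the centers live in
predictions = Dense(nb_classes, activation='softmax')(ip1)

# One trainable center per class; the single-value label indexes its row.
input_target = Input(shape=(1,))
centers = Embedding(nb_classes, feat_dim)(input_target)
l2_loss = Lambda(lambda x: K.sum(K.square(x[0] - x[1][:, 0]), 1, keepdims=True),
                 name='l2_loss')([ip1, centers])

model_cl = Model(inputs=[inputs, input_target], outputs=[predictions, l2_loss])
# The second loss just passes l2_loss through, so its target is never read.
model_cl.compile(optimizer='sgd',
                 loss=['categorical_crossentropy', lambda y_true, y_pred: y_pred],
                 loss_weights=[1, lambda_c])

x = np.random.rand(64, 32)
y_value = np.random.randint(0, nb_classes, (64, 1))  # integer labels feed the Embedding
y_onehot = np.eye(nb_classes)[y_value[:, 0]]
dummy = np.zeros((64, 1))                            # dummy target for the l2_loss output
model_cl.fit([x, y_value], [y_onehot, dummy], epochs=1, verbose=0)

The key detail is that the integer labels enter as a second input (to index the centers), while the second training target is a dummy array that the pass-through loss never reads.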

The prediction code is as follows:

privateLabel_0 = model_centerloss.predict([X_test, Z_test_value], batch_size=1, verbose=1)
print(len(privateLabel_0))  # two outputs: [softmax predictions, l2_loss]
privateLabel = privateLabel_0[0]
list_label1 = []
list_label2 = []
x = len(privateLabel)

for j in range(x):
    list_label1.append(np.argmax(Y_test[j]))
    list_label2.append(np.argmax(privateLabel[j]))

privateAcc = len([1 for i in range(len(Y_test)) if list_label2[i] == list_label1[i]]) / float(len(Y_test))
print('the privateAcc is ', privateAcc)

I adjusted learning_rate = 0.00001, but it did not help; dense_2_acc still does not improve. Could you tell me what kind of problem this might be? Thanks.

shamangary commented 6 years ago

Well, no loss is guaranteed to improve results on an arbitrary dataset, especially since I don't know whether the 1024-dim features and 8 centers you are using suit your dataset, or whether SGD even suits it; there are too many variables. This repo is only meant to show that the data clusters.

If you can't tell why center loss has no effect, the only thing I can think of for now is to plot your 1024-dim features with t-SNE and see whether they cluster after training with center loss. If they don't, it is quite possible that your data is too unusual to be modeled with this loss.
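
A minimal sketch of that t-SNE check, reusing the names from the question's code (base_model, ip1, x_test, y_test_value); treat it as an illustration under those assumptions, not a definitive recipe:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from keras.models import Model

# Extract the 1024-dim ip1 features for the test images.
feat_model = Model(inputs=base_model.input, outputs=ip1)
features = feat_model.predict(x_test)                 # shape (N, 1024)

# Project to 2-D and color by class; distinct blobs per class suggest the
# center loss is clustering. Concatenating training and testing features
# into one plot likewise shows whether the two distributions overlap.
emb = TSNE(n_components=2, random_state=0).fit_transform(features)
plt.scatter(emb[:, 0], emb[:, 1], c=np.ravel(y_test_value), cmap='tab10', s=5)
plt.colorbar()
plt.savefig('tsne_ip1_features.png')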

alyato commented 6 years ago

Thanks. About the meaning of 8 and 1024: my dataset consists entirely of face images, and each face corresponds to one disease; there are 8 diseases in total. Since this is classification, there are 8 classes, and the Dense layer in the network structure is set to 1024 dims.

shamangary commented 6 years ago

I have a few guesses. First, center loss was originally designed for face verification, so the final features of a network like VGG most likely encode the overall appearance of the face, and a disease class may not change the facial information much. Your center loss may be clustering different faces (the disease classes map to different faces, so even with the center-loss target it may still cluster by face) rather than clustering the diseases themselves; disease features are probably closer to texture, so this loss may not be a good fit.

liuchuanloong commented 6 years ago

My training accuracy and validation accuracy are both fine, but the accuracy on the test set is only 60%, and testing with the training images gives only 56%. I can't see where the problem is.

Training network:

def Net(image,labels,classes):
    ## prepare nets ##

    base_model = VGG16(input_tensor=image, include_top=False, weights='imagenet')
    for layer in base_model.layers[:15]:
        layer.trainable = False
    x = base_model.output
    x = Convolution2D(1024, (1, 1), padding='same', name='add_conv', kernel_initializer='TruncatedNormal')(x)
    x = BatchNormalization(name='add_normalization')(x)
    x = Activation('relu', name='add_activation')(x)
    x = GlobalAveragePooling2D()(x)
    x = Dense(1024, activation='relu', name='add_fc1', kernel_initializer='TruncatedNormal')(x)
    x = Dropout(rate=0.5)(x)
    x = Dense(1024, activation='relu', name='add_fc2', kernel_initializer='TruncatedNormal')(x)
    x = Dropout(rate=0.5)(x)
    logits = Dense(classes, activation='softmax', name='predictions')(x)

    feature = Dense(256, name='add_centerloss_fc', kernel_initializer='TruncatedNormal')(x)
    feature = PReLU(name='PRelu')(feature)

    centers = Embedding(classes, 256)(labels)
    l2_loss = Lambda(lambda x: K.sum(K.square(x[0] - x[1][:, 0]), 1, keepdims=True), name='l2_loss')([feature, centers])
    return logits, l2_loss

def build_network(images,labels,classes):
    # build all networks / load weights / plot model #

    logits, l2_loss = Net(images,labels,classes)
    model = Model(inputs=[images, labels], outputs=[logits, l2_loss])

    model.summary()
    try:
        model.load_weights('./simple_center_tmp/.h5',by_name=True)
        print('Successfully loaded initial weights!')
    except:
        print('Could not load model weights!')
    plot_model(model, to_file='./simple_center_log/centerloss_model.png',show_shapes=True)
    return model
images = Input(shape=(224,224,3))
labels = Input(shape=(586,))

dummy = np.zeros((C.batch_size, 1))
dummy_test = np.zeros((C.batch_size, 1))

model = build_network(images=images,labels=labels,classes=C.classes)

model.compile(optimizer=Adam(), loss=["categorical_crossentropy", lambda y_true, y_pred: y_pred],
                         loss_weights=[1, C.lambda_centerloss], metrics=['accuracy'])
for e in range(C.n_epochs):
    print('Training... Epoch', e)

    for n in range(int(train_temp.shape[0] / C.batch_size)):
        X_batch, Y_batch = imagehelper.generate_one_from_list(temp=train_temp, means=C.means)
        Y_batch_onehot = np_utils.to_categorical(Y_batch, C.classes)
        generated_images.fit(X_batch)
        gen = generated_images.flow(X_batch, Y_batch, batch_size=C.batch_size,shuffle=False)
        X_batch, Y_batch = next(gen)
        centerloss = model.train_on_batch([X_batch, Y_batch_onehot],[Y_batch_onehot,dummy],class_weight=train_weight)
        C.loss.append(centerloss)
        print('Epoch:{} batch:{}/{} total_loss:{} softmax_loss:{} center_loss:{} softmax_acc:{} center_acc:{}'
              .format(e, n, int(train_temp.shape[0] / C.batch_size), centerloss[0],centerloss[1],centerloss[2],centerloss[3],centerloss[4]))
        write_log(callback, train_names, centerloss, len(C.loss))
    if e % 2 == 0 and e != 0:
        model.save_weights('./simple_center_tmp/lambda{} centerloss epoch {} {}.h5'.format(C.lambda_centerloss, e, time.ctime()))
    # validation model #
    if C.train_and_val:
        val_softmaxlosslist = []
        val_centerlosslist = []
        val_totallosslist = []
        val_softmaxacclist = []
        val_centeracclist = []
        for n in range(int(val_temp.shape[0] / C.batch_size)):
            X_val_batch, Y_val_batch = imagehelper.generate_one_from_list(val_temp, means=C.means)
            generated_images.fit(X_val_batch)
            gen = generated_images.flow(X_val_batch, Y_val_batch, batch_size=C.batch_size, shuffle=False)
            X_val_batch, Y_val_batch = next(gen)
            Y_val_batch = np_utils.to_categorical(Y_val_batch, C.classes)
            val_loss1 = model.test_on_batch([X_val_batch, Y_val_batch], [Y_val_batch, dummy_test])
            print('Epoch:{} batch:{}/{} val_totalloss:{} val_softmaxloss:{} val_centerloss:{} val_softmaxacc:{} val_centeracc:{}'.format(e, n, int(val_temp.shape[0]/C.batch_size), val_loss1[0], val_loss1[1], val_loss1[2], val_loss1[3], val_loss1[4]))
            val_softmaxlosslist.append(val_loss1[1])
            val_centerlosslist.append(val_loss1[2])
            val_totallosslist.append(val_loss1[0])
            val_softmaxacclist.append(val_loss1[3])
            val_centeracclist.append(val_loss1[4])
        val_centerloss = np.mean(np.array(val_centerlosslist))
        val_softmaxloss = np.mean(np.array(val_softmaxlosslist))
        val_totalloss = np.mean(np.array(val_totallosslist))
        val_softmaxacc = np.mean(np.array(val_softmaxacclist))
        val_centeracc = np.mean(np.array(val_centeracclist))
        val = np.array([val_totalloss, val_softmaxloss, val_centerloss, val_softmaxacc, val_centeracc])
        print('Epoch:{} val_softmaxloss:{} val_centerloss:{} val_totalloss:{} val_softmaxacc:{} val_centeracc:{}'.format(e, val_softmaxloss, val_centerloss, val_totalloss, val_softmaxacc, val_centeracc))
        write_log(callback, val_names, val, e + 1)
model.save_weights('./simple_center_tmp/centerloss {}.h5'.format(time.ctime()))

Test network:

############define a classification networks##############
def Net(image,labels,classes):
    ## prepare nets ##

    base_model = VGG16(input_tensor=image, include_top=False)
    x = base_model.output
    x = Convolution2D(1024, (1, 1), padding='same', name='add_conv', kernel_initializer='TruncatedNormal')(x)
    x = BatchNormalization(name='add_normalization')(x)
    x = Activation('relu', name='add_activation')(x)
    x = GlobalAveragePooling2D()(x)
    x = Dense(1024, activation='relu', name='add_fc1', kernel_initializer='TruncatedNormal')(x)
    x = Dropout(rate=0.5)(x)
    x = Dense(1024, activation='relu', name='add_fc2', kernel_initializer='TruncatedNormal')(x)
    x = Dropout(rate=0.5)(x)
    logits = Dense(classes, activation='softmax', name='predictions')(x)

    feature = Dense(256, name='add_centerloss_fc', kernel_initializer='TruncatedNormal')(x)
    feature = PReLU(name='PRelu')(feature)

    centers = Embedding(classes, 256)(labels)
    l2_loss = Lambda(lambda x: K.sum(K.square(x[0] - x[1][:, 0]), 1, keepdims=True), name='l2_loss')([feature, centers])
    return logits, l2_loss

def build_network(images,labels,classes):
    # build all networks / load weights / plot model #

    logits, l2_loss = Net(images,labels,classes)
    model = Model(inputs=[images,labels], outputs=[logits, l2_loss])
    model.summary()
    # test model #
    #test_model = Model(inputs=images,outputs=logits)
    #test_model.summary()
    #predict_model = Model(inputs=images, outputs=test_model.get_layer('add_dropout2').output)
    try:
        model.load_weights('./simple_center_tmp/lambda0.1 centerloss epoch 50 Mon May 28 08:53:55 2018.h5')
        print('Successfully loaded initial weights!')
    except:
        print('Could not load model weights!')
    return model
################## define networks end #####################
images = Input(shape=(224,224,3))
labels = Input(shape=(586,))

model = build_network(images=images,labels=labels,classes=C.classes)
model.compile(optimizer=Adam(),
              loss=['categorical_crossentropy',lambda y_true, y_pred: y_pred],
              metrics=['accuracy'])

# validation model #
if C.test:
    val_softmaxlosslist = []
    val_softmaxacclist = []
    dummy_test = np.zeros((C.batch_size, 1))

    for n in range(int(val_temp.shape[0] / C.batch_size)):
        X_val_batch, Y_val_batch = Gen.generate_one_from_list(val_temp, means=C.valmeans)
        Y_val_batch = np_utils.to_categorical(Y_val_batch, C.classes)
        #val_loss1 = test_model.test_on_batch(X_val_batch, Y_val_batch)
        val_loss1 = model.test_on_batch([X_val_batch, Y_val_batch], [Y_val_batch, dummy_test])
        print(val_loss1)
        #pre = model.predict_on_batch(X_val_batch)
        print('batch:{}/{} val_softmaxloss:{} val_softmaxacc:{}'.format(n, int(val_temp.shape[0]/C.batch_size), val_loss1[1], val_loss1[3]))
        val_softmaxlosslist.append(val_loss1[1])
        val_softmaxacclist.append(val_loss1[3])
    val_softmaxloss = np.mean(np.array(val_softmaxlosslist))
    val_softmaxacc = np.mean(np.array(val_softmaxacclist))
    print('TEST: val_softmaxloss:{} val_softmaxacc:{}'.format(val_softmaxloss, val_softmaxacc))
alyato commented 6 years ago

@liuchuanloong Thanks. I'm looking into the problem now, and shamangary has also given suggestions. Thanks as well for sharing your code. Your code is different from mine; I want to run my own dataset through your network structure and see whether it improves. Reading your code, one thing confuses me: why is labels = Input(shape=(586,)) 586?

liuchuanloong commented 6 years ago

There are 586 label classes in total, so it is a 586-row, 1-column tensor; batch_size does not need to be written. I'm not sure whether my test code has a problem; the test results differ a lot from the training and validation results. If you solve it, could you share how?

alyato commented 6 years ago

@liuchuanloong I think this differs from the code in TYY_mnist.py:

input_target = Input(shape=(1,))  # single-value ground-truth labels as inputs
centers = Embedding(10, 2)(input_target)

But yours is labels = Input(shape=(586,)); I think it should be 1. I don't know whether my understanding is correct. @shamangary Thanks.

shamangary commented 6 years ago

Since Embedding takes single-value labels rather than one-hot labels, what @alyato said is correct.

https://github.com/keras-team/keras/issues/4981 discusses how to turn one-hot labels into single values; either np.argmax or K.argmax works.
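
A two-line sketch of that conversion, where y_onehot stands for a hypothetical (N, num_classes) one-hot array:

import numpy as np

# one-hot (N, num_classes) -> single-value (N, 1), the shape the Embedding input expects
y_value = np.argmax(y_onehot, axis=1).reshape(-1, 1)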

liuchuanloong commented 6 years ago

I fixed the error you pointed out, but the test result is still only 60%, while training and validation both reach 80%; testing directly on the training set also gives 60%. I suspect my test network has a problem, but I can't see where. Any advice would be appreciated.

shamangary commented 6 years ago

Sorry, I can't see it either. In theory, if even validation looks good, testing shouldn't be much worse. I suggest you check whether anything is wrong with your dataset before it is fed in, and use t-SNE dimensionality reduction to plot whether your testing features overlap with the training clusters. Otherwise, nothing can be seen from the code alone.

liuchuanloong commented 6 years ago

OK, thanks for the reminder. I found the problem.

alyato commented 6 years ago

@liuchuanloong I don't know how you solved it. I looked at your code and found it differs from mine, and I wonder whether that is why my results are poor. I already listed the code in my question; I'm copying it below. Please take a look at what is different.

My code. The network structure is as follows:

base_model = VGG16(weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = BatchNormalization(axis=1)(x)
ip1 = Dense(1024, activation='relu')(x)
predictions = Dense(8, activation='softmax')(ip1)
model = Model(inputs=base_model.input, outputs=predictions)

for layer in base_model.layers:
    layer.trainable = True

sgd = SGD(lr=0.0001, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

When defining the network structure, I called model.compile.

Then I defined the center_loss function, as follows:

This is the center loss:

isCenterloss = True

if isCenterloss:
    lambda_c = 0.2
    input_target = Input(shape=(1,))
    centers = Embedding(8, 1024)(input_target)
    l2_loss = Lambda(lambda x: K.sum(K.square(x[0] - x[1][:, 0]), 1, keepdims=True),
                     name='l2_loss')([ip1, centers])
    model_centerloss = Model(inputs=[base_model.input, input_target],
                             outputs=[predictions, l2_loss])
    sgd_1 = SGD(lr=0.00001, decay=1e-6, momentum=0.9, nesterov=True)
    model_centerloss.compile(optimizer=sgd_1,
                             loss=['categorical_crossentropy', lambda y_true, y_pred: y_pred],
                             loss_weights=[1, lambda_c],
                             metrics=['accuracy'])

Under this if, I defined model_centerloss.compile a second time.

After that comes the training process:

if isCenterloss:
    random_y_train = np.random.rand(x_train.shape[0], 1)
    random_y_test = np.random.rand(x_test.shape[0], 1)
    model_centerloss.fit([x_train, y_train_value], [y_train, random_y_train],
                         batch_size=batch_size, epochs=nb_epoch, verbose=1,
                         validation_data=([x_test, y_test_value], [y_test, random_y_test]))

I wrote this following the format of TYY_mnist.py. @shamangary @liuchuanloong May I ask whether having both model.compile and model_centerloss.compile could cause problems? After all, SGD was defined and its learning rate set before model_centerloss.compile. Thanks.

Luankaixiang commented 4 years ago

"OK, thanks for the reminder. I found the problem."

Could you please share how you solved it? I'm running into the same problem now.