Closed xieximeng2008 closed 5 years ago
Don't use softmax. Use sigmoid units in the output layer and then use the "binary_crossentropy" loss.
That works in my case. However, model.predict_classes is not "adapted" for this. As an example, for a sample from the test set where the target label is 1 0 1 0 0 0 0 (I have 7 labels in total), model.predict(tSets[1,:]) gives 9.90e-01, 2.7e-07, 6.05e-13, 9.98e-01, 2.16e-05, 7.62e-05, 1.51e-04 (so that is correct), but model.predict_classes(tSets[1,:]) gives just array([3]) (it seems to pick the highest value from model.predict). A quick fix might be numpy.around, but maybe there is a more elegant solution?
Getting classes from .predict() is one line of numpy code really.
model.predict(blabla) > 0.5 ?
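The one-liner above can be spelled out as a minimal NumPy sketch. The probabilities are the ones quoted earlier in the thread; the 0.5 cut-off is an assumed default rather than a tuned value, and the variable names are only illustrative.

```python
import numpy as np

# Sigmoid outputs for one sample (values quoted earlier in the thread).
probs = np.array([[9.90e-01, 2.7e-07, 6.05e-13, 9.98e-01,
                   2.16e-05, 7.62e-05, 1.51e-04]])

# Each label is decided independently: anything above the threshold is "on".
threshold = 0.5  # assumed cut-off, not a tuned value
labels = (probs > threshold).astype(int)

# Indices of the active labels for each sample.
active = [np.flatnonzero(row).tolist() for row in labels]

print(labels)
print(active)
```

Unlike predict_classes, which returns a single argmax index per sample, this keeps every label whose score clears the threshold.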
@elanmart Hi, why do you think using softmax is not a good idea?
Do you use a graph model, given we have multiple outputs?
My loss does not converge. @holderm @elanmart
model.predict(Y_train[1,:])
it shows [ 0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000] my complete code:
from __future__ import absolute_import
from __future__ import print_function
import scipy.io
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.optimizers import SGD, Adadelta, Adagrad
from keras.utils import np_utils, generic_utils
from six.moves import range
batch_size = 100
nb_classes = 5
nb_epoch = 5
data_augmentation = True
shapex, shapey = 64, 64
nb_filters = [32, 64]
nb_pool = [4, 3]
nb_conv = [5, 4]
image_dimensions = 3
mat = scipy.io.loadmat('E:\scene.mat')
X_train = mat['x_train']
Y_train = mat['y_train']
X_test = mat['x_test']
Y_test = mat['y_test']
print(X_train.shape)
print(X_test.shape)
model = Sequential()
model.add(Convolution2D(nb_filters[0], image_dimensions, nb_conv[0], nb_conv[0], border_mode='valid'))
model.add(Activation('relu'))
model.add(MaxPooling2D(poolsize=(nb_pool[0], nb_pool[0])))
model.add(Dropout(0.25))
model.add(Convolution2D(nb_filters[1], nb_filters[0], nb_conv[1], nb_conv[1], border_mode='valid'))
model.add(Activation('relu'))
model.add(MaxPooling2D(poolsize=(nb_pool[1], nb_pool[1])))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(nb_filters[-1] * (((shapex - nb_conv[0]+1)/ nb_pool[0] -nb_conv[1]+1)/ nb_pool[1]) * (((shapey -nb_conv[0]+1)/ nb_pool[0] -nb_conv[1]+1)/ nb_pool[1]), 512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(512, nb_classes,init='uniform'))
model.add(Activation('sigmoid'))
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='binary_crossentropy', optimizer=sgd)
if not data_augmentation:
    print("Not using data augmentation or normalization")
    X_train = X_train.astype("float32")
    X_test = X_test.astype("float32")
    X_train /= 255
    X_test /= 255
    model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch)
    score = model.evaluate(X_test, Y_test, batch_size=batch_size)
    print('Test score:', score)
else:
    print("Using real time data augmentation")
    # this will do preprocessing and realtime data augmentation
    datagen = ImageDataGenerator(
        featurewise_center=True,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=True,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=20,  # randomly rotate images in the range (degrees, 0 to 180)
        width_shift_range=0.2,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.2,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=True,  # randomly flip images
        vertical_flip=False)  # randomly flip images
    datagen.fit(X_train)
    model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch)
    score = model.evaluate(X_test, Y_test, batch_size=batch_size)
    print(model.predict(X_test[1,:]))
Could you help me find out what is wrong? Thanks!
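The input size of the first Dense layer in the script above is computed inline, which is easy to get wrong. As a sanity check, here is a small sketch of the same arithmetic; the helper name conv_pool_output_size is hypothetical, and it assumes 'valid' convolutions and non-overlapping pooling, as in the posted code.

```python
def conv_pool_output_size(size, conv_sizes, pool_sizes):
    """Spatial size after alternating 'valid' convolutions and poolings."""
    for conv, pool in zip(conv_sizes, pool_sizes):
        size = size - conv + 1   # a 'valid' convolution shrinks by (kernel - 1)
        size = size // pool      # pooling with stride equal to the pool size
    return size

# Values copied from the script above.
shapex, shapey = 64, 64
nb_filters = [32, 64]
nb_conv = [5, 4]
nb_pool = [4, 3]

out_x = conv_pool_output_size(shapex, nb_conv, nb_pool)
out_y = conv_pool_output_size(shapey, nb_conv, nb_pool)
flat_size = nb_filters[-1] * out_x * out_y

print(out_x, out_y, flat_size)
```

With these settings the feature map is 4 x 4 with 64 channels, so the flattened size is 1024. If the image size or kernel sizes change so that a division is no longer exact, the inline expression and the real layer output can disagree.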
@lemuriandezapada yeah,
labels = np.zeros(preds.shape)
labels[preds>0.5] = 1
@arushi02 in softmax when increasing score for one label, all others are lowered (it's a probability distribution). You don't want that when you have multiple labels.
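To make the point about softmax concrete, here is a small NumPy sketch (the logits are made up for illustration). Raising the score of one label pushes the softmax probabilities of the others down, while sigmoid outputs for the other labels stay where they were.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([2.0, 2.0, -1.0])  # made-up scores for three labels
boosted = logits.copy()
boosted[0] += 3.0                    # raise the score of label 0 only

print("softmax before:", softmax(logits))
print("softmax after: ", softmax(boosted))   # label 1 drops although its score did not change
print("sigmoid before:", sigmoid(logits))
print("sigmoid after: ", sigmoid(boosted))   # labels 1 and 2 are unchanged
```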
No, you don't need Graph
Here's an example of one of my multilabel nets:
# Build a classifier optimized for maximizing f1_score (uses class_weights)
clf = Sequential()
clf.add(Dropout(0.3))
clf.add(Dense(xt.shape[1], 1600, activation='relu'))
clf.add(Dropout(0.6))
clf.add(Dense(1600, 1200, activation='relu'))
clf.add(Dropout(0.6))
clf.add(Dense(1200, 800, activation='relu'))
clf.add(Dropout(0.6))
clf.add(Dense(800, yt.shape[1], activation='sigmoid'))
clf.compile(optimizer=Adam(), loss='binary_crossentropy')
clf.fit(xt, yt, batch_size=64, nb_epoch=300, validation_data=(xs, ys), class_weight=W, verbose=0)
preds = clf.predict(xs)
preds[preds>=0.5] = 1
preds[preds<0.5] = 0
print f1_score(ys, preds, average='macro')
@xieximeng2008 What does it print during training?
@elanmart Using real time data augmentation
Epoch 0
100/1800 [>.............................] - ETA: 58s - loss: 8.1209
200/1800 [==>...........................] - ETA: 55s - loss: 6.7125
300/1800 [====>.........................] - ETA: 51s - loss: 6.2430
400/1800 [=====>........................] - ETA: 48s - loss: 6.0284
500/1800 [=======>......................] - ETA: 44s - loss: 6.1214
600/1800 [=========>....................] - ETA: 40s - loss: 5.9915
700/1800 [==========>...................] - ETA: 37s - loss: 5.8876
800/1800 [============>.................] - ETA: 33s - loss: 5.7681
900/1800 [==============>...............] - ETA: 30s - loss: 5.6844
1000/1800 [===============>..............] - ETA: 27s - loss: 5.6092
1100/1800 [=================>............] - ETA: 23s - loss: 5.5703
1200/1800 [===================>..........] - ETA: 20s - loss: 5.5240
1300/1800 [====================>.........] - ETA: 16s - loss: 5.4976
1400/1800 [======================>.......] - ETA: 13s - loss: 5.4809
1500/1800 [========================>.....] - ETA: 10s - loss: 5.4526
1600/1800 [=========================>....] - ETA: 6s - loss: 5.4486
1700/1800 [===========================>..] - ETA: 3s - loss: 5.4596
1800/1800 [==============================] - 60s - loss: 5.4326
Epoch 1
100/1800 [>.............................] - ETA: 56s - loss: 5.1808
200/1800 [==>...........................] - ETA: 52s - loss: 5.0979
300/1800 [====>.........................] - ETA: 49s - loss: 5.1670
400/1800 [=====>........................] - ETA: 45s - loss: 5.2326
500/1800 [=======>......................] - ETA: 42s - loss: 5.2554
600/1800 [=========>....................] - ETA: 39s - loss: 5.2430
700/1800 [==========>...................] - ETA: 36s - loss: 5.2104
800/1800 [============>.................] - ETA: 33s - loss: 5.1912
900/1800 [==============>...............] - ETA: 29s - loss: 5.1716
1000/1800 [===============>..............] - ETA: 26s - loss: 5.1559
1100/1800 [=================>............] - ETA: 23s - loss: 5.1318
1200/1800 [===================>..........] - ETA: 19s - loss: 5.1532
1300/1800 [====================>.........] - ETA: 16s - loss: 5.1489
1400/1800 [======================>.......] - ETA: 13s - loss: 5.1512
1500/1800 [========================>.....] - ETA: 9s - loss: 5.1642
1600/1800 [=========================>....] - ETA: 6s - loss: 5.1549
1700/1800 [===========================>..] - ETA: 3s - loss: 5.1418
1800/1800 [==============================] - 59s - loss: 5.1325
Epoch 2
100/1800 [>.............................] - ETA: 56s - loss: 5.2637
200/1800 [==>...........................] - ETA: 52s - loss: 5.1394
300/1800 [====>.........................] - ETA: 49s - loss: 5.1117
400/1800 [=====>........................] - ETA: 46s - loss: 5.0150
500/1800 [=======>......................] - ETA: 42s - loss: 5.0150
600/1800 [=========>....................] - ETA: 39s - loss: 4.9874
700/1800 [==========>...................] - ETA: 36s - loss: 5.0387
800/1800 [============>.................] - ETA: 32s - loss: 5.0565
900/1800 [==============>...............] - ETA: 29s - loss: 5.0565
1000/1800 [===============>..............] - ETA: 26s - loss: 5.0813
1100/1800 [=================>............] - ETA: 23s - loss: 5.0942
1200/1800 [===================>..........] - ETA: 19s - loss: 5.0876
1300/1800 [====================>.........] - ETA: 16s - loss: 5.1234
1400/1800 [======================>.......] - ETA: 13s - loss: 5.1305
1500/1800 [========================>.....] - ETA: 9s - loss: 5.1256
1600/1800 [=========================>....] - ETA: 6s - loss: 5.1316
1700/1800 [===========================>..] - ETA: 3s - loss: 5.1296
1800/1800 [==============================] - 60s - loss: 5.1325
Epoch 3
100/1800 [>.............................] - ETA: 56s - loss: 4.7664
200/1800 [==>...........................] - ETA: 52s - loss: 5.0772
300/1800 [====>.........................] - ETA: 49s - loss: 5.1394
400/1800 [=====>........................] - ETA: 46s - loss: 5.1290
500/1800 [=======>......................] - ETA: 42s - loss: 5.1311
600/1800 [=========>....................] - ETA: 39s - loss: 5.1601
700/1800 [==========>...................] - ETA: 36s - loss: 5.1157
800/1800 [============>.................] - ETA: 33s - loss: 5.1497
900/1800 [==============>...............] - ETA: 29s - loss: 5.1716
1000/1800 [===============>..............] - ETA: 26s - loss: 5.1891
1100/1800 [=================>............] - ETA: 23s - loss: 5.1695
1200/1800 [===================>..........] - ETA: 19s - loss: 5.1705
1300/1800 [====================>.........] - ETA: 16s - loss: 5.1585
1400/1800 [======================>.......] - ETA: 13s - loss: 5.1660
1500/1800 [========================>.....] - ETA: 9s - loss: 5.1587
1600/1800 [=========================>....] - ETA: 6s - loss: 5.1394
1700/1800 [===========================>..] - ETA: 3s - loss: 5.1394
1800/1800 [==============================] - 59s - loss: 5.1325
Epoch 4
100/1800 [>.............................] - ETA: 55s - loss: 5.1394
200/1800 [==>...........................] - ETA: 52s - loss: 5.1394
300/1800 [====>.........................] - ETA: 49s - loss: 5.1117
400/1800 [=====>........................] - ETA: 45s - loss: 5.1601
500/1800 [=======>......................] - ETA: 42s - loss: 5.1477
600/1800 [=========>....................] - ETA: 39s - loss: 5.1808
700/1800 [==========>...................] - ETA: 36s - loss: 5.1334
800/1800 [============>.................] - ETA: 32s - loss: 5.1290
900/1800 [==============>...............] - ETA: 29s - loss: 5.1163
1000/1800 [===============>..............] - ETA: 26s - loss: 5.1311
1100/1800 [=================>............] - ETA: 23s - loss: 5.1431
1200/1800 [===================>..........] - ETA: 19s - loss: 5.1394
1300/1800 [====================>.........] - ETA: 16s - loss: 5.1298
1400/1800 [======================>.......] - ETA: 13s - loss: 5.1423
1500/1800 [========================>.....] - ETA: 9s - loss: 5.1338
1600/1800 [=========================>....] - ETA: 6s - loss: 5.1161
1700/1800 [===========================>..] - ETA: 3s - loss: 5.1174
1800/1800 [==============================] - 59s - loss: 5.1325
testing...
100/200 [==============>...............] - ETA: 1s
200/200 [==============================] - 2s
[[ 0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
0.00000000e+000]
[ 0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
0.00000000e+000]
[ 0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
0.00000000e+000]
[ 0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
0.00000000e+000]
[ 1.22857558e-291 0.00000000e+000 3.11779756e-297 0.00000000e+000
0.00000000e+000]
.........
.........
Almost all outputs are zero or extremely small floating-point numbers.
@elanmart I used your example, but I still have the problem above. Dataset: X_train (1800,3,64,64), X_test (200,3,64,64), Y_train (1800,5), Y_test (200,5).
I just changed the code as you listed:
model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,validation_data = (X_test,Y_test),verbose = 0)
preds = model.predict(X_test)
preds[preds>= 0.5] = 1
preds[preds<0.5] = 0
print (preds)
Thanks for helping me!
@xieximeng2008 I'd guess the problem is in your data, since the network worked well for me a few days ago.
@elanmart
Suppose I want to identify a house no 5436 from an image and I assume every image will have max 4 digits, so one image will be tagged with 4 one hot vectors like
[(0000010000), (0000100000), (0001000000), (0000001000)] and I pass this as a 2D matrix then will it give me probabilities for each element? In this kind of tagging, I want every row to have one element which is most probable (following a probability distribution).
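One way to read the question above is: apply a separate softmax to each of the 4 digit positions, so that every row is its own probability distribution. The sketch below does that in NumPy on made-up scores standing in for a network's raw output; the function name and the flat 4 x 10 layout are assumptions for illustration.

```python
import numpy as np

def rowwise_softmax(scores):
    """Softmax over the last axis, so each digit position sums to 1."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

n_digits, n_classes = 4, 10
rng = np.random.default_rng(0)

# Stand-in for the raw output of a network: 2 samples, 4 * 10 scores each.
flat_scores = rng.normal(size=(2, n_digits * n_classes))

# Make sample 0 clearly encode the house number 5436 (illustrative only).
for pos, digit in enumerate([5, 4, 3, 6]):
    flat_scores[0, pos * n_classes + digit] += 20.0

probs = rowwise_softmax(flat_scores.reshape(-1, n_digits, n_classes))
digits = probs.argmax(axis=-1)   # most probable class per digit position

print(probs.sum(axis=-1))  # every row sums to 1
print(digits[0])
```

Each row can then be trained against its own one-hot target with a categorical loss, which keeps exactly one most probable element per row.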
Does anyone know how to replace the default validation score with another scoring function that is printed at every epoch? The scoring function for the validation set should be similar to the one implemented for the test set. Many thanks.
clf.fit(xt, yt, batch_size=64, nb_epoch=300, validation_data=(xs, ys), class_weight=W, verbose=0)
preds = clf.predict(xs)
preds[preds>=0.5] = 1
preds[preds<0.5] = 0
print f1_score(ys, preds, average='macro')
@elanmart I have an image dataset in which each image has multiple labels, and y for a particular image is [1,1,-1,-1,-1], where 1 == class present and -1 == class not present. My question is how to change y so that the Keras model will accept it for training.
@suraj-deshmukh, did you solve your problem of how to load the multi-label data? How did you do it? Could you share your code? Thanks.
@alyato, hi, I solved my problem but I lost all my code :( due to an HDD failure. But as I said in my previous comment, my y/target was [1,1,-1,-1,-1] and I converted it into [1,1,0,0,0], where 1 == presence and 0 == absence, for all images. I passed that data to a ConvNet with binary crossentropy as the loss function and sigmoid as the activation function for the output layer.
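The conversion described above is a one-liner in NumPy. This is a minimal sketch; the second row of y_signed is made up for illustration.

```python
import numpy as np

# Targets in the {-1, 1} convention: 1 == class present, -1 == class absent.
y_signed = np.array([[1, 1, -1, -1, -1],
                     [-1, 1, -1, 1, -1]])  # second row is made up

# Map to the {0, 1} convention expected by sigmoid + binary crossentropy.
y_binary = (y_signed == 1).astype("float32")

print(y_binary)
```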
@suraj-deshmukh, do I understand it correctly? For single-label data (3 classes in total):
x          y
[1,2,3]    [0]
[4,5,6]    [1]
[7,8,9]    [2]
So I load train_data and train_label. The format of train_label is [0,1,2].
train_label.shape is (3,)
But for multi-label data (3 classes in total):
x          y
[1,2,3]    [0,2]
[4,5,6]    [1,2]
[7,8,9]    [0,1]
Then the format of train_label is [ [1,0,1],[0,1,1],[1,1,0] ]
train_label.shape is (3,3)
Is that right? If it is right, I also have one question.
For single-label data, the format of train_label is [0,1,2], and I need to call the function np_utils.to_categorical to convert it to the one-hot format.
For multi-label data, the format of train_label is [ [1,0,1],[0,1,1],[1,1,0] ], and I don't call the function np_utils.to_categorical.
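The multi-label encoding described above can be built by hand. Here is a minimal sketch; the helper name to_multi_hot is hypothetical, and the label lists are the ones from the example.

```python
import numpy as np

def to_multi_hot(label_lists, n_classes):
    """Turn lists of class indices into a (n_samples, n_classes) 0/1 matrix."""
    y = np.zeros((len(label_lists), n_classes), dtype="float32")
    for row, labels in enumerate(label_lists):
        y[row, labels] = 1.0   # switch on every class present in this sample
    return y

# Label lists from the example above: sample 0 has classes 0 and 2, and so on.
train_label = to_multi_hot([[0, 2], [1, 2], [0, 1]], n_classes=3)

print(train_label)
print(train_label.shape)
```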
@alyato
yes you are right
@suraj-deshmukh, thanks for your answer. But I also have some questions.
preds[preds>=0.5] = 1 preds[preds<0.5] = 0
- How do I set the threshold, such as 0.5?
- Once I get my predict_test_label, how can I compare it with the real_test_label?
The predict_test_label is [[1,0,1], [0,1,1], [1,1,0]] and the real_test_label is [[1,0,0], [1,0,1], [1,1,0]].
How do I measure whether my model is better or worse?
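For the comparison question, a few standard multi-label measures can be computed directly with NumPy. This sketch uses the two label matrices quoted above and computes the Hamming loss, the exact-match ratio, and micro-averaged precision, recall and F1; which measure matters most depends on the task.

```python
import numpy as np

y_pred = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 0]])  # predict_test_label
y_true = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0]])  # real_test_label

# Fraction of individual label decisions that are wrong.
hamming_loss = np.mean(y_pred != y_true)

# Fraction of samples where every label is right.
exact_match = np.mean(np.all(y_pred == y_true, axis=1))

# Micro-averaged counts over all samples and labels.
tp = np.sum((y_pred == 1) & (y_true == 1))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))

precision = tp / (tp + fp)
recall = tp / (tp + fn)
micro_f1 = 2 * precision * recall / (precision + recall)

print(hamming_loss, exact_match, precision, recall, micro_f1)
```

Here 3 of the 9 label decisions are wrong, and only the last sample matches exactly.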
@elanmart "in softmax when increasing score for one label, all others are lowered (it's a probability distribution). You don't want that when you have multiple labels."
I somewhat disagree with this conclusion, though maybe I am wrong. Softmax just calculates a normalized exponential value (a probability) for each node in the output layer. Assuming there are two target labels out of seven, for example, the neural network tries to predict the top two posterior probabilities in the specific nodes, and the two probs are definitely the same.
Hi, I'm trying to classify an image with multiple digits. Say an image with "123" to output "123". There are up to 5 digits.
I'm stuck after I built the convolution layers. How do we output 5 digits each with 10 classes? Some suggested 5 independent fully connected layers after the final convolution layer. But how do we code this in Keras for the 5 independent FCs?
@xieximeng2008 Did you ever find out why your network only returned values close to zero? I am in a similar situation where my network only returns zeroes. I am fine-tuning an InceptionV3 model. Loss function is binary_crossentropy, I am using sigmoid as activation for the final layer, and as an optimizer I use rmsprop.
@xieximeng2008 check this https://suraj-deshmukh.github.io/Multi-Label-Image-Classification/
Same here! Changing the optimizer from SGD to Adam made the loss decrease. Thanks @elanmart. @xieximeng2008, I use the same CNN as you: CNN, sigmoid, binary_crossentropy, and Adam. That is all!
This thread is really helpful! I have another question. What if my response data is partially missing, i.e. say I have five classes, and most of the data only have partial information on responses, e.g. [1,0,NaN,NaN,1]. I know I can build individual model for each class, but what if I want to build one single model?
@michelleowen I am in no way an expert, but could it maybe work to set the NaN values to 0.5? This might not work in general, and it might be that this value should be tweaked dependent on the problem.
@janmatias Yes, I agree it is one workaround, but not perfect. I am thinking to modify the loss function, if the true response is NaN, then don't penalize it in the loss function. However, I am not quite sure which part of the keras code I should modify.
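The idea of not penalizing missing responses can be sketched as a masked binary cross-entropy. The version below is plain NumPy so the arithmetic is easy to check; the function name is hypothetical, and a Keras loss would need the same steps written with backend tensor operations, which is not shown here.

```python
import numpy as np

def masked_binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy averaged over the labels that are not NaN."""
    mask = ~np.isnan(y_true)                 # True where a response is known
    y = np.where(mask, y_true, 0.0)          # placeholder value for missing entries
    p = np.clip(y_pred, eps, 1.0 - eps)      # avoid log(0)
    per_label = -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return np.sum(per_label * mask) / max(np.sum(mask), 1)

y_true = np.array([[1.0, 0.0, np.nan, np.nan, 1.0]])   # example from the thread
y_pred = np.array([[0.9, 0.2, 0.7, 0.1, 0.6]])         # made-up predictions

loss = masked_binary_crossentropy(y_true, y_pred)
print(loss)
```

Because the masked entries are multiplied by zero, whatever the model predicts for a missing label has no effect on the loss.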
Awesome! I still have a question. If the dataset is quite imbalanced, i.e. samples in some categories are much more than others, how can I adopt class_weight to solve this to get a multi-label prediction? Can anybody answer me? @suraj-deshmukh @xieximeng2008
@xieximeng2008 did you ever solve the problem? I have a similar problem. I use a sigmoid as the activation function and my loss is binary cross entropy. During training, the loss did drop. But when I feed an image into the network, the output probabilities are all zero. So weird, how could that happen?
@vanpersie32 If you have a lot of labels (say 1000) and only 2 of the labels are 1s, the model is happy to assign 0 to all labels to get a very low binary cross entropy, as this is an average across all labels and the 998 zeros will mask the signal from the 2 labels you want to classify. I found this very annoying.
@jerrypaytm In this case, you need to set sample weight for each sample. When you have 1000 labels, for a particular class, the data with other 999 labels are all negative samples. Then you have to punish hard when a positive sample is marked as negative
@james97 Thanks! I will try that. It should also speed up the convergence, as the signal from the class labels is diluted by the zeros.
I think most people doing multi-label classification will face this issue. Unless the target has roughly half ones and half zeros, the network will think it is doing great by just setting 0 for all labels.
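The effect described above is easy to reproduce numerically. This sketch uses made-up numbers matching the example (1000 labels, 2 positives) and a model that outputs the same tiny score for every label.

```python
import numpy as np

n_labels = 1000
y_true = np.zeros(n_labels)
y_true[:2] = 1.0                       # only 2 of the 1000 labels are positive

y_pred = np.full(n_labels, 0.002)      # same tiny score for every label

# Mean binary cross-entropy over all labels.
bce = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Element-wise accuracy and recall after thresholding at 0.5.
hard = y_pred > 0.5
accuracy = np.mean(hard == (y_true == 1))
recall = np.sum(hard & (y_true == 1)) / np.sum(y_true == 1)

print(bce, accuracy, recall)
```

The loss is small and the accuracy is 99.8%, yet no positive label is ever found, which is why accuracy alone is a misleading measure here.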
@jerrypaytm You are welcome. The process becomes a little more complex here. The model will be compiled with sample_weight_mode='temporal', and the Y vector will be 3D instead of 2D because it contains the results of multiple binary outputs instead of one softmax output. Does anybody have an easier way?
@jerrypaytm I am not sure if you have a problem with skewed label distribution in the training data or with encoding labels as one hot vector and using binary cross entropy instead of categorical.
Would it maybe be an alternative to use a different loss function? Like this tensorflow loss function: https://www.tensorflow.org/api_docs/python/tf/nn/weighted_cross_entropy_with_logits
The keras binary_crossentropy loss uses the sigmoid_cross_entropy_with_logits tensorflow function, and tensorflow weighted_cross_entropy_with_logits is ...
like sigmoid_cross_entropy_with_logits() except that pos_weight, allows one to trade off recall and precision by up- or down-weighting the cost of a positive error relative to a negative error.
In the case where you just have a lot of labels and not very imbalanced training data maybe this could help?
I haven't tried to implement a custom loss function in keras yet though, so I don't know how much effort this would be and if it works well - but if it is not too complicated it might be worth trying?!
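To show what the pos_weight idea does, here is a NumPy sketch of a positively weighted sigmoid cross-entropy computed from logits. It is a re-implementation of the formula for illustration, not the TensorFlow function itself, and the name weighted_bce_from_logits is hypothetical.

```python
import numpy as np

def weighted_bce_from_logits(labels, logits, pos_weight):
    """Sigmoid cross-entropy where errors on positive labels cost pos_weight times more."""
    # log(1 + exp(-x)), written in a numerically stable form
    softplus_neg = np.log1p(np.exp(-np.abs(logits))) + np.maximum(-logits, 0.0)
    weight = 1.0 + (pos_weight - 1.0) * labels
    return (1.0 - labels) * logits + weight * softplus_neg

labels = np.array([1.0, 0.0, 1.0, 0.0])
logits = np.array([-2.0, -2.0, 3.0, 1.5])   # made-up raw scores

plain = weighted_bce_from_logits(labels, logits, pos_weight=1.0)
weighted = weighted_bce_from_logits(labels, logits, pos_weight=10.0)

print(plain)
print(weighted)   # losses on positive labels are 10 times larger, negatives unchanged
```

With pos_weight=1 this reduces to the ordinary sigmoid cross-entropy; larger values make a missed positive more expensive than a false alarm, which trades precision for recall.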
@djstrong The problem I'm trying to solve is a 2 out of 64 label classification. A very skewed dataset would force the network to learn the labels that are the majority in the training dataset but my observation is different. All sigmoids in the last layer are happy to produce a very low score. If you look at the math, it does make sense because 62/64 of labels in the target variable are 0. We need a way to penalize the 2 labels (in my case) that are 1 to have stronger signal so that the network takes them seriously.
@tobigue This is the direction I'm moving toward right now. gradient descenting (hopefully). Thanks!
@jerrypaytm You're welcome. I'd be interested if this worked for you and how one can use this tensorflow function in a keras model.
@jerrypaytm one more thing I remembered - Keras 1.x had an option to print precision, recall and fmeasure metrics during training. I found this very helpful when using binary_crossentropy with multiple labels, as all the correctly predicted zeros push the accuracy metric immediately to a very unhelpful high value. I guess it should be still possible using a custom metric function in Keras 2.
@tobigue I am looking for something like this as well. My accuracy goes to 90% after one epoch because I have so many 0's in my label set. Does anyone have a suggestion for this? Either a custom function or package update? I am using Keras 2.0.2.
@iymitchell You can try updating your class weights.
I would suggest using tanh instead of sigmoid. Tanh distributes values in the range (-1, 1), while sigmoid distributes them in the range (0, 1). From an optimization point of view, it is better when the threshold is centered around zero rather than around 0.5.
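For reference, tanh and sigmoid are related by tanh(x) = 2 * sigmoid(2x) - 1, so a tanh output is a rescaled sigmoid and its natural decision threshold is 0 instead of 0.5. The sketch below checks this numerically; note that a loss defined on probabilities, such as binary cross-entropy, needs values in (0, 1), so tanh outputs would have to be mapped back with (t + 1) / 2 or paired with a different loss and target encoding.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-4.0, 4.0, 9)   # a few raw scores, including 0

t = np.tanh(x)
s = sigmoid(x)

# tanh is a shifted and scaled sigmoid.
identity_holds = np.allclose(t, 2.0 * sigmoid(2.0 * x) - 1.0)

# Thresholding tanh at 0 selects the same labels as thresholding sigmoid at 0.5.
same_decisions = np.array_equal(t > 0.0, s > 0.5)

# Mapping tanh outputs back into the (0, 1) range.
rescaled = (t + 1.0) / 2.0

print(identity_holds, same_decisions)
```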
To summarize:
Use sigmoid units in the output layer with the binary_crossentropy loss.
For predictions, you can use the pattern
preds = clf.predict(xs)
preds[preds>=0.5] = 1
preds[preds<0.5] = 0
Hi @elanmart, I read your explanation of why softmax is not good here and it makes perfect sense. But is there any use case where softmax is better than sigmoid + binary_crossentropy? It seems most classification use cases have mutually exclusive labels. So it seems softmax is not that useful in most classification problems?
@elanmart hi, I am using the code below to detect multiple labels on Pascal VOC data, but the validation loss increases from the first epoch. Following the original code, I changed the last layer to use sigmoid and binary_crossentropy. I am wondering why the training loss is decreasing while the validation loss is increasing and the validation accuracy is decreasing.
# -*- coding: utf-8 -*-
import keras
from keras.models import Sequential
from keras.optimizers import SGD
from keras.layers import Input, Dense, Convolution2D, MaxPooling2D, AveragePooling2D, ZeroPadding2D, Dropout, Flatten, merge, Reshape, Activation
from sklearn.metrics import log_loss
from load_cifar10 import load_cifar10_data
from load_pascal2012 import load_pascal2012_data
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import numpy as np
from keras import backend as K
K.set_image_dim_ordering('th')
import sklearn.metrics as skm
def vgg16_model(img_rows, img_cols, channel=1, num_classes=None):
    """VGG 16 Model for Keras

    Model Schema is based on
    https://gist.github.com/baraldilorenzo/07d7802847aaad0a35d3

    ImageNet Pretrained Weights
    https://drive.google.com/file/d/0Bz7KyqmuGsilT0J5dmRCM0ROVHc/view?usp=sharing

    Parameters:
      img_rows, img_cols - resolution of inputs
      channel - 1 for grayscale, 3 for color
      num_classes - number of categories for our classification task
    """
    model = Sequential()
    model.add(ZeroPadding2D((1, 1), input_shape=(channel, img_rows, img_cols)))
    model.add(Convolution2D(64, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(64, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(128, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(128, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(256, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(256, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(256, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))
    # Add Fully Connected Layer
    model.add(Flatten())
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1000, activation='softmax'))
    # Loads ImageNet pre-trained data
    model.load_weights('imagenet_models/vgg16_weights_th_dim_ordering_th_kernels.h5')
    # Truncate and replace softmax layer for transfer learning
    model.layers.pop()
    model.outputs = [model.layers[-1].output]
    model.layers[-1].outbound_nodes = []
    model.add(Dense(num_classes, activation='sigmoid'))
    # Uncomment below to set the first 10 layers to non-trainable (weights will not be updated)
    #for layer in model.layers[:10]:
    #    layer.trainable = False
    # Learning rate is changed to 0.001
    sgd = SGD(lr=1e-3, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(optimizer=sgd,
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model
if __name__ == '__main__':
    # Example to fine-tune on 3000 samples from Cifar10
    img_rows, img_cols = 224, 224  # Resolution of inputs
    channel = 3
    num_classes = 20
    batch_size = 16
    nb_epoch = 10
    # Load Cifar10 data. Please implement your own load_data() module for your own dataset
    # X_train, Y_train, X_valid, Y_valid = load_cifar10_data(img_rows, img_cols)
    X_train, Y_train, X_valid, Y_valid = load_pascal2012_data(img_rows, img_cols)
    # Load our model
    model = vgg16_model(img_rows, img_cols, channel, num_classes)
    # Start Fine-tuning
    history = model.fit(X_train, Y_train,
                        batch_size=batch_size,
                        epochs=nb_epoch,
                        shuffle=True,
                        verbose=1,
                        validation_data=(X_valid, Y_valid),
                        )
    # Make predictions
    predictions_valid = model.predict(X_valid, batch_size=batch_size, verbose=1)
    # Cross-entropy loss score
    score = log_loss(Y_valid, predictions_valid)
    print(score)
Epoch 1/10
3000/3000 [==============================] - 255s - loss: 0.2229 - acc: 0.9338 - val_loss: 0.3628 - val_acc: 0.9150
Epoch 2/10
3000/3000 [==============================] - 256s - loss: 0.1510 - acc: 0.9487 - val_loss: 0.4318 - val_acc: 0.9025
Epoch 3/10
3000/3000 [==============================] - 256s - loss: 0.1230 - acc: 0.9556 - val_loss: 0.4887 - val_acc: 0.8980
Epoch 4/10
3000/3000 [==============================] - 257s - loss: 0.1064 - acc: 0.9608 - val_loss: 0.5058 - val_acc: 0.8985
Epoch 5/10
3000/3000 [==============================] - 257s - loss: 0.0946 - acc: 0.9639 - val_loss: 0.5580 - val_acc: 0.8940
Epoch 6/10
3000/3000 [==============================] - 257s - loss: 0.0848 - acc: 0.9663 - val_loss: 0.5640 - val_acc: 0.8965
Epoch 7/10
3000/3000 [==============================] - 257s - loss: 0.0782 - acc: 0.9681 - val_loss: 0.5811 - val_acc: 0.8940
Epoch 8/10
3000/3000 [==============================] - 257s - loss: 0.0709 - acc: 0.9700 - val_loss: 0.6254 - val_acc: 0.8930
Epoch 9/10
3000/3000 [==============================] - 257s - loss: 0.0667 - acc: 0.9714 - val_loss: 0.6396 - val_acc: 0.8910
@elanmart how would you update your model using an Embedding layer and multiple LSTM layers?
Agree on binary_crossentropy as a loss.
If multi-labels are sparse (i.e. many zeros and a few ones for each output), the network will reply with small values, and a threshold of 0.5, as suggested above, does not cut it.
One should manually find a threshold as suggested in the link by @suraj-deshmukh.
I got a little improvement by using tanh as the final layer activation function instead of the sigmoid, as suggested by @jurastm.
Still, convergence is really slow and there should be a better solution.
I was thinking about giving more weight to the ones via class_weight but can't understand how that works with multilabel output.
Help is appreciated :D
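One way to find a threshold manually, as mentioned above, is to scan candidate thresholds per label on a validation set and keep the one with the best F1. This is only a sketch on made-up validation data; the helper names are hypothetical, and other criteria than F1 could be used.

```python
import numpy as np

def f1(y_true, y_hat):
    """F1 score for one label from 0/1 vectors (0.0 when undefined)."""
    tp = np.sum((y_hat == 1) & (y_true == 1))
    fp = np.sum((y_hat == 1) & (y_true == 0))
    fn = np.sum((y_hat == 0) & (y_true == 1))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def best_thresholds(y_true, probs, candidates):
    """Pick, for each label, the candidate threshold with the highest F1."""
    thresholds = []
    for j in range(y_true.shape[1]):
        scores = [f1(y_true[:, j], (probs[:, j] >= t).astype(int)) for t in candidates]
        thresholds.append(candidates[int(np.argmax(scores))])
    return np.array(thresholds)

# Made-up validation data: label 1 is rare and only ever gets small scores.
y_val = np.array([[1, 0], [0, 1], [1, 0], [0, 0], [1, 1], [0, 0]])
p_val = np.array([[0.9, 0.05], [0.2, 0.30], [0.8, 0.10],
                  [0.4, 0.02], [0.7, 0.25], [0.1, 0.08]])

candidates = np.round(np.linspace(0.05, 0.95, 19), 2)
thr = best_thresholds(y_val, p_val, candidates)
preds = (p_val >= thr).astype(int)

print(thr)
print(preds)
```

With a fixed 0.5 cut-off the rare label would never be predicted in this toy data, while its own lower threshold recovers it. The thresholds should be chosen on held-out data, not on the test set.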
I'm trying to solve a similar problem: a classifier using data with many labels (more than 200), all of them binary (flags). For most rows the values are 0.
Please help pieroit and me. I believe the solution lies somewhere in adjusting weights for the 1s.
How do I adjust weights with class_weights for the input data?
@pieroit @bryan831 you could try to give more weight to positive targets in the loss function.
If you use the tensorflow backend of keras you can use tf.nn.weighted_cross_entropy_with_logits like this: https://stackoverflow.com/a/47313183/979377
Would be interested to hear if this worked for you and how you set the POS_WEIGHT in relation to your number of classes!
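On the question of how to set the positive weight: one common heuristic, which is an assumption here and not something the thread confirms, is to use the ratio of negatives to positives for each label in the training set. A minimal sketch with made-up labels:

```python
import numpy as np

def positive_weights(y_train):
    """Per-label ratio of negative to positive examples (heuristic, not a rule)."""
    n_pos = y_train.sum(axis=0)
    n_neg = y_train.shape[0] - n_pos
    return n_neg / np.maximum(n_pos, 1)   # guard against labels with no positives

# Made-up training labels: 4 samples, 3 labels.
y_train = np.array([[1, 0, 0],
                    [0, 0, 1],
                    [1, 0, 0],
                    [1, 1, 0]])

pos_weight = positive_weights(y_train)
print(pos_weight)
```

Rare labels get a weight above 1 and frequent ones a weight below 1. In practice the value is usually treated as a starting point and tuned against a validation metric such as F1.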
Hi! I am facing a slightly different problem in training a multi-label classifier. I use sigmoid and binary cross entropy for training; however, the network outputs almost the same values for every image, like below. I have 200 classes, and the output is not appropriate.
input_tensor = Input(shape=(img_rows, img_cols, n_channels))
vgg16 = VGG16(include_top=False, weights='imagenet', input_tensor=input_tensor)
top_model = Sequential()
top_model.add(Flatten(input_shape=vgg16.output_shape[1:]))
top_model.add(Dense(4096, activation='relu'))
top_model.add(Dropout(0.5))
top_model.add(Dense(4096, activation='relu'))
top_model.add(Dropout(0.5))
top_model.add(Dense(nb_classes, activation='sigmoid', init='glorot_uniform'))
model = Model(input=vgg16.input, output=top_model(vgg16.output))
model.compile(optimizer=optimizers.Adam(), loss='binary_crossentropy', metrics=['accuracy'])
image001: [[0.94, 0.03, 0.01, 0.91, ... , 0.91]]
image002: [[0.93, 0.02, 0.01, 0.93, ... , 0.93]]
image003: [[0.91, 0.02, 0.01, 0.92, ... , 0.92]]
Please tell me how to deal with this problem.
@pieroit @bryan831 I'm facing exactly the same issue as you. Did you use the method @tobigue suggested, and how did it work? Could you show me how you solved this problem? FYI, I tried class_weight = {0:1, 1:20}, but it did not work and errored out; it looks like it does not work for multi-dimensional output.
I need to train a multi-label softmax classifier, but the examples have a lot of one-hot encoded labels. How should I change the code to do this?