keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

Loss not changing when training #2711

Closed kevkid closed 8 years ago

kevkid commented 8 years ago

I have a model that I am trying to train where the loss does not go down. I am using a custom image set: the images are 106 x 106 px (black and white) and there are two classes, bar graphs and gels, which are very different from each other. I have run the CIFAR-10 dataset and the loss did decrease there, but I am very confused as to why my model always predicts only one class for everything.

Xtrain is a numpy array of images (which are themselves numpy arrays), and Ytrain is a numpy array of one-hot label arrays ([0,1] or [1,0]). The shapes look like this:

np.shape(Xtrain)
Out[58]: (2000, 1, 106, 106)

np.shape(Ytrain)
Out[59]: (2000, 2)

Here is my model:

model = Sequential()
model.add(Convolution2D(32, 16, 16, border_mode='same',name='conv1', input_shape = (1, 106, 106)))
first_layer = model.layers[0]
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))

#2
model.add(Convolution2D(64, 15, 15, border_mode='same',name='conv2'))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))

#3
model.add(Convolution2D(128, 14, 14, border_mode='same',name='conv3'))
model.add(Activation("relu"))

#flatten
model.add(Flatten())
model.add(Dense(2))
model.add(Activation('softmax'))

rms = RMSprop()
sgd = SGD(lr=0.000001, decay=1e-6, momentum=1.9)
model.compile(loss='categorical_crossentropy', optimizer=sgd,metrics=["accuracy"])
model.fit(Xtrain[950:1050], Ytrain[950:1050], batch_size=32, nb_epoch=20,
          verbose=1, validation_split=0.2,
          callbacks=[EarlyStopping(monitor='val_loss', patience=0)])

Right now I am just using very small training sets (I tried 1000 examples as well, with similar results).

I have also tried RMSprop and SGD with large and small learning rates.

What else can I try?

dandxy89 commented 8 years ago

Try increasing the learning rate to a higher value, possibly to 0.1. That way you can ensure that noticeable changes to weights are made for each successive update.

kevkid commented 8 years ago

I tried it; training stopped after 2 epochs. Here are the results:

sgd = SGD(lr=0.1, decay=1e-6, momentum=1.9)
model.compile(loss='categorical_crossentropy', optimizer=sgd,metrics=["accuracy"])
model.fit(Xtrain[950:1050], Ytrain[950:1050], batch_size=32, nb_epoch=20,
          verbose=1, validation_split=0.2,
          callbacks=[EarlyStopping(monitor='val_loss', patience=0)])

Train on 80 samples, validate on 20 samples
Epoch 1/20
80/80 [==============================] - 84s - loss: 5.9848 - acc: 0.5250 - val_loss: 1.1921e-07 - val_acc: 1.0000
Epoch 2/20
80/80 [==============================] - 85s - loss: 10.0738 - acc: 0.3750 - val_loss: 1.1921e-07 - val_acc: 1.0000
Out[60]: <keras.callbacks.History at 0x7f6b30891c90>

model.predict(Xtrain[995:1005])
Out[62]: 
array([[ 1.,  0.],
       [ 1.,  0.],
       [ 1.,  0.],
       [ 1.,  0.],
       [ 1.,  0.],
       [ 1.,  0.],
       [ 1.,  0.],
       [ 1.,  0.],
       [ 1.,  0.],
       [ 1.,  0.]])

The first class is indices 0 to 999 and the second class is indices 1000 to 1999.

I tried predicting right at the border between the two classes and got all [1,0]. Shuffling the training set shouldn't matter, should it?

kevkid commented 8 years ago

After reading some blogs, it looks like the batch size is important: if the data is not shuffled, the network will learn one class for a few batches and then the other class for a few batches. Even so, my loss seems to stay the same; there is an interesting read on the loss function linked below. I really am still unsure what I may be doing wrong.
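
As a side note, here is a minimal sketch of shuffling the images and labels in unison before calling fit (assuming Xtrain and Ytrain are the numpy arrays described above); this matters because validation_split takes the last fraction of the arrays as given, so a class-sorted dataset yields a single-class validation set:

import numpy as np

# apply the same random permutation to images and labels
perm = np.random.permutation(len(Xtrain))
Xtrain = Xtrain[perm]
Ytrain = Ytrain[perm]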

Here are a few things I tried:

I am really unsure as to what I can do to get my loss to go down. Any other ideas?

Code:

model = Sequential()
model.add(Convolution2D(32, 10, 10, border_mode='same',name='conv1', input_shape = (1, 106, 106)))
#flatten
model.add(Flatten())
model.add(Dense(2))
model.add(Activation('softmax'))
rms = RMSprop()
sgd = SGD(lr=0.1, decay=1e-2, momentum=0.9)
model.compile(loss='categorical_crossentropy', optimizer=sgd,metrics=["accuracy"])
model.fit(Xtrain, Ytrain, batch_size=64, nb_epoch=10,
          verbose=1)

Here is the code for me to read in the images if it helps:

def loadCustomData():
    #read directories
    #files = []
    fileRoots = []
    for root, directories, filenames in os.walk('modified/convNetPanelsPaneled_sorted/'):
        for filename in filenames: 
            #files.append(os.path.join(root,filename))
            fileRoots.append(root)
    fileRoots = list(sorted(set(fileRoots)))
    bar = imgs.getFiles(fileRoots[19])
    bar = [fileRoots[19] + "/" + s for s in list(bar)]
    gel = imgs.getFiles(fileRoots[0])
    gel = [fileRoots[0] + "/" + s for s in list(gel)]
    Xtrain = []
    Ytrain = []
    Xtest = []
    Ytest = []
    #for i in range(int(len(bar)*.95)):
    for i in range(1000):
        imgArray = imgs.load_image(bar[i])
        if(len(np.shape(imgArray)) == 3):
            imgArray = imgArray[:,:,1]#set to 1 so we keep the greyscale
        Xtrain.append(imgArray)
        Ytrain.append([1])

    #for i in range(len(bar) - int(len(bar)*.95)):
    for i in range(1000,1100):
        imgArray = imgs.load_image(bar[i])
        if(len(np.shape(imgArray)) == 3):
            imgArray = imgArray[:,:,1]
        Xtest.append(imgArray)
        Ytest.append([1])

    for i in range(1000):
        imgArray = imgs.load_image(gel[i])
        if(len(np.shape(imgArray)) == 3):
            imgArray = imgArray[:,:,1]
        Xtrain.append(imgArray)
        Ytrain.append([0])

    #for i in range(len(gel) - int(len(gel)*.95)):
    for i in range(1000,1100):
        imgArray = imgs.load_image(gel[i])
        if(len(np.shape(imgArray)) == 3):
            imgArray = imgArray[:,:,1]
        Xtest.append(imgArray)
        Ytest.append([0])   

    Ytrain = np_utils.to_categorical(Ytrain, 2)
    Ytrain = np.array(Ytrain)
    Ytest = np_utils.to_categorical(Ytest, 2)
    Ytest = np.array(Ytest)
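    # shuffle Xtrain and Ytrain in unison: stack flattened images and labels column-wise,
    # shuffle the rows together, then split them back apart below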
    c = np.c_[np.array(Xtrain).reshape(len(Xtrain), -1), np.array(Ytrain).reshape(len(Ytrain), -1)]
    np.random.shuffle(c)
    Xtrain = c[:, :np.array(Xtrain).size//len(Xtrain)].reshape(np.array(Xtrain).shape)
    Ytrain = c[:, np.array(Xtrain).size//len(Xtrain):].reshape(np.array(Ytrain).shape)
    #reshape to fit the network
    Xtrain = np.array(Xtrain).reshape(2000, 1, 106, 106)#np.array(Xtrain).reshape(4415, 1, 106, 106)
    Xtrain = Xtrain.astype('float32')
    Xtest = np.array(Xtest).reshape(200, 1, 106, 106)#np.array(Xtest).reshape(233, 1, 106, 106)
    Xtest = Xtest.astype('float32')

    Xtrain /= 255
    Xtest /= 255

    return Xtrain, Ytrain, Xtest, Ytest

Here is my output:

Epoch 1/10
2000/2000 [==============================] - 73s - loss: 8.0590 - acc: 0.5000     
Epoch 2/10
2000/2000 [==============================] - 75s - loss: 8.0590 - acc: 0.5000     
...SNIP...    
Epoch 7/10
2000/2000 [==============================] - 81s - loss: 8.0590 - acc: 0.5000     

Links: What is batch size in neural network?

http://cs231n.github.io/neural-networks-3/#loss

joelthchao commented 8 years ago
  1. Missing activation (e.g. relu) after Convolution2D. I ran your network on cifar10 data; without the activation the loss does not decrease but increases. With the activation, it can learn something basic.
  2. The network is too shallow. It's hard to learn with only one convolutional layer and one fully connected layer. Try an AlexNet- or VGG-style architecture to build your network, or read the examples (cifar10, mnist) in Keras.
  3. I recommend taking some online courses on deep learning; it would be helpful.
kevkid commented 8 years ago

Hi @joelthchao,

I am unsure what you mean in 1 by missing activation. Are you saying that if you remove the activation the loss increases, and with the activation it learns?

For 2: I actually tried with a deeper network, but I figured since it was giving me no improvement, it may be best to simplify the model and troubleshoot with that.

I am wondering if this could be an issue with my data.

I can try increasing the depth. Anything else I should look at?

I will post my results from the cifar10.

EDIT: I found this example:

https://github.com/fchollet/keras/blob/master/examples/cifar10_cnn.py

I will try it and adapt it to my needs. Can anyone explain why we stack two convolution layers back to back like this?

joelthchao commented 8 years ago

@kevkid Try this, does loss still not decrease?

model.add(Convolution2D(32, 10, 10, border_mode='same',name='conv1', input_shape = (1, 106, 106)))
model.add(Activation('relu'))
# ...
kevkid commented 8 years ago

@joelthchao

Just tried it:

Train on 1600 samples, validate on 400 samples
Epoch 1/20
1600/1600 [==============================] - 60s - loss: 8.8503 - acc: 0.3969 - val_loss: 1.1921e-07 - val_acc: 1.0000
Epoch 2/20
1600/1600 [==============================] - 60s - loss: nan - acc: 0.3750 - val_loss: nan - val_acc: 1.0000
Out[17]: <keras.callbacks.History at 0x7f0cc152f7d0>

The loss goes to nan. Could the weights be blowing up? I will try the Keras cifar10 example.

kevkid commented 8 years ago

Could this be my architecture? Are there any resources for designing the neural network? Here is how the model currently looks:

model = Sequential()

model.add(Convolution2D(32, 5, 5, border_mode='same',name='conv1_1', input_shape = (1, 106, 106)))
#first_layer = model.layers[0]
# this is a placeholder tensor that will contain our generated images
#input_img = first_layer.input
#dream = input_img
model.add(Activation("relu"))
model.add(Convolution2D(32, 5, 5, border_mode='same',name='conv1_2'))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

#2
model.add(Convolution2D(64, 5, 5, border_mode='same',name='conv2_1'))
model.add(Activation("relu"))
model.add(Convolution2D(64, 5, 5, border_mode='same',name='conv2_2'))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

#flatten
model.add(Flatten())
model.add(Dense(512))
model.add(Activation("relu"))
model.add(Dropout(0.5))

model.add(Dense(2))
model.add(Activation('softmax'))

rms = RMSprop()
sgd = SGD(lr=0.01, decay=1e-6, momentum=1, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd,metrics=["accuracy"])
model.fit(Xtrain, Ytrain, batch_size=32, nb_epoch=20,
          verbose=1, validation_split=0.2,
          callbacks=[EarlyStopping(monitor='val_loss', patience=2)])

This is based on the cifar10 example. I am currently running the model.

kevkid commented 8 years ago

I was able to make a decent model that gave me excellent results. I am unsure why rmsprop seems to make the loss go up, but here is my model:

model = Sequential()

model.add(Convolution2D(32, 5, 5, border_mode='same',name='conv1_1', input_shape = (1, 106, 106)))
#first_layer = model.layers[0]
# this is a placeholder tensor that will contain our generated images
#input_img = first_layer.input
#dream = input_img
model.add(Activation("relu"))
model.add(Convolution2D(32, 5, 5, border_mode='same',name='conv1_2'))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

#2
model.add(Convolution2D(64, 5, 5, border_mode='same',name='conv2_1'))
model.add(Activation("relu"))
model.add(Convolution2D(64, 5, 5, border_mode='same',name='conv2_2'))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

#flatten
model.add(Flatten())
#model.add(Dense(512))
#model.add(Activation("relu"))
model.add(Dropout(0.5))

model.add(Dense(2))
model.add(Activation('softmax'))

rms = RMSprop()
sgd = SGD(lr=0.001, decay=1e-6, momentum=0.5, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd,metrics=["accuracy"])
model.fit(Xtrain, Ytrain, batch_size=32, nb_epoch=100,
          verbose=1)

--SNIP--
Epoch 99/100
2000/2000 [==============================] - 11s - loss: 0.0570 - acc: 0.9795     
Epoch 100/100
2000/2000 [==============================] - 11s - loss: 0.0576 - acc: 0.9815     
Out[14]: <keras.callbacks.History at 0x7f204ba9f0d0>
--SNIP--
print('Classifcation rate %02.3f' % model.evaluate(Xtest, Ytest)[1])
200/200 [==============================] - 0s     
Classifcation rate 0.930
111hypo commented 8 years ago

I have got the same problem as you, but I guess there must be something wrong with my load_data function, and I am not sure whether I get the right Xtrain and Ytrain. Could you please share your load_data code? Thanks a lot.

kevkid commented 8 years ago

@111hypo I have posted my loadCustomData function a few posts above this one. The portion :

    c = np.c_[np.array(Xtrain).reshape(len(Xtrain), -1), np.array(Ytrain).reshape(len(Ytrain), -1)]
    np.random.shuffle(c)
    Xtrain = c[:, :np.array(Xtrain).size//len(Xtrain)].reshape(np.array(Xtrain).shape)
    Ytrain = c[:, np.array(Xtrain).size//len(Xtrain):].reshape(np.array(Ytrain).shape)

is unnecessary, because we do not need to shuffle the input (this was just a test to try to figure out why my network would not converge).

I still have problems with RMSprop: the loss quickly increases and the accuracy goes to 0 (which seems odd to me). I tried a few different SGD configurations, and the one in my latest post seemed to work best for me.

alyato commented 8 years ago

@kevkid I am also running into your problem. I collected 1505 images of numbers as my dataset and use a simple model, but the validation accuracy doesn't change. Could you give me some advice? Thanks.

alyato commented 8 years ago

@111hypo, how did you solve your problem?

albertyou2 commented 7 years ago

@kevkid Have you found a solution yet? I have run into the same problem as you.

Have you tried changing 'momentum=1.9'? I found that this problem may be connected to the 'momentum' argument of the SGD optimizer.

I haven't found the full solution yet, but when I changed the momentum to 0.5 the loss changed. After several epochs, though, the loss stopped changing again...

Hope this helps!

DanlanChen commented 7 years ago

You may want to reduce your dropout rate and shuffle your data.

fatemaaa commented 7 years ago

Hello, I used a VGG19 architecture to classify my data set into 2 classes, but the problem is that the accuracy doesn't change after 30 iterations and I don't know what the problem is. Here is my code:

data, Label = shuffle(immatrix, label, random_state=2)
train_data = [data, Label]
print(train_data[0].shape)  # the train data
print(train_data[1].shape)  # the labels of these data

# batch_size to train
batch_size = 16
# number of output classes
nb_classes = 3
# number of epochs to train
nb_epoch = 150  # each epoch contains around 70000/128=468 batches with 128 images
img_rows, img_cols = 200, 200
(X, y) = (train_data[0], train_data[1])

# STEP 1: split X and y into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)
X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

# Normalization
X_train /= 255
X_test /= 255

# compute the total time of preparing and importing the dataset
t_generateArray = time.time()
print('Generating Array Time:{}'.format(t_generateArray - t_start))

print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

# Step 1: Network structure
model = Sequential()
model.add(ZeroPadding2D((1, 1), input_shape=(1, img_rows, img_cols)))
model.add(Convolution2D(64, 3, 3, activation='relu', init='glorot_uniform'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(64, 3, 3, activation='relu', init='glorot_uniform'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))

model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(128, 3, 3, activation='relu', init='glorot_uniform'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(128, 3, 3, activation='relu', init='glorot_uniform'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))

model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(256, 3, 3, activation='relu', init='glorot_uniform'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(256, 3, 3, activation='relu', init='glorot_uniform'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(256, 3, 3, activation='relu', init='glorot_uniform'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(256, 3, 3, activation='relu', init='glorot_uniform'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))

model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', init='glorot_uniform'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', init='glorot_uniform'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', init='glorot_uniform'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', init='glorot_uniform'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))

model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', init='glorot_uniform'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', init='glorot_uniform'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', init='glorot_uniform'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', init='glorot_uniform'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))

model.add(Flatten())
model.add(Dense(1024, activation='relu', init='glorot_uniform'))
model.add(Dropout(0.5))
model.add(Dense(1024, activation='relu', init='glorot_uniform'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes, activation='softmax'))  # neuron number

# Step 2: Learning target (computing the loss using the cross-entropy function)
sgd = SGD(lr=0.0001, decay=1e-6, momentum=0.9, nesterov=True)
adagrad = Adagrad(lr=0.01, epsilon=1e-08)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

checkpointer = ModelCheckpoint(filepath='exp_161123_best_lr0.0001_weights.h5', monitor='acc', verbose=1, save_best_only=True, mode='max')

# training the model
print('Training Start....')
hist = model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=1, validation_data=(X_test, Y_test))  # ,callbacks=[checkpointer]

model.save_weights('final_last5_weights.h5')
model.save_weights('exp_161123_final_lr0.0001_weights.h5')

# evaluate the model
score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])

redouanelg commented 7 years ago

If you have unbalanced classes, maybe you should consider weighting classes, check class_weight and sample_weight in Keras docs

Here's a similar question asked on stackoverflow http://stackoverflow.com/questions/41881220/keras-predict-always-output-same-value-in-multi-classification

alyato commented 7 years ago

@redouanelg Do you mean adding sample_weight in fit()? Could you give an example of how to do it? Thanks.

redouanelg commented 7 years ago

@alyato Sorry for the late reply.

My modest experience tells me that if you have only two classes use a dict in class_weight.

If you have more, you'll get the error "class_weight not supported for 3+ dimensional targets".

A way to overcome this is to pass sample_weight in fit() as a 2D weight array (one weight per timestep per sample) and add sample_weight_mode="temporal" in compile().

It's not an elegant solution but it works. I'll be glad if someone has another answer.
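
To illustrate both options, a rough sketch (not from the thread; model, Xtrain/Ytrain, X/y, num_samples and num_timesteps are placeholders for your own model and data):

import numpy as np

# two-class case: pass a dict, here weighting class 1 five times as much as class 0
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
model.fit(Xtrain, Ytrain, nb_epoch=10, class_weight={0: 1.0, 1: 5.0})

# temporal / 3D-target case: one weight per timestep per sample
model.compile(loss='categorical_crossentropy', optimizer='sgd',
              metrics=['accuracy'], sample_weight_mode='temporal')
weights = np.ones((num_samples, num_timesteps))   # placeholder shape
model.fit(X, y, sample_weight=weights)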

dityas commented 7 years ago

Hey, I am having a similar problem. I am trying to train a network to learn word embeddings using skip-grams. I have a vocabulary of 256 and a sequence of about 166000 words. When I train, the accuracy stays the same at around 0.1327 no matter what I do; I tried changing learning rates and batch_size, but no luck. This has happened every time I have used Keras, but it usually starts learning after tweaking the batch size a bit. This one just doesn't work.

Here is the model:

def make_model(self, vocab_size=256, vec_dim=100):
    model = Sequential()
    model.add(Dense(vec_dim, activation="sigmoid", input_dim=vocab_size))
    model.add(Dense(vocab_size, activation="sigmoid"))
    sgd = SGD(lr=1.0)
    model.compile(loss="categorical_crossentropy", optimizer=sgd, metrics=["accuracy"])
    return model

def _get_callbacks(self):
    earlystop=EarlyStopping(monitor="val_loss",min_delta=0.0001,patience=10,verbose=2)
    checkpoint=ModelCheckpoint("checkpt.hdf5",period=10,verbose=2)
    reducelr=ReduceLROnPlateau(monitor="val_loss",factor=0.1,patience=5,verbose=2)
    return [earlystop,checkpoint,reducelr]

def train(self,model,X,y):
    model.fit(X,y,nb_epoch=1000,callbacks=self._get_callbacks(),validation_split=0.1,verbose=2,batch_size=300)
    model.save("model.hdf5")

X is a one-hot vector of length 256 for every word; y is a one-hot vector of length 256 representing the skip word in the context of X. So, for instance, if the sequence is [2,6,5,7,9], X will be [5,5,5,5,7,7,7...]

and y will be [2,6,7,9,6,5,9...], and so on for every word in the sequence.

This is what happens when i try to train:

9s - loss: 4.5012 - acc: 0.1794 - val_loss: 4.5873 - val_acc: 0.1327
Epoch 2/1000
9s - loss: 4.2679 - acc: 0.1801 - val_loss: 4.8339 - val_acc: 0.1327
Epoch 3/1000
9s - loss: 4.2363 - acc: 0.1801 - val_loss: 4.7040 - val_acc: 0.1327
Epoch 4/1000
9s - loss: 4.2102 - acc: 0.1801 - val_loss: 4.6947 - val_acc: 0.1327
Epoch 5/1000
9s - loss: 4.1882 - acc: 0.1801 - val_loss: 4.6625 - val_acc: 0.1327
Epoch 6/1000
9s - loss: 4.1777 - acc: 0.1801 - val_loss: 4.6303 - val_acc: 0.1327

I've waited for about 50 epochs and the accuracy still does not change.

Any idea what I am doing wrong? I've faced this problem every time I've used Keras, even when training other models such as RNN language models and LSTM text generation.

arielbenitah commented 7 years ago

Hi @adityashinde1506, You may have already found a solution but if not, try to decrease your learning rate.

vijaydwivedi75 commented 7 years ago

Hi guys, I am having a similar problem. I am training an LSTM model for text classification and my loss does not improve over subsequent epochs. I have tried many optimizers with different learning rates, but the problem persists.

max_length = 275
X_train = sequence.pad_sequences(train_data_new, maxlen=max_length, padding='post')
X_test = sequence.pad_sequences(test_data_new, maxlen=max_length, padding='post')

y_train = []
y_test = []

# preparing y_test and y_train
for label in train_label:
    if label == 'first':
        y_train.append([1,0])
    else:
        y_train.append([0,1])

y_train = np.array(y_train)

for label in test_label:
    if label == 'second':
        y_test.append([1,0])
    else:
        y_test.append([0,1])

y_test = np.array(y_test)

# Create the model
rmsprop = RMSprop(lr=0.1)
sgd = SGD(lr=0.1)
model = Sequential()
model.add(Embedding(len(word_index) + 1, EMBEDDING_DIM, 
    weights=[embedding_matrix], 
    input_length=max_length,
    trainable=True))
# model.add(Dropout(0.2))
model.add(LSTM(128, return_sequences=True))
# model.add(Dropout(0.2))
model.add(LSTM(64))
model.add(Dense(2, activation='sigmoid'))
model.compile(loss='categorical_crossentropy', optimizer=rmsprop, metrics=['accuracy'])
print(model.summary())

print('Training model...')
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=64)

# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))

The OUTPUT I get is this:

Total train samples:  931
Total test samples:  390
Processing text dataset
Found 6132 unique tokens.
Indexing word vectors.
Found 400000 word vectors.
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_1 (Embedding)      (None, 275, 200)          1226600   
_________________________________________________________________
lstm_1 (LSTM)                (None, 275, 128)          168448    
_________________________________________________________________
lstm_2 (LSTM)                (None, 64)                49408     
_________________________________________________________________
dense_1 (Dense)              (None, 2)                 130       
=================================================================
Total params: 1,444,586
Trainable params: 1,444,586
Non-trainable params: 0
_________________________________________________________________
None
Training model...
Train on 931 samples, validate on 390 samples
Epoch 1/10
931/931 [==============================] - 16s - loss: 0.8362 - acc: 0.5081 - val_loss: 0.6939 - val_acc: 0.4949
Epoch 2/10
931/931 [==============================] - 14s - loss: 0.6935 - acc: 0.5016 - val_loss: 0.6936 - val_acc: 0.4949
Epoch 3/10
931/931 [==============================] - 14s - loss: 0.6933 - acc: 0.5016 - val_loss: 0.6933 - val_acc: 0.4949
Epoch 4/10
931/931 [==============================] - 14s - loss: 0.6933 - acc: 0.5016 - val_loss: 0.6933 - val_acc: 0.4949
Epoch 5/10
931/931 [==============================] - 14s - loss: 0.6933 - acc: 0.5016 - val_loss: 0.6932 - val_acc: 0.4949
Epoch 6/10
931/931 [==============================] - 14s - loss: 0.6932 - acc: 0.5016 - val_loss: 0.6932 - val_acc: 0.4949
Epoch 7/10
931/931 [==============================] - 14s - loss: 0.6932 - acc: 0.5016 - val_loss: 0.6932 - val_acc: 0.4949
Epoch 8/10
931/931 [==============================] - 13s - loss: 0.6932 - acc: 0.5016 - val_loss: 0.6931 - val_acc: 0.4949
Epoch 9/10
931/931 [==============================] - 13s - loss: 0.6931 - acc: 0.5016 - val_loss: 0.6931 - val_acc: 0.4949
Epoch 10/10
931/931 [==============================] - 14s - loss: 0.6931 - acc: 0.5016 - val_loss: 0.6931 - val_acc: 0.4949
Accuracy: 49.49%

I am unable to figure out what the problem is.

td2014 commented 7 years ago

Out of curiosity, why are you passing in a "weights" matrix to the Embedding layer? Thanks.

vijaydwivedi75 commented 7 years ago

Hi @td2014, the weights argument in the Embedding layer is just because I want to supply my own embeddings (GloVe in this case) for the word inputs. Even when I removed the weights and ran the file, my loss didn't change.

td2014 commented 7 years ago

Okay. And you set the learning rate to 0.1 for your optimizer(s). Just curious, but was the default not working? Thanks.

vijaydwivedi75 commented 7 years ago

Initially it was the default. Then I read a similar issue on Stack Overflow that suggested altering the learning rate, so I was trying to see the change with different values. Using lr=0.1 the loss starts from 0.83 and becomes constant at 0.69. When I was using the default value, the loss was stuck at 0.69 from the start.

td2014 commented 7 years ago

Okay. I created a simplified version of what you have implemented, and it does seem to work (loss decreases). Here is the code you can cut and paste. Note that the first section is setting up the environment for reproducible results (which I provide at the end in my case). In your case, you may want to check a few things: 1) Is your input data making sense? It could be that the preprocessing steps (the padding) are creating input sequences that cannot be separated (perhaps you are getting a lot of zeros or something of that sort). 2) You might want to simplify your architecture to include just a single LSTM layer (like I did) just until you convince yourself that the model is actually learning something.

I hope this helps. Thanks.


# Start: Set up environment for reproduction of results
import numpy as np
import tensorflow as tf
import random as rn
import os

os.environ['PYTHONHASHSEED'] = '0'
np.random.seed(42)
rn.seed(12345)

# single thread
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1,
                              inter_op_parallelism_threads=1)

from keras import backend as K
tf.set_random_seed(1234)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)
# End: Set up environment for reproduction of results

from keras.layers import LSTM, Dense, Embedding
from keras.models import Sequential
from keras.preprocessing import sequence

#
# Create input sequences
#
word_index = 21
train_data_new = []
train_data_new.append([1, 1, 1, 1, 1, 1, 1])
train_data_new.append([2, 2, 2, 2, 2, 2, 2])
train_data_new.append([2, 2, 2, 2, 2, 2, 2])
train_data_new.append([1, 1, 1, 1, 1, 1, 1])
train_data_new.append([2, 2, 2, 2, 2, 2, 2])
train_data_new.append([2, 2, 2, 2, 2, 2, 2])
train_data_new.append([1, 1, 1, 1, 1, 1, 1])
train_data_new.append([2, 2, 2, 2, 2, 2, 2])
train_data_new.append([2, 2, 2, 2, 2, 2, 2])

#
# Preprocess
#
max_length = 5
X_train = sequence.pad_sequences(train_data_new, maxlen=max_length, padding='post')

# preparing y_train
y_train = []
y_train.append([1, 0])
y_train.append([0, 1])
y_train.append([0, 1])
y_train.append([1, 0])
y_train.append([0, 1])
y_train.append([0, 1])
y_train.append([1, 0])
y_train.append([0, 1])
y_train.append([0, 1])

y_train = np.array(y_train)

#
# Create model
#
EMBEDDING_DIM = 16

model = Sequential()
model.add(Embedding(word_index + 1, EMBEDDING_DIM, input_length=max_length))
model.add(LSTM(5))
model.add(Dense(2, activation='sigmoid'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
print(model.summary())

#
# Train
#
print('Training model...')
model.fit(X_train, y_train, epochs=10, shuffle=False)

#
# output predictions
#
predictions = model.predict(X_train)

====OUTPUT BELOW====

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_29 (Embedding)     (None, 5, 16)             352
_________________________________________________________________
lstm_29 (LSTM)               (None, 5)                 440
_________________________________________________________________
dense_29 (Dense)             (None, 2)                 12
=================================================================
Total params: 804.0
Trainable params: 804
Non-trainable params: 0.0
_________________________________________________________________
None
Training model...
Epoch 1/10
9/9 [==============================] - 2s - loss: 0.7013 - acc: 0.3333
Epoch 2/10
9/9 [==============================] - 0s - loss: 0.6953 - acc: 0.3333
Epoch 3/10
9/9 [==============================] - 0s - loss: 0.6911 - acc: 0.3333
Epoch 4/10
9/9 [==============================] - 0s - loss: 0.6875 - acc: 1.0000
Epoch 5/10
9/9 [==============================] - 0s - loss: 0.6842 - acc: 1.0000
Epoch 6/10
9/9 [==============================] - 0s - loss: 0.6812 - acc: 1.0000
Epoch 7/10
9/9 [==============================] - 0s - loss: 0.6783 - acc: 1.0000
Epoch 8/10
9/9 [==============================] - 0s - loss: 0.6754 - acc: 1.0000
Epoch 9/10
9/9 [==============================] - 0s - loss: 0.6726 - acc: 1.0000
Epoch 10/10
9/9 [==============================] - 0s - loss: 0.6698 - acc: 1.0000

unnir commented 7 years ago

@td2014

Dude, your architecture just does not work. Try something different.

vijaydwivedi75 commented 7 years ago

It works; I had to clean the data, and then the loss started to converge.

jianning-li commented 7 years ago

Try a 'sigmoid' activation for the last layer, since it's a binary classification problem.

whistler commented 7 years ago

Here is a good list of issues to check for that I have found useful: https://blog.slavv.com/37-reasons-why-your-neural-network-is-not-working-4020854bd607

tebzito commented 6 years ago

Another reason could be class imbalance. So try upsampling or downsampling using SMOTE/OneSidedSelection from imblearn package, then reshape your data back to 4 dimensions for your model.
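
A rough sketch of that idea (assuming Xtrain/Ytrain have the shapes from the original post; depending on your imblearn version the call may be fit_sample instead of fit_resample):

import numpy as np
from imblearn.over_sampling import SMOTE

# SMOTE works on 2-D data, so flatten the images first
X_flat = Xtrain.reshape(len(Xtrain), -1)
y_labels = Ytrain.argmax(axis=1)          # one-hot labels back to integer classes

X_res, y_res = SMOTE().fit_resample(X_flat, y_labels)

# reshape back to 4 dimensions for the convnet
X_res = X_res.reshape(-1, 1, 106, 106)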

taninaim commented 6 years ago

I encountered the problem while I was trying to finetune a pretrained VGGFace model, using keras_vggface.utils.preprocess_input as my custom preprocessing function.

def preprocess_input(x, data_format=None, version=1):
    if data_format is None:
        data_format = K.image_data_format()
    assert data_format in {'channels_last', 'channels_first'}

    if version == 1:
        if data_format == 'channels_first':
            x = x[:, ::-1, ...]
            x[:, 0, :, :] -= 93.5940
            x[:, 1, :, :] -= 104.7624
            x[:, 2, :, :] -= 129.1863
        else:
            x = x[..., ::-1]
            x[..., 0] -= 93.5940
            x[..., 1] -= 104.7624
            x[..., 2] -= 129.1863

    elif version == 2:
        if data_format == 'channels_first':
            x = x[:, ::-1, ...]
            x[:, 0, :, :] -= 91.4953
            x[:, 1, :, :] -= 103.8827
            x[:, 2, :, :] -= 131.0912
        else:
            x = x[..., ::-1]
            x[..., 0] -= 91.4953
            x[..., 1] -= 103.8827
            x[..., 2] -= 131.0912
    else:
        raise NotImplementedError

    return x

The problem seems to come from the scaling. I used preprocessing_function=keras_vggface.utils.preprocess_input and ran into this problem; however, when I also rescale by 1/255. the problem is fixed. I think the pretrained model may have been trained with an additional scaling factor to normalize inputs to [0,1], while the preprocessing function only subtracts the channel means to center the data. I'd recommend you check whether your scaling makes sense: badly scaled inputs to a neural network can make your updates move very slowly (the derivative of the sigmoid beyond -3 and +3 is near 0, so your gradients are almost 0), or, if you're using something like ReLU, the updates may be big (the derivative is 1) and a wrong update can easily make you jump past the local minimum.

ALSO, if you're rescaling in Python 2, make sure you include the dot in 1/255., or else all your inputs will be multiplied by 0 and you won't be making any updates!!!
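
A quick sanity check along those lines (just a sketch; x_batch is a placeholder for whatever batch your generator or array actually feeds the model):

# inputs roughly in [0, 1] (or zero-centered) behave far better than raw [0, 255] pixels
print(x_batch.min(), x_batch.max(), x_batch.mean())

# note the trailing dot: in Python 2, 1/255 is integer division and equals 0
x_batch = x_batch / 255.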

lixiang-ucas commented 6 years ago

sigmoid_cross_entropy_with_logits may run into the exploding-gradient problem; try clipping the gradients.
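
In Keras, clipping can be set directly on the optimizer, for example (a sketch; model is a placeholder for your own model):

from keras.optimizers import SGD

# clip gradients to an L2 norm of 1.0 (clipvalue would clip element-wise instead)
sgd = SGD(lr=0.01, momentum=0.9, clipnorm=1.0)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])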

yosunpeng commented 6 years ago

In my case, it was a normalization problem:

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

x_train /= 255
x_test /= 255  # normalization: [0, 255] ---> [0, 1]

sujanme25 commented 5 years ago

I am trying to create a DNN but it is not converging, any ideas?

model = Sequential()
model.add(Dense(5000, input_dim=5, activation='relu', kernel_regularizer=regularizers.l2(0.1)))
model.add(Dropout(0.1))
model.add(Dense(2000, kernel_regularizer=regularizers.l2(0.1), activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(400, kernel_regularizer=regularizers.l2(0.1), activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(1, activation='relu'))

rmsprop = optimizers.RMSprop(lr=0.01, rho=0.7, epsilon=1e-8, decay=0.0)
sgd = optimizers.SGD(lr=0.001, decay=1e-6, momentum=0.4, nesterov=True)
model.compile(loss='mse', optimizer='adam', metrics=["mae"])
callbacks = [EarlyStopping(monitor='val_loss', patience=5),
             ModelCheckpoint(filepath='DNN_Adam.h5', monitor='val_loss', save_best_only=True)]
np.random.seed(3)
history = model.fit(X_train, y_train, epochs=500, batch_size=5000, validation_split=0.1, verbose=2, callbacks=callbacks)

Train on 16109 samples, validate on 1790 samples
Epoch 1/500

SujanaC commented 5 years ago

I had a similar issue today when training on a Google Cloud GPU. I tried changing the network architecture, weights, etc. The solution was to reset the TF graph with tf.reset_default_graph(). Somehow the GPU seemed to have a "memory" across different runs and was stuck in a local minimum.
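
For reference, with a TF1-style backend that reset looks something like this (a sketch; K.clear_session() is the Keras-level counterpart):

import tensorflow as tf
from keras import backend as K

K.clear_session()          # drop the old Keras graph/session
tf.reset_default_graph()   # reset the default TensorFlow graph (TF1 API)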

psy-mas commented 5 years ago

I had a similar issue today when training on a Google Cloud GPU. I tried changing the network architecture, weights, etc. The solution was to reset the TF graph with tf.reset_default_graph(). Somehow the GPU seemed to have a "memory" across different runs and was stuck in a local minimum.

Thanks, I will give it a try; I hope it works!

Fellfalla commented 5 years ago

For me, these 3 things did the trick:

  1. Lower the learning rate (0.1 converges too fast; already after the first epoch there is no change any more). Just for test purposes, try a very low value like lr=0.00001.
  2. Check the input for a proper value range and normalize it.
  3. Add BatchNormalization (model.add(BatchNormalization())) after each layer, as in the sketch below.
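
A minimal sketch of item 3 (layer sizes and input_dim are just placeholders, not anyone's actual model):

from keras.models import Sequential
from keras.layers import Dense, Activation, BatchNormalization

model = Sequential()
model.add(Dense(64, input_dim=20))
model.add(BatchNormalization())    # normalize the layer output before the nonlinearity
model.add(Activation('relu'))
model.add(Dense(2))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
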
Alfatlawi commented 5 years ago

For me, this worked: add BatchNormalization (model.add(BatchNormalization())) after each layer. Thanks @Fellfalla

DiyuanLu commented 5 years ago

I had the same problem, but for me it was just that the learning_rate was too small. Try a bigger learning_rate; it might help.

qianzzzz commented 4 years ago

Based on my own experience as a beginner, one possible reason or bug in your model is that you used the wrong activation function at the last output layer. For example, if you are trying to solve a multi-class problem we usually use softmax rather than sigmoid, while sigmoid is meant to activate the output for a binary task. In this case it's a binary application, so just change your activation function to sigmoid and you should not run into this problem.
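
For the two output conventions being discussed, a sketch of the binary setup (the hidden layer and input_dim are placeholders):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))

# binary task: one sigmoid unit with binary_crossentropy and 0/1 labels
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])

# multi-class alternative: n_classes softmax units with categorical_crossentropy
# model.add(Dense(n_classes, activation='softmax'))
# model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])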

Dulum commented 4 years ago

Try removing the Activation('softmax') layer.

QoT commented 4 years ago

I had a model that did not train at all. It just got stuck at the random-chance level for a particular result, with no loss improvement during training. The loss was a constant 4.000 and the accuracy 0.142 on a dataset with 7 target values.

It turned out that I was doing regression with ReLU as the last activation layer, which is obviously wrong.

Before I knew this was wrong, I added a Batch Normalisation layer after every learnable layer, and that helped. However, training became somewhat erratic, so accuracy could easily drop from 40% down to 9% on the validation set, while accuracy on the training dataset was always okay.

Then I realized that it is enough to put Batch Normalisation only before that last ReLU activation layer to keep improving loss/accuracy during training. That probably compensated for the wrong activation method.

However, when I replaced ReLU with a linear activation (for regression), no Batch Normalisation was needed any more and the model started to train significantly better.

1337-Pete commented 4 years ago

I'd highly recommend experimenting with learning rates. Testing with extreme variability, like lr = 0.1 and lr = 1e-4, should do the trick in most instances.
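
A crude sweep along those lines might look like this (a sketch; build_model() is a hypothetical function returning a fresh, uncompiled model, and Xtrain/Ytrain are your data):

from keras.optimizers import SGD

for lr in [1e-1, 1e-2, 1e-3, 1e-4]:
    model = build_model()                   # fresh weights for each trial
    model.compile(loss='categorical_crossentropy',
                  optimizer=SGD(lr=lr),
                  metrics=['accuracy'])
    history = model.fit(Xtrain, Ytrain, epochs=5, verbose=0)
    print(lr, history.history['loss'][-1])  # does the loss move at all at this lr?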

amapic commented 4 years ago

I think the problem comes from the learning rate; mine was actually equal to 7 because I wrote 10-3 instead of 1e-3.

stock-ds commented 3 years ago

My data had 3 classes but the last layer was Dense(1, activation='sigmoid'); changing it to Dense(3, activation='sigmoid') made the loss change. With 1 output neuron it didn't return any errors, it just had a constant loss.

Also, if you are training a binary classifier, you can just use Dense(1, activation='sigmoid') as the output with binary_crossentropy, instead of Dense(2, activation='sigmoid') with categorical_crossentropy.

wuditianyou commented 3 years ago

This has happened to me many times; my solution is to change or remove the activation function in the last layer.

soans1994 commented 3 years ago

Try removing the Activation('softmax') layer.

finally