talentlei commented 8 years ago

I use an LSTM to do a sequence labeling task, but I get the same acc and val_acc for each epoch. Here is my code:

def moduleRnn(self):
    model = Sequential()
    model.add(LSTM(output_dim=64, input_length=self.seq_len, batch_input_shape=(16, 1, 200), input_dim=self.embed_length, return_sequences=True, stateful=False))
    model.add(LSTM(output_dim=16, return_sequences=True, stateful=False))

    model.add(Dropout(0.2))
    model.add(TimeDistributedDense(output_dim=self.labs_len))
    model.add(Activation('softmax'))
    model.compile(loss="categorical_crossentropy" , optimizer='rmsprop' , class_mode='categorical')
    #model.fit(self.train,self.train_lab,batch_size=16,nb_epoch=3,verbose=1, validation_split=0.1,show_accuracy=True)
    model.fit(self.X_train,self.Y_train,batch_size=16,nb_epoch=15,verbose=1,show_accuracy=True,validation_split=0.2)
    score = model.evaluate(self.X_test,self.Y_test,batch_size=16)
    print score

Has anyone met the same problem? Please help me.

ymcui commented 8 years ago

Do you mean that training accuracy and validation accuracy don't change during training? You'd better post your logs.

talentlei commented 8 years ago

@ymcui Yes, exactly.

Epoch 1/15
18272/18272 [==============================] - 118s - loss: 0.0479 - acc: 0.4296 - val_loss: 0.0285 - val_acc: 0.4286
Epoch 2/15
18272/18272 [==============================] - 114s - loss: 0.0322 - acc: 0.4297 - val_loss: 0.0282 - val_acc: 0.4286
Epoch 3/15
18272/18272 [==============================] - 113s - loss: 0.0319 - acc: 0.4297 - val_loss: 0.0281 - val_acc: 0.4286
Epoch 4/15
18272/18272 [==============================] - 114s - loss: 0.0317 - acc: 0.4297 - val_loss: 0.0283 - val_acc: 0.4286
Epoch 5/15
18272/18272 [==============================] - 120s - loss: 0.0316 - acc: 0.4297 - val_loss: 0.0281 - val_acc: 0.4286
Epoch 6/15
18272/18272 [==============================] - 117s - loss: 0.0314 - acc: 0.4297 - val_loss: 0.0281 - val_acc: 0.4286
Epoch 7/15
18272/18272 [==============================] - 115s - loss: 0.0314 - acc: 0.4297 - val_loss: 0.0280 - val_acc: 0.4286
Epoch 8/15
18272/18272 [==============================] - 119s - loss: 0.0314 - acc: 0.4297 - val_loss: 0.0280 - val_acc: 0.4286
Epoch 9/15
18272/18272 [==============================] - 116s - loss: 0.0312 - acc: 0.4297 - val_loss: 0.0280 - val_acc: 0.4286
Epoch 10/15
18272/18272 [==============================] - 116s - loss: 0.0314 - acc: 0.4297 - val_loss: 0.0280 - val_acc: 0.4286
Epoch 11/15
18272/18272 [==============================] - 115s - loss: 0.0313 - acc: 0.4297 - val_loss: 0.0280 - val_acc: 0.4286
Epoch 12/15
18272/18272 [==============================] - 113s - loss: 0.0312 - acc: 0.4297 - val_loss: 0.0280 - val_acc: 0.4286
Epoch 13/15
18272/18272 [==============================] - 113s - loss: 0.0312 - acc: 0.4297 - val_loss: 0.0279 - val_acc: 0.4286
Epoch 14/15
18272/18272 [==============================] - 113s - loss: 0.0312 - acc: 0.4297 - val_loss: 0.0280 - val_acc: 0.4286
Epoch 15/15
18272/18272 [==============================] - 114s - loss: 0.0312 - acc: 0.4297 - val_loss: 0.0280 - val_acc: 0.4286

How does this happen?

ymcui commented 8 years ago

@talentlei In your log, the loss seems to start from a very low value, and converge very soon after a few epochs. I've no particular idea about this, but I think you should check validity of your data. (and maybe remove batch_input_shape attribute in your LSTM layer, i guess.)

lqj1990 commented 8 years ago

@talentlei Have you solved the problem? I'm stuck in the same situation when I use an RNN, but I don't know how to solve it.

DSA101 commented 8 years ago

I have a similar problem. In my case, when I attempt LSTM time series classification, val_acc often starts with a high value and stays the same, even though loss, val_loss and acc change. I've narrowed down the issue to not having enough training sequences (around 300). When I increased the number to 500+, it started to converge better, but there are still periods when loss, acc and val_loss change while val_acc sticks to the same value. How could that be? Is there a bug where it's not updating (even though loss, acc and val_loss update during the same epoch)?

model = Sequential()
model.add(LSTM(256, input_shape=(6, 10)))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
hist = model.fit(X_train_mat, Y_train_mat, nb_epoch=10000, batch_size=30, validation_split=0.1)

Epoch 2816/10000
50/472 [==>...........................] - ETA: 0s - loss: 0.6281 - acc: 0.6800
Epoch 02815: val_acc did not improve
472/472 [==============================] - 0s - loss: 0.5151 - acc: 0.7648 - val_loss: 1.2978 - val_acc: 0.4151
Epoch 2817/10000
50/472 [==>...........................] - ETA: 0s - loss: 0.4406 - acc: 0.8600
Epoch 02816: val_acc did not improve
472/472 [==============================] - 0s - loss: 0.5179 - acc: 0.7479 - val_loss: 1.2844 - val_acc: 0.4151
Epoch 2818/10000
50/472 [==>...........................] - ETA: 0s - loss: 0.5385 - acc: 0.7400
Epoch 02817: val_acc did not improve
472/472 [==============================] - 0s - loss: 0.5100 - acc: 0.7585 - val_loss: 1.2699 - val_acc: 0.4151
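
One way val_loss can move while val_acc stays fixed: accuracy only changes when a prediction crosses the 0.5 threshold, whereas the loss reacts to every change in confidence. A minimal sketch in plain Python, using made-up labels and probabilities (not data from this thread):

```python
import math

def binary_crossentropy(y_true, y_prob):
    """Mean binary cross-entropy over paired labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum(1 for y, p in zip(y_true, y_prob) if (p > threshold) == (y == 1))
    return hits / len(y_true)

# Hypothetical validation labels and predictions at two different epochs.
y_true = [1, 0, 1, 0]
epoch_a = [0.60, 0.70, 0.40, 0.20]
epoch_b = [0.90, 0.55, 0.45, 0.10]

loss_a, acc_a = binary_crossentropy(y_true, epoch_a), binary_accuracy(y_true, epoch_a)
loss_b, acc_b = binary_crossentropy(y_true, epoch_b), binary_accuracy(y_true, epoch_b)

# The loss differs between the two epochs, the accuracy does not,
# because no probability moved to the other side of 0.5.
print(loss_a, acc_a)
print(loss_b, acc_b)
```

For what it's worth, 0.4151 matches 22 correct out of 53 validation samples, which is the size a 0.1 split would leave next to 472 training samples. With so few samples, it is plausible that no prediction crosses the threshold over those epochs.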

braingineer commented 8 years ago

A good method for debugging this issue is to use an ipython/jupyter notebook, compile the model, and then have it predict for one of your batches. Then, go through the accuracy code with the ability to manually inspect the values of the matrices. I've found stepping through code like this in mysterious situations to be enlightening.
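
A sketch of what that manual inspection might look like, in plain Python. Here `y_pred` stands in for whatever `model.predict` returns for one batch (the values are invented), and `inspect_batch` is a hypothetical helper, not a Keras function:

```python
from collections import Counter

def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=lambda i: row[i])

def inspect_batch(y_true_onehot, y_pred_probs):
    """Recompute accuracy by hand and count how often each class is predicted."""
    true_cls = [argmax(r) for r in y_true_onehot]
    pred_cls = [argmax(r) for r in y_pred_probs]
    acc = sum(t == p for t, p in zip(true_cls, pred_cls)) / len(true_cls)
    return acc, Counter(pred_cls), Counter(true_cls)

# Invented one-hot labels and predictions for a 3-class problem.
y_true = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
y_pred = [[0.5, 0.3, 0.2]] * 4  # the model outputs the same row for every input

acc, pred_counts, true_counts = inspect_batch(y_true, y_pred)
print(acc)          # accuracy equals the share of class 0 in the labels
print(pred_counts)  # a single predicted class is a strong hint of a collapsed model
print(true_counts)
```

If the predicted-class counts show only one class, the frozen accuracy is just the frequency of that class in the data.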

ersinyar commented 7 years ago

@DSA101 Have you solved the problem? I am doing sentence classification task with variable sentence lengths using LSTMs. My problem is that training loss and training accuracy decrease over epochs but validation accuracy fluctuates in a small interval. Maybe your solution could be helpful for me too.

DSA101 commented 7 years ago

My solution was to increase the size of the training set, reduce the number of features, start with just one layer and not too many units (say 128). When I ensured that in such configuration the training progresses in a reasonable way, I have slowly added more features, more units, etc, and in the end got a satisfactory result. Still if I make the model overly complex (e.g. increase to 3 layers with say 512 units without providing more training data), it would behave the same as before - flat or irregular training accuracy.

In the end I don't know if there is still a bug in the framework, or it all results from an overly complicated model and the insufficient size of the training set, but all things considered, I am satisfied with the performance of the model and the results that I have achieved and believe that Keras LSTM is usable for time series classification.

So if your training acc improves but validation accuracy stays in a small interval, can it be indicative of overfitting?

maxpagels commented 7 years ago

I'm having the same issue. Loss and accuracy on the training set change from epoch to epoch, but the validation accuracy / loss doesn't, which is a bit odd.

Epoch 1/20
158/158 [==============================] - 24s - loss: 2.3558 - acc: 0.4051 - val_loss: 1.0986 - val_acc: 0.3684
Epoch 2/20
158/158 [==============================] - 24s - loss: 1.8001 - acc: 0.3924 - val_loss: 1.0986 - val_acc: 0.3684
Epoch 3/20
158/158 [==============================] - 24s - loss: 1.2940 - acc: 0.3608 - val_loss: 1.0986 - val_acc: 0.3684
Epoch 4/20
158/158 [==============================] - 24s - loss: 1.8052 - acc: 0.4114 - val_loss: 1.0986 - val_acc: 0.3684
Epoch 5/20
158/158 [==============================] - 24s - loss: 1.7127 - acc: 0.3734 - val_loss: 1.0986 - val_acc: 0.3684
Epoch 6/20
158/158 [==============================] - 24s - loss: 1.8030 - acc: 0.3734 - val_loss: 1.0986 - val_acc: 0.3684
Epoch 7/20
158/158 [==============================] - 24s - loss: 1.7076 - acc: 0.3861 - val_loss: 1.0986 - val_acc: 0.3684
Epoch 8/20
158/158 [==============================] - 24s - loss: 1.4173 - acc: 0.4241 - val_loss: 1.0986 - val_acc: 0.3684
Epoch 9/20
158/158 [==============================] - 24s - loss: 1.3042 - acc: 0.3797 - val_loss: 1.0986 - val_acc: 0.3684

The model I'm using is a convnet:

model = Sequential()
model.add(Convolution2D(20, 5, 5, input_shape=(3, img_width, img_height)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())
model.add(Dense(20))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(3))
model.add(Activation('sigmoid'))

sgd = SGD(lr=0.0005)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])
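
A possible reading of that log, offered as a hint rather than a diagnosis: a val_loss of 1.0986 is what categorical cross-entropy gives when the model assigns equal probability to each of 3 classes, since -ln(1/3) ≈ 1.0986. A short check in plain Python (`uniform_crossentropy` is a name made up for this sketch):

```python
import math

def uniform_crossentropy(n_classes):
    """Cross-entropy of a prediction that gives every class probability 1/n."""
    return -math.log(1.0 / n_classes)

three_class = uniform_crossentropy(3)
two_class = uniform_crossentropy(2)

print(round(three_class, 4))  # 1.0986
print(round(two_class, 4))    # 0.6931
```

If the validation loss sits exactly on one of these values, the validation predictions probably carry no information about the input, which points at the model or the data pipeline rather than at the metric code.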

savourylie commented 7 years ago

Similar problem here. It really feels like a bug to me. The reason is that my validation set has 2500+ observations. For a dataset of this size, as long as the weights change (and they do, since the training error is decreasing), val_loss should change too, either up or down. It's also unlikely to be overfitting, as I'm using really heavy dropout (between 0.5 and 0.7 for each layer).

My solution to this is changing the learning rate of the optimizer....sometimes it helps, haha. I've never experienced the same phenomenon using raw tensorflow so I think it's a keras thing.

andrew-ayers commented 7 years ago

I'm gunna throw my voice in here, too. I'm currently doing the Udacity Self-Driving Car Engineer Nanodegree course; my cohort is currently doing the behavioral cloning lab. We were given a dataset of approximately 20k+ features and labels; I take it and augment it with flipping - so I have about 40k of data. My convnet is the same one from the NVidia end-to-end paper (relu on all layers). I am using adam and mse for optimizer/loss. I've tried heavy dropout on the fully-connected layers, on all layers, on random layers. Ultimately, my validation accuracy stays stuck at a single value. I'd think if I were overfitting, the accuracy would peg close or at 100%? Rather, it seems like it is getting stuck in a local minima. I think I'm going to need to do some visualization of the data, to verify that it is balanced, plus I have some other ideas to try, but so far it is very frustrating. I don't know if it is a bug with the framework; my best guess is that it is not, because other students are finding success.

dvillevald commented 7 years ago

@andrew-ayers Did you manage to solve this issue? I have a similar problem with NVIDIA (adam, mse, 120k samples including flipped data) model for Self_Driving Car Engineer course - validation loss changes but validation accuracy stays the same.
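
Both posts above train a regression model (mse loss on a steering angle) while reading an accuracy metric. Accuracy counts matches, so on continuous targets it can sit at one value while the error keeps shrinking. A sketch under the assumption that accuracy means "rounded prediction equals target"; the exact rule Keras applies depends on the version and output shape, and the angles below are invented:

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rounded_match_accuracy(y_true, y_pred):
    """Assumed accuracy rule: a hit when the rounded prediction equals the target."""
    return sum(1 for t, p in zip(y_true, y_pred) if round(p) == t) / len(y_true)

# Invented steering angles: many are exactly 0 (driving straight).
targets = [0.0, 0.0, 0.25, -0.3, 0.0]
early_preds = [0.4, -0.3, 0.0, 0.1, 0.2]
late_preds = [0.05, -0.02, 0.2, -0.25, 0.01]

print(mse(targets, early_preds), rounded_match_accuracy(targets, early_preds))
print(mse(targets, late_preds), rounded_match_accuracy(targets, late_preds))
```

Under that assumption the "accuracy" is just the share of targets that are exactly 0, so for a regression task val_loss (or mean absolute error) is the number worth watching.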

msmah commented 6 years ago

I had the same problem while training a convolutional auto encoder. I made learning rate ("lr" parameter in optimizer) smaller and it solved the problem.

hujiao1314 commented 6 years ago

Have you solved the problem? I met a similar problem with my Keras CNN model. I had 4000 training samples and 1000 validation samples. During training, loss and val_loss were decreasing, but acc and val_acc never changed.

this is my code:

inputs_x=Input(shape=(1,65,21))
x=Conv2D(64,(3,3),padding='same',data_format='channels_first',activation='relu',use_bias=True)(x)
x=Conv2D(64,(3,3),padding='same',data_format='channels_first',activation='relu',use_bias=True)(x)
x=MaxPooling2D(pool_size=(2,2),strides=(2,2))(x)

x=Conv2D(32,(5,5),padding='same',data_format='channels_first',activation='relu',use_bias=True)(x)
x=Conv2D(16,(5,5),padding='valid',data_format='channels_first',activation='relu',use_bias=True)(x)
x=MaxPooling2D(pool_size=(2,2),strides=(2,2))(x)

x=Dropout(0.25)(x)
x=Flatten()(x)

inputs_y=Input(shape=(1,32,21))
y=Conv2D(32,(2,2),padding='same',data_format='channels_first',activation='relu',use_bias=True)(y)
y=Conv2D(32,(2,2),padding='same',data_format='channels_first',activation='relu',use_bias=True)(y)
y=MaxPooling2D(pool_size=(2,2),strides=(2,2))(y)

y=Conv2D(32,(4,4),padding='same',data_format='channels_first',activation='relu',use_bias=True)(y)
y=Conv2D(8,(4,4),padding='valid',data_format='channels_first',activation='relu',use_bias=True)(y)
y=MaxPooling2D(pool_size=(2,2),strides=(2,2))(y)

y=Dropout(0.30)(y)
y=Flatten()(y)

merged_input=keras.layers.concatenate([x,y],axis=-1)

z=Dense(16,activation='softmax')(merged_input)
z=Dense(8,activation='softmax')(z)
z=Dense(4,activation='softmax')(z)

outp=Dense(1,activation='softmax')(z)

model=Model(inputs=[inputs_x,inputs_y],outputs=outp)
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])

history=model.fit(x=[train_inputs_x,train_inputs_y],y=train_label,batch_size=32, epochs=30,validation_split=0.2,shuffle=True)

any ideas for this?

hadisaadat commented 6 years ago

Does anyone know how to solve this issue? In my LSTM model, I get the same training and validation accuracy for every epoch! The model learns slightly within an epoch, after each batch, but it seems to reset before the next epoch and start again from the beginning. This is the training log after epochs:

4s - loss: 0.2217 - acc: 0.6464 - val_loss: 0.1487 - val_acc: 0.8137
3s - loss: 0.2217 - acc: 0.6469 - val_loss: 0.1487 - val_acc: 0.8137
3s - loss: 0.2217 - acc: 0.6469 - val_loss: 0.1487 - val_acc: 0.8137
3s - loss: 0.2216 - acc: 0.6469 - val_loss: 0.1487 - val_acc: 0.8137
4s - loss: 0.2216 - acc: 0.6469 - val_loss: 0.1487 - val_acc: 0.8137
4s - loss: 0.2216 - acc: 0.6469 - val_loss: 0.1487 - val_acc: 0.8137
4s - loss: 0.2216 - acc: 0.6469 - val_loss: 0.1487 - val_acc: 0.8137
4s - loss: 0.2216 - acc: 0.6469 - val_loss: 0.1487 - val_acc: 0.8137
4s - loss: 0.2216 - acc: 0.6469 - val_loss: 0.1487 - val_acc: 0.8137
4s - loss: 0.2216 - acc: 0.6469 - val_loss: 0.1487 - val_acc: 0.8137
4s - loss: 0.2216 - acc: 0.6469 - val_loss: 0.1487 - val_acc: 0.8137
4s - loss: 0.2216 - acc: 0.6469 - val_loss: 0.1487 - val_acc: 0.8137
4s - loss: 0.2216 - acc: 0.6469 - val_loss: 0.1487 - val_acc: 0.8137
4s - loss: 0.2216 - acc: 0.6469 - val_loss: 0.1487 - val_acc: 0.8137
4s - loss: 0.2216 - acc: 0.6469 - val_loss: 0.1487 - val_acc: 0.8137
4s - loss: 0.2216 - acc: 0.6469 - val_loss: 0.1487 - val_acc: 0.8137
4s - loss: 0.2216 - acc: 0.6469 - val_loss: 0.1487 - val_acc: 0.8137
...

It's the log after batches:

Train on 21000 samples, validate on 9000 samples
==================New Training Start====================
Epoch 1/50
batch: 0 ended Loss: 0.21712732 Accuracy: 0.625
batch: 1 ended Loss: 0.229398 Accuracy: 0.65166664
batch: 2 ended Loss: 0.22204755 Accuracy: 0.6383333
batch: 3 ended Loss: 0.21405634 Accuracy: 0.6533333
batch: 4 ended Loss: 0.21910276 Accuracy: 0.63
batch: 5 ended Loss: 0.22354788 Accuracy: 0.70166665
batch: 6 ended Loss: 0.23390895 Accuracy: 0.62166667
batch: 7 ended Loss: 0.21102294 Accuracy: 0.62833333
batch: 8 ended Loss: 0.22611171 Accuracy: 0.66833335
batch: 9 ended Loss: 0.21904916 Accuracy: 0.62
batch: 10 ended Loss: 0.23376058 Accuracy: 0.645
batch: 11 ended Loss: 0.21929795 Accuracy: 0.6766667
batch: 12 ended Loss: 0.22111656 Accuracy: 0.6483333
batch: 13 ended Loss: 0.2131401 Accuracy: 0.65
batch: 14 ended Loss: 0.2148913 Accuracy: 0.6566667
batch: 15 ended Loss: 0.22052963 Accuracy: 0.635
batch: 16 ended Loss: 0.22950262 Accuracy: 0.6333333
batch: 17 ended Loss: 0.22890009 Accuracy: 0.64666665
batch: 18 ended Loss: 0.22269897 Accuracy: 0.65166664
batch: 19 ended Loss: 0.22959195 Accuracy: 0.645
batch: 20 ended Loss: 0.22551142 Accuracy: 0.6566667
batch: 21 ended Loss: 0.2217158 Accuracy: 0.635
batch: 22 ended Loss: 0.21928492 Accuracy: 0.64
batch: 23 ended Loss: 0.21457554 Accuracy: 0.66333336
batch: 24 ended Loss: 0.22461174 Accuracy: 0.655
batch: 25 ended Loss: 0.21772751 Accuracy: 0.665
batch: 26 ended Loss: 0.21689837 Accuracy: 0.63166666
batch: 27 ended Loss: 0.22468112 Accuracy: 0.6333333
batch: 28 ended Loss: 0.2141714 Accuracy: 0.6533333
batch: 29 ended Loss: 0.22494899 Accuracy: 0.6483333
batch: 30 ended Loss: 0.22441803 Accuracy: 0.62833333
batch: 31 ended Loss: 0.22385867 Accuracy: 0.62666667
batch: 32 ended Loss: 0.2221946 Accuracy: 0.66
batch: 33 ended Loss: 0.2230069 Accuracy: 0.64166665
batch: 34 ended Loss: 0.21400177 Accuracy: 0.66

ManuConcepBrito commented 6 years ago

@hujiao1314 I do not know if I really understand what you are trying to do, so forgive me if it does not make sense. My observations: In your last layer outp, you are using softmax when you only have one output neuron. You might find it useful to change to 'sigmoid'. Again, for the layers named z, they do not seem to be a final output and you are using a softmax activation function. It is quite a bit confusing so if you could specify the characteristics of your problem I could be more helpful.
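
To illustrate the point about the output layer: softmax normalizes across the units of a layer, so with a single unit it returns 1.0 whatever the input is, and accuracy can never move. Sigmoid on one unit does vary with the input. A small plain-Python sketch (the logits are arbitrary):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

logits = (-3.0, 0.0, 4.0)
single_unit_softmax = [softmax([z])[0] for z in logits]
single_unit_sigmoid = [sigmoid(z) for z in logits]

print(single_unit_softmax)  # [1.0, 1.0, 1.0]
print(single_unit_sigmoid)  # below 0.5, exactly 0.5, above 0.5
```

With a constant output of 1.0, every sample is predicted as class 1, so acc and val_acc simply equal the share of positive labels.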

AkhilAshref commented 6 years ago

@hadisaadat Reduce your learning rate and try a few smaller values. That should solve your problem.

ghost commented 6 years ago

@AkhilAshref , even i had the similar issue as @hadisaadat , mine worked after reducing the lr. But could you give a bit more detailed explanation as to why the gradient becomes zero. Thanks
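
On why the gradient can become (nearly) zero: one common mechanism, though not necessarily the one at work in these models, is saturation. A learning rate that is too large can push pre-activations far from zero, where sigmoid (and tanh, both used inside LSTM gates) is almost flat, so later updates barely move the weights. A small illustration in plain Python:

```python
import math

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Derivative of the logistic function with respect to its input."""
    s = sigmoid(z)
    return s * (1.0 - s)

# Gradient at a healthy pre-activation versus saturated ones.
grads = {z: sigmoid_grad(z) for z in (0.0, 5.0, 20.0)}
for z, g in grads.items():
    print(z, g)
```

The slope is 0.25 at 0 and drops by several orders of magnitude once the unit saturates, which is why a smaller learning rate (or gradient clipping) can keep training alive.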

amcneil1998 commented 6 years ago

@vishnu-zsf I'm having the same problem it seems, what optimizer/ learning rate did you use?

ghost commented 6 years ago

@amcneil1998 I used the Adam optimizer and settled on a learning rate of 0.0008. This was when I used 100,000 data samples and had 10 epochs. But later on, when I tried to run with 30 epochs, I shifted to a decaying learning rate, which after tuning for a while gave me satisfactory results. I'm pretty sure that the learning rate and all the other optimizer parameters vary with the kind of data we have and the sheer magnitude of the features.

Initial code when I ran with 10 epochs:

opt = optimizers.adam(lr=0.0008)
self.model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])

Code to run with a decaying lr in Keras:

keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)

Do reply if the issue still persists.

amcneil1998 commented 6 years ago

@vishnu-zsf Still having the issue. I have tried reducing the learning rate, increasing the learning rate, and tried both SGD and Adam optimizers. I have even tried to overfit my data by just using a small part of it. I currently have 900 data points, of which I am using 100 each for test and validation, and 700 for training. I have tried increasing my amount of data to 2800, using 400 each for test and validation, and 2000 for training. I'm currently using a batch size of 50, and even running past 50 epochs showed no improvement in accuracy or loss. I noticed later on, while trying to predict results, that my predictions were heading towards 0, coming closer the longer I trained. This seems to be the case really no matter what I do.

hadisaadat commented 6 years ago

@vishnu-zsf @amcneil1998 In my case, the lr actually had no impact, and the solution for me was shuffling the data for each epoch. Depending on the nature of your data (time series or not), you should select a suitable cross-validation and shuffling strategy.
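
A sketch of the two splitting strategies in plain Python (both helper names are made up for illustration). As far as I know, Keras' `validation_split` takes the last fraction of the arrays before any shuffling, which behaves like the chronological variant; if the data is sorted by class, that tail can be very unrepresentative:

```python
import random

def chronological_split(samples, val_fraction):
    """Hold out the last part of the data, keeping the time order intact."""
    n_val = int(round(len(samples) * val_fraction))
    split_at = len(samples) - n_val
    return samples[:split_at], samples[split_at:]

def shuffled_split(samples, val_fraction, seed=0):
    """Shuffle a copy first, then hold out the tail. Only for non-sequential data."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return chronological_split(shuffled, val_fraction)

data = list(range(10))  # stand-in for samples ordered in time
train_c, val_c = chronological_split(data, 0.2)
train_s, val_s = shuffled_split(data, 0.2)

print(train_c, val_c)
print(train_s, val_s)
```

For time series, the chronological split avoids leaking future information into training; for independent samples, shuffling before the split is usually the safer default.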

amcneil1998 commented 6 years ago

@hadisaadat Setting shuffle=True did not improve my results. Accuracy still stayed around 0.5, but loss started pretty low (0.01). So I increased the learning rate, and loss started around 5.1 and then dropped off to 0.02 after the 6th epoch. Accuracy started at 0.5 and averaged around that on both training and validation data for the 120 epochs that I trained. However, when predicting I am only able to get 2 values from the output.

ghost commented 6 years ago

@amcneil1998 You may have to regularize, and you can even use EarlyStopping in callbacks, but before that could you share your code and your data (5 sample points would do)? Like I said, the methods we use pretty much depend on the type of data we use. Mine is all resolved now, btw.
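
For reference, the idea behind early stopping can be sketched in a few lines of plain Python. This is only an illustration of the patience logic on an invented val_loss history; in Keras itself the `keras.callbacks.EarlyStopping` callback (with `monitor` and `patience` arguments) does the job:

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops once val_loss has failed to improve by more than
    min_delta for `patience` consecutive epochs.
    """
    best = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Invented validation losses: improvement stalls after the third epoch.
history = [0.90, 0.70, 0.65, 0.66, 0.68, 0.67, 0.71]
stop_epoch, best_epoch = early_stopping_epoch(history, patience=3)
print(stop_epoch, best_epoch)  # 5 2
```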

amcneil1998 commented 6 years ago

@vishnu-zsf All of my input/output data is normalized to the range -1 to 1 with a mean of 0. The input data is a 3D array with the form (Nsamples, Entries/Sample, EntryDim). In this case it is (900, 225, 6). The output data is a 2D array with shape (Nsamples, 2), so in this case it is (900, 2). Some of the samples did not have enough entries, so they are zero-padded to the correct size. Here is the code for the model after the test data has been split off:

initilizer = RandomNormal(mean=0.0, stddev=0.05, seed=None)
modelInput = Input(batch_shape=(batch_size, 225, 6), name="Model_Input")
mid = LSTM(128, return_sequences=True, input_dim= (225, 6), bias_initializer=initilizer)(modelInput)
mid = LSTM(128, return_sequences=False, bias_initializer=initilizer)(mid)
output = Dense(2, activation='linear')(mid)
model = Model(inputs = modelInput, outputs = output)
adam = optimizers.Adam(lr = 0.000000001)
model.compile(loss='mean_squared_error', optimizer = adam, metrics=['accuracy'])
model.fit(trainInputData, trainTruthData, epochs=20, batch_size=batch_size, verbose=2, validation_split=(1/8), shuffle=True)
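
One thing that stands out in the code above is `lr = 0.000000001`. Adam's per-step update is roughly on the order of the learning rate, so over a few hundred steps the weights can barely move. A rough simulation on a toy one-parameter loss (w squared, not the LSTM above), assuming about 280 update steps (14 batches times 20 epochs, an illustrative figure rather than a measured one):

```python
import math

def adam_total_shift(lr, steps, grad_fn, w0=1.0, beta1=0.9, beta2=0.999, eps=1e-8):
    """Run plain Adam on a single scalar weight and return how far it moved."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g          # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
        m_hat = m / (1 - beta1 ** t)             # bias correction
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return abs(w - w0)

def quadratic_grad(w):
    """Gradient of the toy loss w**2."""
    return 2.0 * w

steps = 280  # assumed number of updates, for illustration only
tiny_shift = adam_total_shift(1e-9, steps, quadratic_grad)
usual_shift = adam_total_shift(1e-3, steps, quadratic_grad)

print(tiny_shift)   # on the order of steps * 1e-9
print(usual_shift)  # on the order of steps * 1e-3
```

If the real model behaves similarly, a learning rate of 1e-9 would leave the weights essentially at their initial values, which would explain loss and accuracy that never change. Something closer to the Keras default of 0.001 would be a more usual starting point.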

AniketDhar commented 5 years ago

I have faced the same issue multiple times while using Keras. I have tried data normalization, shuffling, different learning rates and different optimizers. Nothing seems to help out, except increasing the data size. Now that is a problem for me, as I am trying to compare the effect of the data sample size on my network. I see a lot of problems but rarely any solution in the discussions above. If anyone has a decent solution except sample size, kindly let me know.

Timlo512 commented 5 years ago

I used to face the same result. I found that using a smaller neural network architecture helped. The reason is probably vanishing gradients. In some situations, your input might not carry as much information as the neural network expects, and therefore the weights vanish toward zero after only a few layers. This problem is more serious with ConvNets, and it's the reason we have residual networks. Hope this helps.

gerdis commented 5 years ago

I faced the same issue when trying to implement a CNN for a multi-label comment classification problem. When I trained my model on a tiny subset of my data (say, 100 of 100000), it did what I had expected with a highly imbalanced data set - loss decreasing, accuracy going up to 1 very quickly, validation accuracy a little lower, but also changing pretty fast. But when I used a bigger subset, my model seemed to get 'stuck' at a certain accuracy. It also predicted 'nan' for all test samples and labels.

What finally solved the problem for me was applying (sigmoid) activation to my 1D convolutional layer (there was only one in my model at that point). I hadn't used any activation at first. I also set kernel_initializer to random_normal, but I think the crucial part was the activation.

Everything else I had tried before that (different activation on the other layers, manipulating the learning rate, gradient clipping, balancing out my dataset) didn't make any difference.

Looking at the other examples I see nobody made the same mistake - using a convolutional layer without activation - but who knows, maybe my experience could be helpful for someone in the future.

BahadirGLCK commented 5 years ago

I'm not sure why, but I solved this problem. I used Keras for a CNN model on the Kaggle platform with a GPU.
I had the same problem: every epoch had the same val_loss and val_acc, like:

Epoch 2/50 - val_loss: 0.6931 - val_acc: 0.5521
Epoch 3/50 - val_loss: 0.6931 - val_acc: 0.5521
...

When I changed the optimizer from Adam to RMSprop, it ran, but after I reset the kernel and restarted, I got the same issue. I then changed from RMSprop to SGD, and it worked. Sometimes the problem is caused by unsuitable Dense layers.

sayedathar11 commented 5 years ago

For those who still have this problem and wonder why it occurs: the reason is pretty straightforward. In your final Dense layer, where you specify the output (basically the softmax layer), the number of units should equal the number of classes. If you are solving binary classification, all you need to do is use 1 unit with sigmoid activation.

for Binary

model.add(Dense(1,activation='sigmoid'))

for n_class

model.add(Dense(n_class,activation='softmax')) #where n_class is number of classes

Thanks to: https://stackoverflow.com/questions/51581521/accuracy-stuck-at-50-keras

sowmy19 commented 5 years ago

@sayedathar11 When I use model.add(Dense(1,activation='sigmoid')), am getting the following error. ValueError: Error when checking target: expected dense_4 to have shape (1,) but got array with shape (2,)

Here is my code:

batch_size = 32
nb_classes = 2
data_augmentation = True

img_rows, img_cols = 224,224
img_channels = 3

# Creating array of training samples
train_path = "D:/data/train*.*"
training_data=[]
for file in glob.glob(train_path):
    print(file)
    train_array= cv2.imread(file)
    train_array=cv2.resize(train_array,(img_rows,img_cols),3)
    training_data.append(train_array)

x_train=np.array(training_data)

# Creating array of validation samples
valid_path = "D:/data/valid*.*"
valid_data=[]
for file in glob.glob(valid_path):
    print(file)
    valid_array= cv2.imread(file)
    valid_array=cv2.resize(valid_array,(img_rows,img_cols),3)
    valid_data.append(train_array)

x_valid=np.array(valid_data)

x_train = np.array(x_train, dtype="float")/255.0
x_valid = np.array(x_valid, dtype="float")/255.0

# Creating array for Labels
y_train=np.ones((num_trainsamples,),dtype = int)
y_train[0:224]=0 #Class1=0
y_train[225:363]=1 #Class2=1
print(y_train)

y_valid=np.ones((num_validsamples,),dtype = int)
y_valid[0:101]=0
y_valid[102:155]=1
print(y_valid)

y_train = np_utils.to_categorical(y_train,nb_classes,dtype='int32')
y_valid = np_utils.to_categorical(y_valid,nb_classes,dtype='int32')

base_model=ResNet50(weights='imagenet',include_top=False)

x = base_model.output
x = GlobalMaxPooling2D()(x)
x=Dense(1024,activation='relu')(x)
x=Dense(1024,activation='relu')(x)
x=Dense(512,activation='relu')(x)
x=Dense(1, activation= 'sigmoid')(x)
model = Model(inputs = base_model.input, outputs = x)

for i,layer in enumerate(model.layers):
    print(i,layer.name)

for layer in model.layers[:75]:
    layer.trainable=False
for layer in model.layers[75:]:
    layer.trainable=True

adam = Adam(lr=0.0001)
model.compile(optimizer= adam, loss='binary_crossentropy', metrics=['accuracy'])

train_datagen = ImageDataGenerator(
    brightness_range=(0.2,2.5),
    rotation_range=180,
    zoom_range=0.5,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    vertical_flip=True)

train_datagen.fit(x_train)

history= model.fit_generator(train_datagen.flow(x_train, y_train, batch_size = 10,shuffle=True),steps_per_epoch=len(x_train),epochs = 500,shuffle=True, validation_data=(x_valid,y_valid),validation_steps=num_validsamples // batch_size,callbacks=[tensorboard])

eval = model.evaluate(x_valid, y_valid)
print ("Loss = " + str(eval[0]))
print ("Test Accuracy = " + str(eval[1]))

predictions= model.predict(x_valid)
print(predictions)
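
The ValueError quoted above is consistent with the labels having been one-hot encoded by `np_utils.to_categorical` (shape (N, 2)) while the last layer is `Dense(1, activation='sigmoid')`, which expects one value per sample. A plain-Python sketch of the two label layouts; `to_one_hot` and `from_one_hot` are stand-ins written for this example, not Keras functions:

```python
def to_one_hot(labels, n_classes):
    """Integer class labels -> one row of 0/1 flags per sample."""
    return [[1 if c == y else 0 for c in range(n_classes)] for y in labels]

def from_one_hot(rows):
    """One-hot rows -> integer class labels (index of the largest entry)."""
    return [max(range(len(r)), key=lambda i: r[i]) for r in rows]

labels = [0, 1, 1, 0]
one_hot = to_one_hot(labels, 2)   # shape (4, 2): pairs with Dense(2, activation='softmax')
flat = from_one_hot(one_hot)      # shape (4,):   pairs with Dense(1, activation='sigmoid')

print(one_hot)
print(flat)
```

So there are two consistent setups: keep the integer 0/1 labels (skip `to_categorical`) with one sigmoid unit and binary_crossentropy, or keep the one-hot labels with two softmax units and categorical_crossentropy.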

skhadem commented 5 years ago

I had same issue: epoch accuracy was growing while validation was the same value (0.41). But, I saved the weights after an epoch and then when I loaded the weights and continued training, everything worked.

First time: create the model, compile, call fit_generator: bad validation results every epoch. Then: create the model, compile, load weights, call fit_generator: everything works beautifully.

To me it seems like I missed a step, but when calling load_weights on the model it was corrected

abhineet99 commented 5 years ago

Had the same issue. Reducing Initial Learning Rate helps.

prabaHridayami commented 5 years ago

Hey, I'm new to deep learning, especially CNNs. I've been trying to train on 100 classes with 10 images for each class. I've tried many kinds of architectures, but val_loss is really high and val_acc is really low. Do you guys have any suggestions for that?

skhadem commented 5 years ago

@prabaHridayami That is very low amount of data, it can be hard to obtain good results. Are you doing any type of data augmentation? That would be my suggestion to increase the variety of data your model sees.

prabaHridayami commented 5 years ago

@skhadem Yeah, I'm doing several augmentations, so each image ends up with 88 augmented versions. I'm currently trying to train on 10 classes, with a val_acc of 0.6870 and a val_loss of 1.4573. What do you think?

skhadem commented 5 years ago

@prabaHridayami what architecture are you using?

prabaHridayami commented 5 years ago

model = Sequential()

model.add(Conv2D(32, (3, 3), input_shape=(100, 400, 3), activation='relu', padding='same',name='block1_conv1'))
model.add(Conv2D(32, (3, 3), activation='relu',padding='same',name='block1_conv2'))
model.add(Conv2D(32, (3, 3), activation='relu',padding='same',name='block1_conv3'))
model.add(Conv2D(32, (3, 3), activation='relu',padding='same',name='block1_conv4'))
model.add(MaxPooling2D(pool_size=(2, 2),name='block1_pool'))

model.add(Conv2D(64, (3, 3), activation='relu',padding='same',name='block2_conv1'))
model.add(Conv2D(64, (3, 3), activation='relu',padding='same',name='block2_conv2'))
model.add(Conv2D(64, (3, 3), activation='relu',padding='same',name='block2_conv3'))
model.add(MaxPooling2D(pool_size=(2, 2),name='block2_pool'))

model.add(Conv2D(128, (3, 3), activation='relu',padding='same',name='block3_conv1'))
model.add(Conv2D(128, (3, 3), activation='relu',padding='same',name='block3_conv2'))
model.add(Conv2D(128, (3, 3), activation='relu',padding='same',name='block3_conv3'))
model.add(MaxPooling2D(pool_size=(2, 2),name='block3_pool'))

model.add(Conv2D(256, (3, 3), activation='relu',padding='same',name='block4_conv1'))
model.add(Conv2D(256, (3, 3), activation='relu',padding='same',name='block4_conv2'))
model.add(MaxPooling2D(pool_size=(2, 2),strides =(2,2),name='block4pool'))

model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.4))

model.add(Dense(256, activation='relu'))
model.add(Dropout(0.4))

model.add(Dense(20, activation='softmax'))

this is my architecture model using sequential

skhadem commented 5 years ago

@prabaHridayami I would recommend using a pre trained and well studied architecture for feature extraction and then fine tuning the layers on the top. My personal go-to is VGG19. In keras you can do something like this:

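import keras
from keras.layers import Dense, Dropout

# input_shape sets the image size; VGG19 takes no separate input_size argument
base = keras.applications.VGG19(input_shape=(100,400,3),
                                include_top=False,
                                weights='imagenet',
                                pooling='max')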
# freeze base layers
for layer in base.layers:
    layer.trainable=False

model = keras.Sequential()
model.add(base)
# you should experiment with different top level designs 
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.4))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.4))
model.add(Dense(20, activation='softmax'))

check out https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html

prabaHridayami commented 5 years ago

Thank you very much, I'll check that out. I don't really understand Dense and Dropout, though. Do you know what these two layers do?

skhadem commented 5 years ago

So Dense is just a fully connected layer; it does a lot of the "decision making" based on the resulting feature vector. It's a way to take large feature vectors and map them to a class. The more units you have, the more "flexible" the model can be, i.e. it can fit more complex patterns, but that means more parameters. Dropout randomly picks a fraction of a layer's outputs on each training step and sets them to 0 (it is only active during training). The way I think about it is that if certain units contribute a lot to a correct result, the optimizer could end up relying on them and ignoring everything else. With Dropout the network is forced to spread what it learns across many different units. It helps to avoid overfitting and is almost standard at this point.
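To make the Dropout part concrete, here is a minimal NumPy sketch of "inverted" dropout, which, as far as I know, matches what the Keras Dropout layer does during training: zero a random fraction of the outputs and scale the survivors by 1/(1 - rate) so the expected total stays the same. The `dropout` function and its arguments are made up for illustration; this is not the Keras implementation.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Hypothetical helper: inverted dropout on an array of layer outputs."""
    if not training or rate == 0.0:
        # At inference time the layer passes its input through unchanged.
        return activations
    keep_prob = 1.0 - rate
    # Each output is kept independently with probability keep_prob.
    mask = rng.random(activations.shape) < keep_prob
    # Dropped outputs become 0; kept outputs are scaled up by 1 / keep_prob.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
x = np.ones((4, 1000))            # stand-in for the outputs of a Dense layer
y = dropout(x, rate=0.4, rng=rng)

print("fraction zeroed:", float((y == 0).mean()))
print("mean after dropout:", float(y.mean()))
```

With `rate=0.4`, roughly 40% of the values come out as 0 and the mean stays close to 1, which is the point of the rescaling.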

prabaHridayami commented 5 years ago

Thank you very much, now I understand.

MukundGK1986 commented 5 years ago

@sayedathar11 When I use model.add(Dense(1,activation='sigmoid')), I get the following error: ValueError: Error when checking target: expected dense_4 to have shape (1,) but got array with shape (2,)

Here is my code:

batch_size = 32
nb_classes = 2
data_augmentation = True

img_rows, img_cols = 224,224
img_channels = 3

# Creating array of training samples

train_path = "D:/data/train."
training_data=[]
for file in glob.glob(train_path):
    print(file)
    train_array= cv2.imread(file)
    train_array=cv2.resize(train_array,(img_rows,img_cols),3)
    training_data.append(train_array)

x_train=np.array(training_data)

# Creating array of validation samples

valid_path = "D:/data/valid."
valid_data=[]
for file in glob.glob(valid_path):
    print(file)
    valid_array= cv2.imread(file)
    valid_array=cv2.resize(valid_array,(img_rows,img_cols),3)
    valid_data.append(train_array)

x_valid=np.array(valid_data)

x_train = np.array(x_train, dtype="float")/255.0
x_valid = np.array(x_valid, dtype="float")/255.0

# Creating array for Labels

y_train=np.ones((num_trainsamples,),dtype = int)
y_train[0:224]=0 #Class1=0
y_train[225:363]=1 #Class2=1
print(y_train)

y_valid=np.ones((num_validsamples,),dtype = int)
y_valid[0:101]=0
y_valid[102:155]=1
print(y_valid)

y_train = np_utils.to_categorical(y_train,nb_classes,dtype='int32')
y_valid = np_utils.to_categorical(y_valid,nb_classes,dtype='int32')

base_model=ResNet50(weights='imagenet',include_top=False)

x = base_model.output
x = GlobalMaxPooling2D()(x)
x=Dense(1024,activation='relu')(x)
x=Dense(1024,activation='relu')(x)
x=Dense(512,activation='relu')(x)
x=Dense(1, activation= 'sigmoid')(x)
model = Model(inputs = base_model.input, outputs = x)

for i,layer in enumerate(model.layers):
    print(i,layer.name)

for layer in model.layers[:75]:
    layer.trainable=False
for layer in model.layers[75:]:
    layer.trainable=True

adam = Adam(lr=0.0001)
model.compile(optimizer= adam, loss='binary_crossentropy', metrics=['accuracy'])

train_datagen = ImageDataGenerator(
    brightness_range=(0.2,2.5),
    rotation_range=180,
    zoom_range=0.5,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    vertical_flip=True)

train_datagen.fit(x_train)

history= model.fit_generator(train_datagen.flow(x_train, y_train, batch_size = 10,shuffle=True),
                             steps_per_epoch=len(x_train),
                             epochs = 500,
                             shuffle=True,
                             validation_data=(x_valid,y_valid),
                             validation_steps=num_validsamples // batch_size,
                             callbacks=[tensorboard])

eval = model.evaluate(x_valid, y_valid)
print ("Loss = " + str(eval[0]))
print ("Test Accuracy = " + str(eval[1]))

predictions= model.predict(x_valid)
print(predictions)

I am also facing the exact same issue. If I keep the number of neurons in the output layer and use sigmoid, the accuracy does not change from one epoch to the next. But if I change the number of layers as mentioned above, I get the same error as you. Were you able to resolve it? If so, please let us know the solution. Thank you in advance.
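For what it's worth, that ValueError usually means the label array and the output layer disagree on shape: to_categorical turns each label into a length-2 one-hot vector, while Dense(1, activation='sigmoid') produces one value per sample. Below is a small NumPy sketch of the two consistent pairings. The label values are made up, and `np.eye` is only standing in for to_categorical here.

```python
import numpy as np

# Hypothetical integer class labels for five samples.
y_int = np.array([0, 0, 1, 1, 0])

# One-hot encoding, the same layout to_categorical(y_int, 2) would give.
y_onehot = np.eye(2, dtype="int32")[y_int]

# Pairing A: Dense(1, activation='sigmoid') + binary_crossentropy
# wants one 0/1 value per sample, so keep the integer labels.
y_for_sigmoid = y_int.reshape(-1, 1)

# Pairing B: Dense(2, activation='softmax') + categorical_crossentropy
# wants one-hot rows, one column per class.
y_for_softmax = y_onehot

print(y_for_sigmoid.shape)  # one column per sample
print(y_for_softmax.shape)  # two columns per sample

# If the labels are already one-hot, argmax recovers the integer labels.
y_back = y_onehot.argmax(axis=1)
```

Mixing the two (one-hot labels with a single sigmoid unit) is the combination that triggers the shape error above.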

DevMetwaly commented 5 years ago

I had a similar issue when I tried to build an autoencoder using LSTM for sequences or CNN for images: the model reached around 50% accuracy and a loss of 2.5, then got stuck, with nothing improving at all. I tried increasing the number of nodes and the number of layers, but made no progress.

After 3 days I tuned the optimizer, changing the learning rate and the learning rate decay, and finally everything improved and made sense. I increased the learning rate decay slightly until the model started to improve instead of getting stuck at 50%.

I used the Adam optimizer with the following parameters: Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.01, amsgrad=False). The right values will vary from one problem to another, of course. Thanks!
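To get a feel for what decay=0.01 does, here is a plain Python sketch. It assumes the decay argument of the older Keras optimizers applies time-based decay on every update step, i.e. lr_t = lr / (1 + decay * iterations); `decayed_lr` is a hypothetical helper written for this illustration, not a Keras function.

```python
def decayed_lr(initial_lr, decay, iteration):
    """Time-based decay (assumed formula): lr / (1 + decay * iteration)."""
    return initial_lr / (1.0 + decay * iteration)

initial_lr = 0.001
decay = 0.01

# Print the effective learning rate after a few update counts.
for iteration in (0, 100, 1000, 10000):
    print(iteration, decayed_lr(initial_lr, decay, iteration))
```

Under that assumption the learning rate is already halved after 100 updates, so with small batches most of an epoch may run at a much lower rate than the nominal 0.001.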

Ahmed-Araby commented 4 years ago

This happened when I used
winit = RandomNormal(mean=0.0 , stddev=0.1) for weight initialization in the Dense layers,
and it just worked when I removed it and used the default settings!

Pratikdomadiya commented 4 years ago

I had the same problem while training a convolutional auto encoder. I made learning rate ("lr" parameter in optimizer) smaller and it solved the problem.

Can you send me your autoencoder optimization code? I want to optimize my autoencoder network, but I have no idea how to do that. Can you please help me?

AdislanSaidov commented 4 years ago

Had the same problem; solved it by changing the optimizer from Adam to SGD.

amapic commented 4 years ago

I think that the learning rate is the problem. Actually mine was equal to 7 ahah. I wrote 10-3 instead of 1e-3.
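A quick check in a Python shell shows the difference between the two spellings (the variable names are just for illustration):

```python
wrong_lr = 10-3       # integer subtraction, evaluates to 7
right_lr = 1e-3       # scientific notation, evaluates to 0.001
also_right = 10**-3   # exponentiation, also 0.001

print(wrong_lr, right_lr, also_right)

# A cheap guard before building the optimizer catches this kind of typo.
for name, lr in (("right_lr", right_lr), ("also_right", also_right)):
    assert 0.0 < lr < 1.0, f"{name} looks wrong: {lr}"
```

The same guard applied to `wrong_lr` would fail, which is the behaviour you want before a long training run.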

Terkea commented 4 years ago

model = keras.Sequential([
    keras.layers.Conv2D(input_shape=(224,224,3),filters=64,kernel_size=(3,3),padding="same", activation="relu"),
    keras.layers.Conv2D(filters=64,kernel_size=(3,3),padding="same", activation="relu"),
    keras.layers.MaxPool2D(pool_size=(2,2),strides=(2,2)),
    keras.layers.Conv2D(filters=128, kernel_size=(3,3), padding="same", activation="relu"),
    keras.layers.Conv2D(filters=128, kernel_size=(3,3), padding="same", activation="relu"),
    keras.layers.MaxPool2D(pool_size=(2,2),strides=(2,2)),
    keras.layers.Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"),
    keras.layers.Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"),
    keras.layers.Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"),
    keras.layers.MaxPool2D(pool_size=(2,2),strides=(2,2)),
    keras.layers.Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"),
    keras.layers.Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"),
    keras.layers.Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"),
    keras.layers.MaxPool2D(pool_size=(2,2),strides=(2,2)),
    keras.layers.Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"),
    keras.layers.Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"),
    keras.layers.Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"),
    keras.layers.MaxPool2D(pool_size=(2,2),strides=(2,2)),
    keras.layers.Flatten(),
    keras.layers.Dense(units=4096,activation="relu"),
#     keras.layers.Dropout(.5),
    keras.layers.Dense(units=4096,activation="relu"),
    keras.layers.Dropout(.5),
    keras.layers.Dense(units=2, activation="sigmoid"),
])

model.compile(optimizer="adam",
            loss="categorical_crossentropy",
            metrics=['accuracy'])

With this architecture, the accuracy stays at 0.73. I couldn't find a fix yet.
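One thing worth checking when the accuracy is frozen at a single value is whether that value equals the share of the most common class, which would suggest the network predicts the same class for every input. A minimal sketch in plain Python follows; the label list is invented for illustration and `majority_baseline` is a hypothetical helper.

```python
from collections import Counter

def majority_baseline(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(labels)
    _, top_count = counts.most_common(1)[0]
    return top_count / len(labels)

# Invented example: 73 samples of class 0 and 27 samples of class 1.
labels = [0] * 73 + [1] * 27

print("majority-class baseline:", majority_baseline(labels))
```

If the baseline computed on the real training labels matches the stuck accuracy, the problem is more likely in the loss, labels, or learning rate than in the depth of the architecture.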

kodon0 commented 3 years ago

Reducing the batch size solved it for me :)

I guess my test set was too small to feed large batches into the CNN.

I hope this may be of use!

keras-team / keras

acc and val_acc don't change? #1597
