Every data point classified as one specific class by both RNN and CNN

mraduldubey commented 5 years ago

I am trying to do binary classification of videos. I created my own data_file.csv and processed the data to create the train and test folders. First, I trained the CNN using the given script, but the accuracy stayed stuck at ~50%. I checked and found that all data points were being classified as a single class. So I skipped that step and decided to use the vanilla Inception model instead. It turns out the same thing happens with the RNN: every video sequence gets classified as a single class.

The main difference is that the loss kept decreasing in the CNN step, but not so much in the RNN step.

This is a typical RNN training run:

Epoch 1/50
Creating validate generator with 30 samples.
Creating train generator with 131 samples.
2019-05-26 16:11:10.312201: I tensorflow/stream_executor/dso_loader.cc:153] successfully opened CUDA library libcublas.so.10.0 locally
26/26 [==============================] - 9s 362ms/step - loss: 4.9882 - acc: 0.6692 - val_loss: 2.6863 - val_acc: 0.8333

Epoch 00001: val_loss improved from inf to 2.68635, saving model to data/checkpoints/lstm-features.001-2.686.hdf5
Epoch 2/50
26/26 [==============================] - 7s 258ms/step - loss: 4.9594 - acc: 0.6923 - val_loss: 2.6863 - val_acc: 0.8333

Epoch 00002: val_loss did not improve from 2.68635
Epoch 3/50
26/26 [==============================] - 7s 258ms/step - loss: 4.9594 - acc: 0.6923 - val_loss: 2.6863 - val_acc: 0.8333

Epoch 00003: val_loss did not improve from 2.68635
Epoch 4/50
26/26 [==============================] - 7s 258ms/step - loss: 4.9594 - acc: 0.6923 - val_loss: 2.6863 - val_acc: 0.8333

Epoch 00004: val_loss did not improve from 2.68635
Epoch 5/50
26/26 [==============================] - 7s 259ms/step - loss: 4.9594 - acc: 0.6923 - val_loss: 2.6863 - val_acc: 0.8333

Epoch 00005: val_loss did not improve from 2.68635
Epoch 6/50
26/26 [==============================] - 7s 258ms/step - loss: 4.9594 - acc: 0.6923 - val_loss: 2.6863 - val_acc: 0.8333

Epoch 00006: val_loss did not improve from 2.68635
done.

Can you point me in the direction of a possible diagnosis?
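
One possible reading of the flat numbers in the log, offered as a rough check rather than a confirmed diagnosis: if the network outputs a fully confident prediction for one class on every sample, each wrong sample contributes about -ln(eps) to the cross-entropy and each correct sample contributes about 0. The sketch below assumes a categorical cross-entropy loss and that Keras clips probabilities at eps = 1e-7; neither is stated in this thread, and `saturated_loss` is a hypothetical helper name.

```python
import math

# Assumption: Keras clips predicted probabilities to [eps, 1 - eps], eps = 1e-7.
KERAS_EPSILON = 1e-7


def saturated_loss(accuracy, eps=KERAS_EPSILON):
    """Mean cross-entropy when every prediction is a fully confident one-hot vector.

    Correct samples contribute roughly 0; wrong samples contribute -ln(eps).
    """
    return (1.0 - accuracy) * -math.log(eps)


# Compare with the logged training loss (4.9594) and validation loss (2.6863).
print(round(saturated_loss(0.6923), 4))
print(round(saturated_loss(0.8333), 4))
```

If the printed values land close to the logged losses, that is consistent with saturated predictions stuck on one class, which would also explain why the loss stops moving after the first epoch.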

Idodox commented 5 years ago

How did you verify all points are classified as 1?

I've got a similar but different issue of the network not really learning, but I do get different results every epoch, and ~50% accuracy for 3 classes.

mraduldubey commented 5 years ago

In the validation script, I replaced the existing evaluate_generator call with predictions on individual data points and checked the results.
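
A minimal sketch of that kind of check, assuming the per-sample predictions have already been collected as rows of class probabilities (for example from the model's predict call). The helper names `predicted_class_counts` and `is_collapsed` are hypothetical and not part of the repository.

```python
from collections import Counter


def predicted_class_counts(probabilities):
    """Count how often each class index is the argmax of a prediction row."""
    counts = Counter()
    for row in probabilities:
        row = list(row)
        # Index of the largest probability in this row.
        counts[max(range(len(row)), key=row.__getitem__)] += 1
    return counts


def is_collapsed(probabilities):
    """True when every sample is assigned to the same class."""
    return len(predicted_class_counts(probabilities)) == 1


# Made-up probabilities for illustration only.
example = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]]
print(predicted_class_counts(example), is_collapsed(example))
```

Comparing these counts with the true label counts shows whether a high accuracy is only the majority-class share of the data.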

mraduldubey commented 5 years ago

Anyway, I fixed the issue by lowering the learning rate, adding batch normalization, and increasing the patience.
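
To illustrate only the patience part of that fix: the sketch below simulates patience-based early stopping on a validation loss that never improves after the first epoch, as in the log above. It assumes the stopping rule is "stop once the loss has failed to improve for `patience` consecutive epochs", which is how I understand Keras's EarlyStopping to behave with its default settings. The patience of 5 is inferred from the run ending after epoch 6 and is not stated in the thread; `epochs_run` is a hypothetical helper.

```python
def epochs_run(val_losses, patience):
    """Return the number of epochs completed before early stopping triggers."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # Improvement: remember it and reset the counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses)


flat = [2.6863] * 50  # validation loss that never improves, up to 50 epochs
print(epochs_run(flat, patience=5))
print(epochs_run(flat, patience=10))
```

A larger patience only gives training more epochs to escape a plateau. It does not help if the predictions are already saturated, which is where the lower learning rate and batch normalization would matter.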

harvitronix / five-video-classification-methods

Every data point classified as one specific class by both RNN and CNN #130