harvitronix / five-video-classification-methods

Code that accompanies my blog post outlining five video classification methods in Keras and TensorFlow
MIT License

Every data point classified as one specific class by both RNN and CNN #130

Closed: mraduldubey closed this issue 5 years ago

mraduldubey commented 5 years ago

So, I am trying to do binary classification of videos. I created my own data_file.csv and processed the data into train and test folders. First, I trained the CNN using the given script, but the accuracy stayed stuck at ~50%. When I checked, I found that all data points were being classified as the same class. So I skipped that step and used the vanilla Inception model instead. It turns out the same thing happens with the RNN: every video sequence gets classified as a single class.

The main difference is that the loss kept decreasing during the CNN step, but not so much during the RNN step.
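Accuracy that stays pinned at one value is often just the majority-class rate. A quick sanity check, sketched here in plain Python (the function name `majority_baseline` is made up for illustration), is to compute what a classifier that always predicts the most frequent label would score and compare it with the reported `acc` and `val_acc`:

```python
from collections import Counter

def majority_baseline(labels):
    """Return (most common label, accuracy of always predicting it)."""
    counts = Counter(labels)
    label, count = counts.most_common(1)[0]
    return label, count / len(labels)

# Toy labels for illustration only; substitute the real labels of your split.
print(majority_baseline(["a", "a", "a", "b"]))  # → ('a', 0.75)
```

If the two classes are roughly balanced, this baseline is 0.5, which would match the ~50% the CNN was stuck at.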

This is a typical RNN training run:

Epoch 1/50
Creating validate generator with 30 samples.
Creating train generator with 131 samples.
2019-05-26 16:11:10.312201: I tensorflow/stream_executor/dso_loader.cc:153] successfully opened CUDA library libcublas.so.10.0 locally
26/26 [==============================] - 9s 362ms/step - loss: 4.9882 - acc: 0.6692 - val_loss: 2.6863 - val_acc: 0.8333

Epoch 00001: val_loss improved from inf to 2.68635, saving model to data/checkpoints/lstm-features.001-2.686.hdf5
Epoch 2/50
26/26 [==============================] - 7s 258ms/step - loss: 4.9594 - acc: 0.6923 - val_loss: 2.6863 - val_acc: 0.8333

Epoch 00002: val_loss did not improve from 2.68635
Epoch 3/50
26/26 [==============================] - 7s 258ms/step - loss: 4.9594 - acc: 0.6923 - val_loss: 2.6863 - val_acc: 0.8333

Epoch 00003: val_loss did not improve from 2.68635
Epoch 4/50
26/26 [==============================] - 7s 258ms/step - loss: 4.9594 - acc: 0.6923 - val_loss: 2.6863 - val_acc: 0.8333

Epoch 00004: val_loss did not improve from 2.68635
Epoch 5/50
26/26 [==============================] - 7s 259ms/step - loss: 4.9594 - acc: 0.6923 - val_loss: 2.6863 - val_acc: 0.8333

Epoch 00005: val_loss did not improve from 2.68635
Epoch 6/50
26/26 [==============================] - 7s 258ms/step - loss: 4.9594 - acc: 0.6923 - val_loss: 2.6863 - val_acc: 0.8333

Epoch 00006: val_loss did not improve from 2.68635

Can you point me in the direction of a possible diagnosis?
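The numbers in this log are consistent with a model whose softmax outputs have saturated to hard 0/1 values. Assuming the model is compiled with `categorical_crossentropy` and Keras's default epsilon of 1e-7 (Keras clips predicted probabilities to [epsilon, 1 - epsilon] before taking the log), each misclassified sample contributes -ln(1e-7) ≈ 16.118 to the loss, so the mean loss is roughly (1 - acc) × 16.118. The sketch below (function name hypothetical) checks this against the logged values:

```python
import math

# Assumption: Keras clips probabilities to [EPSILON, 1 - EPSILON] inside
# categorical cross-entropy; 1e-7 is the Keras default epsilon.
EPSILON = 1e-7

def saturated_loss(accuracy, epsilon=EPSILON):
    """Mean cross-entropy if every prediction is a hard one-hot vector.

    Correct samples contribute -log(1 - epsilon) (about 0) and wrong
    samples contribute -log(epsilon), weighted by their fractions.
    """
    wrong = 1.0 - accuracy
    return wrong * -math.log(epsilon) + accuracy * -math.log(1.0 - epsilon)

print(round(saturated_loss(25 / 30), 4))  # val_acc 0.8333 → 2.6863
print(round(saturated_loss(9 / 13), 4))   # acc 0.6923 → 4.9594
```

Both values match the logged `val_loss` and `loss` to four decimals. Because clipping has zero gradient outside its range, a fully saturated model would receive no gradient at all, which would explain why the loss stopped moving after epoch 1.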

Idodox commented 5 years ago

How did you verify that all points are classified as 1?

I have a similar but distinct issue: the network isn't really learning, although I do get different results every epoch, with ~50% accuracy on 3 classes.

mraduldubey commented 5 years ago

In the validation script, I removed the existing evaluate_generator call and instead ran predictions on individual data points, then verified the results.
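That per-sample check can be sketched with NumPy: take the per-sample class probabilities (for example, what Keras's `model.predict` returns for a softmax output layer) and count how many samples land in each predicted class. The function names here are hypothetical; the actual validation script is not shown in the thread.

```python
import numpy as np

def prediction_histogram(probabilities, num_classes):
    """Count how many samples are assigned to each class (by argmax)."""
    predicted = np.argmax(np.asarray(probabilities), axis=1)
    return np.bincount(predicted, minlength=num_classes)

def has_collapsed(probabilities, num_classes):
    """True if every sample is predicted as the same class."""
    counts = prediction_histogram(probabilities, num_classes)
    return np.count_nonzero(counts) == 1

# Toy softmax outputs for three samples and two classes (illustrative only).
toy = np.array([[0.9, 0.1], [0.8, 0.2], [0.99, 0.01]])
print(prediction_histogram(toy, 2))  # → [3 0]
print(has_collapsed(toy, 2))         # → True
```

Unlike an aggregate accuracy from evaluate_generator, the histogram shows directly whether one class is absorbing every prediction.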

mraduldubey commented 5 years ago

Anyway, I fixed the issue with a lower learning rate, batch normalization, and increased patience.
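Patience only controls how many epochs training continues without improvement. It cannot rescue a loss that is exactly flat, but it does give a model trained with a lower learning rate more epochs to make progress before it is stopped. Below is a simplified, pure-Python sketch of patience-based early stopping, modeled loosely on Keras's `EarlyStopping` callback (the function `stopping_epoch` is hypothetical and is not Keras's implementation), fed a `val_loss` stuck at 2.6863 as in the log earlier in the thread:

```python
def stopping_epoch(val_losses, patience, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None.

    An epoch counts as an improvement only if its loss beats the best so
    far by more than min_delta; training stops once `patience`
    consecutive epochs fail to improve.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None  # never stopped early

flat = [2.6863] * 50  # val_loss that never improves after epoch 1
print(stopping_epoch(flat, patience=5))   # → 6
print(stopping_epoch(flat, patience=20))  # → 21
```

With a flat loss, a larger patience only delays the stop; the learning-rate and batch-normalization changes are what let the loss move again.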