shabu19 commented 3 years ago

Hi, I'm trying to reproduce the results of a Keras Sequential model for Human Activity Recognition (a two-stream network, comprising spatial and temporal streams) on the UCF-101 dataset.

A generator is used to load the data.

The spatial stream is working fine and gives the reported accuracies.

While working on the temporal stream, training took too long with the default model and val_acc fluctuated too much.

So I tried changing the dropout from 0.9 to 0.5 in the dense layers, and the results are:

After changing the dropout from 0.9 to 0.5, the model converged quite quickly, but val_acc stagnates at 50% while accuracy keeps increasing, which I guess is a case of overfitting.
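
The pattern described above (training accuracy rising while val_acc stays flat) can be quantified from the per-epoch metrics. The sketch below is a minimal example, assuming the metrics are available as a dict of lists such as the `history.history` attribute Keras returns from `fit`. The helper name `generalization_gap` and the sample numbers are made up for illustration, and the key names (`acc`/`val_acc` or `accuracy`/`val_accuracy`) depend on the Keras version.

```python
def generalization_gap(history, train_key="accuracy", val_key="val_accuracy"):
    """Return the per-epoch difference between training and validation accuracy."""
    train = history[train_key]
    val = history[val_key]
    if len(train) != len(val):
        raise ValueError("train and validation curves must have the same length")
    return [t - v for t, v in zip(train, val)]


# Made-up numbers, only to illustrate the shape of an overfitting run.
example_history = {
    "accuracy": [0.30, 0.55, 0.75, 0.90],
    "val_accuracy": [0.28, 0.45, 0.50, 0.50],
}
gaps = generalization_gap(example_history)
print([round(g, 2) for g in gaps])
```

A gap that keeps widening while the validation curve is flat is the usual signature of overfitting; a gap that stays small with both curves low points elsewhere.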

The accuracy reported by the author is 80%.

What should the dropout value be, and how can I avoid overfitting and increase val_accuracy?

Default Model

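# Keras imports needed by the model definition below.
# DataSet, callbacks and plot_results are project helpers; their imports are not shown in the post.
from tensorflow.keras.layers import (Activation, BatchNormalization, Conv2D, Dense,
                                     Dropout, Flatten, MaxPooling2D)
from tensorflow.keras.models import Sequential


def fixed_schedule(epoch):
    # Step decay: divide the learning rate by 10 from epoch 1389 and again from epoch 1944.
    initial_lr = 1.e-2
    lr = initial_lr

    if epoch >= 1389:
        lr = 0.1 * lr
    if epoch >= 1944:
        lr = 0.1 * lr

    return lr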

def train(num_of_snip=5, opt_flow_len=10, class_limit=None, image_shape=(224, 224), batch_size=32, nb_epoch=100, 
    saved_weights=None):

    cb = callbacks('cp-temporal', 'tb-temporal', 'logs-temporal')

    # Get the data and process it.
    data = DataSet(num_of_snip=num_of_snip, opt_flow_len=opt_flow_len, image_shape=image_shape,
                   class_limit=class_limit)

    # Keras expects an integer step count.
    steps_per_epoch_train = int(len(data.data_list) * 0.7) // batch_size

    generator = data.stack_generator(batch_size, 'train')
    val_generator = data.stack_generator(batch_size, 'test')

    # Get the model.
    # Assumption: the x and y flow fields of each frame are stacked as channels,
    # and class_limit=None means all 101 UCF-101 classes.
    input_shape = image_shape + (2 * opt_flow_len,)
    nb_classes = class_limit if class_limit else 101
    temporal_cnn = Sequential()

    # conv1
    temporal_cnn.add(Conv2D(96, (7, 7), strides=2, padding='same', input_shape=input_shape))
    temporal_cnn.add(BatchNormalization())
    temporal_cnn.add(Activation('relu'))
    temporal_cnn.add(MaxPooling2D(pool_size=(2, 2)))

    # conv2
    temporal_cnn.add(Conv2D(256, (5, 5), strides=2, padding='same'))
    temporal_cnn.add(Activation('relu'))
    temporal_cnn.add(MaxPooling2D(pool_size=(2, 2)))

    # conv3
    temporal_cnn.add(Conv2D(512, (3, 3), strides=1, activation='relu', padding='same'))

    # conv4
    temporal_cnn.add(Conv2D(512, (3, 3), strides=1, activation='relu', padding='same'))

    # conv5
    temporal_cnn.add(Conv2D(512, (3, 3), strides=1, activation='relu', padding='same'))
    temporal_cnn.add(MaxPooling2D(pool_size=(2, 2)))

    # full6
    temporal_cnn.add(Flatten())
    temporal_cnn.add(Dense(4096, activation='relu'))
    temporal_cnn.add(Dropout(0.9))

    # full7
    temporal_cnn.add(Dense(2048, activation='relu'))
    temporal_cnn.add(Dropout(0.9))

    # softmax
    temporal_cnn.add(Dense(nb_classes, activation='softmax'))

    # The compile step (optimizer, loss, metrics) is not shown in the post; it must run before fit.
    # Resume from the saved checkpoint.
    temporal_cnn.load_weights(saved_weights)

    # Fit!
    history = temporal_cnn.fit(x=generator, steps_per_epoch=steps_per_epoch_train, epochs=nb_epoch, callbacks=cb,
                               validation_data=val_generator, validation_steps=10, initial_epoch=50)
    # analyze the results
    plot_results(history)

def main():
    saved_weights = 'C:/Users/User/Desktop/049.hdf5'
    class_limit = 101            # int, can be 1-101 or None
    num_of_snip = 1             # number of chunks used for each video
    opt_flow_len = 10           # number of optical flow frames used
    image_shape=(224, 224)
    batch_size = 64
    nb_epoch = 500

    train(num_of_snip=num_of_snip, opt_flow_len=opt_flow_len, class_limit=class_limit, image_shape=image_shape, 
    batch_size=batch_size, nb_epoch=nb_epoch, saved_weights=saved_weights)

if __name__ == '__main__':
    main()
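
As a side check on the step counts in the code above: with `validation_steps=10`, each validation pass draws only 10 batches from the generator, so it sees `10 * batch_size` samples if every batch holds `batch_size` samples. The sketch below, in plain Python, computes how many steps one full pass over a split would need. The helper name `steps_to_cover` and the split size of 4000 are illustrative assumptions, not values from the project.

```python
import math


def steps_to_cover(num_samples, batch_size):
    """Number of generator steps needed to see every sample once."""
    if num_samples < 0 or batch_size <= 0:
        raise ValueError("num_samples must be >= 0 and batch_size must be > 0")
    return math.ceil(num_samples / batch_size)


batch_size = 64
samples_seen = 10 * batch_size                # validation_steps=10, as in the fit() call
full_pass = steps_to_cover(4000, batch_size)  # 4000 is a made-up split size
print(samples_seen, full_pass)
```

If the validation split is much larger than the number of samples seen per pass, the reported val_acc is an estimate from a small subset, which may contribute to epoch-to-epoch fluctuation.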

saichatla commented 3 years ago

https://jmlr.org/papers/v15/srivastava14a.html

Based on the original paper proposing dropout, keep the dropout rate between 0.2 and 0.5. As you can see, when you used 0.9 as your dropout rate there was overfitting, which is not good and is the opposite of what dropout is intended to do. With 0.5 your accuracy oscillates, which might be due to two reasons: first, your learning rate may be too high and need to be lowered; second, your batch size may be too small and need to be increased.
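
One point worth checking when comparing numbers with the paper: in Keras, the argument to `Dropout` is the fraction of units that is dropped, whereas Srivastava et al. write p for the probability of retaining a unit. So `Dropout(0.9)` keeps only about 10% of the activations during training. The following is a minimal pure-Python sketch of inverted dropout to show the effect of the rate; it is not Keras' implementation, and `inverted_dropout` is a hypothetical helper name.

```python
import random


def inverted_dropout(values, rate, rng):
    """Zero each value with probability `rate`; scale survivors by 1 / (1 - rate)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep_prob = 1.0 - rate
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [1.0] * 10_000
for rate in (0.5, 0.9):
    dropped = inverted_dropout(activations, rate, rng)
    kept_fraction = sum(1 for v in dropped if v != 0.0) / len(dropped)
    mean_activation = sum(dropped) / len(dropped)
    print(f"rate={rate}: kept {kept_fraction:.2f}, mean {mean_activation:.2f}")
```

The mean activation stays close to 1 for both rates because survivors are rescaled, but at a rate of 0.9 each training step passes far fewer units through the dense layers.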

shabu19 commented 3 years ago

@saichatla I tried different dropout rates and here are the results. I used a batch size of 64 and a learning rate of 0.01.

using dropout 0.4

using dropout 0.5

using dropout 0.6

using dropout 0.7

In all these cases the val_accuracy stagnates at 50%. I read somewhere that this can occur if the validation set does not have enough data, but we are talking about 3k+ different class folders with 200+ samples in each folder (UCF-101 dataset).

Another reason I read for this is that there might not be enough randomness in the validation set, which is also not the case.

What can be the potential reason for that? Thank you.

saichatla commented 3 years ago

Could you rerun these tests with a lower learning rate, something small like 0.00001, and a larger batch size? Thank you!

shabu19 commented 3 years ago

Yes, I have run my model with 0.01 now. Sorry, it was 0.1 previously. Moreover, I'm using a batch size of 128 for 50 classes only, because I can't use a batch size larger than 64 for 101 classes due to memory limitations. I will get back to you when I get the results. Thank you!

shabu19 commented 3 years ago

@saichatla I trained my model this time on 25 classes only, to get an idea of how it will work for 101 classes. I used a learning rate of 0.0001, a batch size of 128, and a dropout of 0.5 after both dense layers. The validation accuracy again stagnates, this time at 60%.

sushreebarsa commented 2 years ago

@shabu19 Is this still an issue? Could you please try with batch_size=32 and test with the latest TF version? Thanks!

google-ml-butler[bot] commented 2 years ago

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

google-ml-butler[bot] commented 2 years ago

Closing as stale. Please reopen if you'd like to work on this further.

keras-team / keras

dropout rate in dense layer #14607