keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

Noisy validation loss in Keras when using fit_generator #8273

Closed: ignaciorlando closed this issue 3 years ago

ignaciorlando commented 7 years ago

Any idea why our training loss is smooth but our validation loss is so noisy across epochs?

[image: training loss (smooth) vs. validation loss (noisy) across epochs]

We are implementing a deep learning model for diabetic retinopathy detection (binary classification) using the dataset of fundus photographs provided by this Kaggle competition. We are using Keras 2.0 with the TensorFlow backend.

As the dataset is too large to fit in memory, we are using fit_generator, with an ImageDataGenerator randomly sampling images from the training and validation folders:

# TRAIN THE MODEL
model.fit_generator(
    train_generator,
    steps_per_epoch=train_generator.samples // training_batch_size,
    epochs=int(config['training']['epochs']),
    validation_data=validation_generator,
    validation_steps=validation_generator.samples // validation_batch_size,
    class_weight=None)
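For reference, here is a minimal sketch of how we set up the two generators. The directory paths and image size are illustrative assumptions; the exact code is in the repository linked below.

# SET UP THE DATA GENERATORS (a sketch; paths and image size are assumed)
from keras.preprocessing.image import ImageDataGenerator

training_batch_size = 32
validation_batch_size = 32

# Augmentation on the training set only: horizontal and vertical flips.
train_datagen = ImageDataGenerator(horizontal_flip=True, vertical_flip=True)
valid_datagen = ImageDataGenerator()  # no augmentation for validation

train_generator = train_datagen.flow_from_directory(
    'data/train',                      # assumed directory layout
    target_size=(224, 224),            # VGG16's default input size
    batch_size=training_batch_size,
    class_mode='binary')

validation_generator = valid_datagen.flow_from_directory(
    'data/validation',
    target_size=(224, 224),
    batch_size=validation_batch_size,
    class_mode='binary')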

Our CNN architecture is VGG16 with dropout = 0.5 in the last two fully connected layers, batch normalization only before the first fully connected layer, and data augmentation (consisting of flipping the images horizontally and vertically). Our training and validation samples are normalized using the training set mean and standard deviation. Batch size is 32. The output activation is a sigmoid and the loss function is binary_crossentropy. You can find our implementation on GitHub.
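For concreteness, here is a rough sketch of this kind of architecture in Keras 2. The fully connected layer widths, the optimizer, and the use of ImageNet weights are assumptions for illustration, not necessarily our exact configuration.

# BUILD THE MODEL (illustrative; layer widths and optimizer are assumed)
from keras.applications import VGG16
from keras.layers import BatchNormalization, Dense, Dropout, Flatten
from keras.models import Sequential

base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

model = Sequential()
model.add(base)
model.add(Flatten())
model.add(BatchNormalization())            # only before the first FC layer
model.add(Dense(4096, activation='relu'))
model.add(Dropout(0.5))                    # dropout in the last two FC layers
model.add(Dense(4096, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid')) # sigmoid output for binary labels

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])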

It definitely has nothing to do with overfitting, as we tried a highly regularized model and the behavior was much the same. Is it related to the sampling from the validation set? Have any of you had a similar problem before?
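If the sampling were the culprit, one way to rule it out would be to evaluate the entire validation set in a fixed order every epoch. A sketch, reusing the assumed generator setup from above:

# EVALUATE THE FULL VALIDATION SET IN A FIXED ORDER (sketch; path is assumed)
import math
from keras.preprocessing.image import ImageDataGenerator

validation_batch_size = 32
valid_datagen = ImageDataGenerator()  # no augmentation for validation

validation_generator = valid_datagen.flow_from_directory(
    'data/validation',                # assumed path, as above
    target_size=(224, 224),
    batch_size=validation_batch_size,
    class_mode='binary',
    shuffle=False)                    # fixed sample order across epochs

# Round up so the last partial batch is included and every sample is seen.
validation_steps = int(math.ceil(
    validation_generator.samples / float(validation_batch_size)))

If the curve stays noisy even when every validation sample is scored each epoch, the noise cannot come from which samples were drawn.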

Thanks!!

amiasato-zz commented 7 years ago

Are you by any chance training on a POWER8 Linux server? I see the same behaviour with batch normalization when training there; in every other environment I tested, the problem did not occur.

NiuCoder commented 5 years ago

I ran into the same problem and have no idea why. Have you figured it out?

RayReed1208 commented 4 years ago

I ran into this as well. When running the code locally, the validation loss and accuracy both evolved smoothly. When running on Kaggle's servers (which use a different Keras version), I consistently see a very noisy validation loss curve even though the validation accuracy improves smoothly. I plan to provide more details shortly.