rykov8 / ssd_keras

Port of Single Shot MultiBox Detector to Keras
MIT License
1.1k stars 553 forks

In training, why is the loss decreasing while the val loss is increasing? #90

Open kitterive opened 7 years ago

kitterive commented 7 years ago

I used the VOCtest_06-Nov-2007 dataset. First I used get_data_from_XML.py to convert the XML ground truth to VOC2007.pkl, then used that file to train the network. During training, I found that the loss is decreasing while the val loss is increasing. Is it overfitting?

[attached image: training curves]
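For reference, here is roughly how I load the converted file and split it for training/validation (a minimal sketch following SSD_training.ipynb; the pickle is assumed to map image filenames to arrays of boxes plus one-hot labels):

```python
import pickle

# VOC2007.pkl was produced by get_data_from_XML.py as described above
gt = pickle.load(open('VOC2007.pkl', 'rb'))

# 80/20 train/val split over the image filenames, as in SSD_training.ipynb
keys = sorted(gt.keys())
num_train = int(round(0.8 * len(keys)))
train_keys = keys[:num_train]
val_keys = keys[num_train:]
```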

meetps commented 7 years ago

I'm observing the same phenomenon. Is there a fix for this?

@kitterive - What initial weights are you using? Also, how are you normalizing the coordinates?
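For SSD the ground-truth corners should be scaled to [0, 1] by the image size before they are encoded against the priors. A minimal sketch of what I mean (the function name and box layout are my own, not this repo's API):

```python
import numpy as np

def normalize_boxes(boxes, img_width, img_height):
    """boxes: (n, 4) array of (xmin, ymin, xmax, ymax) in pixels."""
    boxes = np.asarray(boxes, dtype=np.float32).copy()
    boxes[:, [0, 2]] /= img_width   # scale x coordinates to [0, 1]
    boxes[:, [1, 3]] /= img_height  # scale y coordinates to [0, 1]
    return boxes
```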

oarriaga commented 7 years ago

I also observe the same behavior. However, I was able to get a val loss of 1.4 after 20 epochs and afterwards the val loss started increasing.

meetps commented 7 years ago

@oarriaga - In that case, which model weights did you use for fine-tuning: VGG16 (with the top removed) or the Caffe-converted SSD weights?

oarriaga commented 7 years ago

I used the pre-trained weights provided in the README, which I believe come from an older version of the original Caffe implementation.

meetps commented 7 years ago

@oarriaga @rykov8 - Has anyone successfully trained SSD from scratch (i.e., using only the VGG16 weights) with this code? If not, then perhaps it would be wise to rethink the loss function.

MicBA commented 7 years ago

Hi @meetshah1995,
try adding a BatchNormalization layer after the Conv layers (the pre-trained weights won't be a perfect match, but it can be a good starting point for training). See the sketch below.
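Something along these lines (a minimal sketch, not the repo's actual SSD300 definition; Keras 2 layer names assumed):

```python
from keras.layers import Conv2D, BatchNormalization, Activation

def conv_bn_relu(x, filters, kernel_size, name):
    """Convolution followed by BatchNorm, then ReLU."""
    x = Conv2D(filters, kernel_size, padding='same', name=name)(x)
    x = BatchNormalization(name=name + '_bn')(x)
    return Activation('relu', name=name + '_relu')(x)
```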

Kramins commented 7 years ago

I am seeing the same issue when training on the MS COCO dataset.

I was following the training example from SSD_training.ipynb.

oarriaga commented 7 years ago

@meetshah1995 I have trained SSD with only the VGG16 weights, and it started overfitting after ~20 epochs; my lowest validation loss was 1.4. I believe better results could be obtained with a correct implementation of the random_size_crop function in the data augmentation. Also, the architecture ported in this repository is not the newest model from the latest arXiv version, which might lead to significant differences between the implementation presented here and the other ones around, such as the TF, PyTorch, and original Caffe versions.
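For reference, a simplified sketch of the random-size crop described in the SSD paper (the function name and parameters are my own, not this repo's Generator API; boxes are assumed to be (xmin, ymin, xmax, ymax) normalized to [0, 1]):

```python
import numpy as np

def random_size_crop(img, boxes, min_scale=0.3):
    h, w = img.shape[:2]
    scale = np.random.uniform(min_scale, 1.0)
    ch, cw = int(h * scale), int(w * scale)
    y0 = np.random.randint(0, h - ch + 1)
    x0 = np.random.randint(0, w - cw + 1)
    crop = img[y0:y0 + ch, x0:x0 + cw]
    # keep only the boxes whose centers fall inside the crop
    centers = (boxes[:, :2] + boxes[:, 2:4]) / 2.0
    mask = ((centers[:, 0] > x0 / w) & (centers[:, 0] < (x0 + cw) / w) &
            (centers[:, 1] > y0 / h) & (centers[:, 1] < (y0 + ch) / h))
    kept = boxes[mask].copy()
    # re-express the surviving corners in the crop's coordinate frame
    kept[:, [0, 2]] = np.clip((kept[:, [0, 2]] * w - x0) / cw, 0.0, 1.0)
    kept[:, [1, 3]] = np.clip((kept[:, [1, 3]] * h - y0) / ch, 0.0, 1.0)
    return crop, kept
```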

ujsyehao commented 6 years ago

Hi @oarriaga, can you show your training log? I want to know the loss after 120k iterations. Thanks in advance!

Hydrogenion commented 5 years ago

I am seeing the same issue while training on my own dataset. Is it overfitting or not?
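Either way, checkpointing on val_loss keeps the best weights while you investigate. A hedged sketch (the filename and callback settings are illustrative, not from this repo):

```python
from keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [
    # save weights only when validation loss improves
    ModelCheckpoint('ssd_best.hdf5', monitor='val_loss',
                    save_best_only=True, verbose=1),
    # stop once val_loss has not improved for 5 epochs
    EarlyStopping(monitor='val_loss', patience=5, verbose=1),
]
# pass via model.fit_generator(..., callbacks=callbacks)
```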

jamebozo commented 5 years ago

My minimum loss is also around 1.39 to 1.4. Would adding random_size_crop help?