qfgaohao / pytorch-ssd

MobileNetV1, MobileNetV2, VGG based SSD/SSD-lite implementation in Pytorch 1.0 / Pytorch 0.4. Out-of-box support for retraining on Open Images dataset. ONNX and Caffe2 support. Experiment Ideas like CoordConv.
https://medium.com/@smallfishbigsea/understand-ssd-and-implement-your-own-caa3232cd6ad
MIT License

Re-training SSD-Mobilenet - loss going up and down #172

Open Ufosek opened 2 years ago

Ufosek commented 2 years ago

Hi,

I am doing transfer learning by re-training SSD-Mobilenet as described here. My dataset contains 8000+ images of annotated sport players. The camera is grayscale, so all images are grayscale (edit: converted to RGB by copying the single channel into all three channels).
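
As an aside, here is a minimal sketch of the channel-copy conversion mentioned above, assuming PIL and numpy; the file name is hypothetical and the actual preprocessing pipeline may differ:

```python
from PIL import Image
import numpy as np

def gray_to_rgb(path):
    """Load a grayscale image and replicate its single channel into R, G and B."""
    gray = np.array(Image.open(path).convert("L"))   # H x W, single channel, uint8
    rgb = np.stack([gray, gray, gray], axis=-1)      # H x W x 3
    return Image.fromarray(rgb)

# Example (hypothetical file name): convert one frame before feeding it to the SSD data loader
# img = gray_to_rgb("frame_0001.png")
```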

EDIT - dataset split sizes:

test: 827
train: 5947
trainval: 7434
val: 1488

I used this script to generate the train/val/test splits (a sketch of the typical split logic is shown after the parameters), with:

trainval_percent = 0.9
train_percent = 0.8
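
For context, this is a minimal sketch of how a VOC-style split script with these two percentages typically works; the function name, sampling method, and file handling are assumptions and may differ from the script actually used:

```python
import random

trainval_percent = 0.9   # fraction of all images kept for trainval (the rest becomes test)
train_percent = 0.8      # fraction of trainval kept for train (the rest becomes val)

def make_splits(image_ids, seed=0):
    """Split image ids into trainval/test, then split trainval into train/val."""
    random.seed(seed)
    ids = list(image_ids)
    random.shuffle(ids)

    n_trainval = int(len(ids) * trainval_percent)
    trainval, test = ids[:n_trainval], ids[n_trainval:]

    n_train = int(len(trainval) * train_percent)
    train, val = trainval[:n_train], trainval[n_train:]
    return {"trainval": trainval, "train": train, "val": val, "test": test}

# With roughly 8261 ids this gives about 7434 trainval / 827 test and 5947 train / 1487 val,
# close to the sizes listed above (small differences come from rounding and sampling).
```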

I see that the loss decreases until about epoch 100, then spikes, and reaches a new minimum after exactly 200 more epochs. 1) What does this mean - is it overfitting, or just normal optimization behavior? 2) After each spike there is a new minimum (epoch 100 - 1.47, 300 - 1.41, 500 - 1.39, 700 - 1.38). Which checkpoint should I use: the lowest one (at epoch 700), or the one at epoch 100 (since later checkpoints may not actually be improving, or may even be getting worse)?

[Figure: training loss vs. epoch, showing a periodic pattern of spikes followed by new minima]

I would be grateful for some help! Regards