qfgaohao / pytorch-ssd

MobileNetV1, MobileNetV2, and VGG-based SSD/SSD-Lite implementations in PyTorch 1.0 / 0.4. Out-of-the-box support for retraining on the Open Images dataset. ONNX and Caffe2 support. Experimental ideas like CoordConv.
https://medium.com/@smallfishbigsea/understand-ssd-and-implement-your-own-caa3232cd6ad
MIT License

Re-training SSD-Mobilenet - loss going up and down #172

Open Ufosek opened 2 years ago

Ufosek commented 2 years ago

Hi,

I am using transfer learning to re-train SSD-Mobilenet, as described here. My dataset contains 8000+ images of annotated sport players. I have a grayscale camera, so all images are grayscale (edit: converted to RGB by copying the single channel).
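For reference, the grayscale-to-RGB step can be done with Pillow's `convert`, which replicates the single luminance channel into R, G, and B (a minimal sketch; the file path is illustrative):

```python
from PIL import Image

# Open a single-channel ("L" mode) grayscale image and expand it to RGB.
# convert("RGB") copies the gray value into all three channels.
img = Image.new("L", (4, 4), 128)   # stand-in for Image.open("frame.png")
rgb = img.convert("RGB")

print(rgb.mode)            # RGB
print(rgb.getpixel((0, 0)))  # (128, 128, 128)
```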

EDIT - learning size:

test: 827
train: 5947
trainval: 7434
val: 1488

I used this script to generate test data with:

trainval_percent = 0.9
train_percent = 0.8

I see that the loss goes down until epoch 100, then spikes, and reaches a new minimum exactly 200 epochs later. 1) What does this mean (overfitting, or just normal optimization)? 2) After each spike there is a new minimum (epoch 100: 1.47, epoch 300: 1.41, epoch 500: 1.39, epoch 700: 1.38). Which checkpoint should I use? The lowest one (at 700), or the one at 100 (since later training may not actually be improving, or may even be breaking things)?

[image: training loss curve with spikes every 200 epochs]

I would be glad for some help! Regards

fa0311 commented 1 week ago

Change `t_max` if you are using the default scheduler. The cosine annealing scheduler varies the learning rate along a cosine curve, so the loss rises again each time the learning rate climbs back up.

https://github.com/qfgaohao/pytorch-ssd/blob/7a839cbc8c3fb39679856b4dc42a1ab19ec07581/train_ssd.py#L80
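To see why the loss minima land 200 epochs apart, here is the cosine annealing formula (the one `torch.optim.lr_scheduler.CosineAnnealingLR` implements) in plain Python; the function name and learning-rate values are illustrative, not from the repo:

```python
import math

def cosine_annealing_lr(epoch, base_lr, t_max, eta_min=0.0):
    # Cosine annealing: the LR follows a cosine curve from base_lr
    # down to eta_min over t_max epochs, then back up, so a full
    # cycle takes 2 * t_max epochs.
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max)) / 2

# With t_max=100 the LR bottoms out at epochs 100, 300, 500, ...
# which lines up with the loss minima reported above.
for epoch in (0, 100, 200, 300):
    print(epoch, round(cosine_annealing_lr(epoch, base_lr=0.01, t_max=100), 6))
```

So with the default `t_max` the learning rate is largest again right after each minimum, which explains the periodic spikes; raising `t_max` stretches the cycle out.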