xuannianz / EfficientDet

EfficientDet (Scalable and Efficient Object Detection) implementation in Keras and Tensorflow
Apache License 2.0

training on MSCOCO issue with one GTX 1080Ti #68

Closed liminghuiv closed 4 years ago

liminghuiv commented 4 years ago

Hi Xuannianz,

Thanks for the great work on Keras-EfficientDet. I followed the README for training on MSCOCO with one GTX 1080 Ti:

STEP1: run

```
python3 train.py --snapshot imagenet --phi {0, 1, 2, 3, 4, 5, 6} --gpu 0 --random-transform --compute-val-loss --freeze-backbone --batch-size 32 --steps 1000 pascal|coco datasets/VOC2012|datasets/coco
```

to start training. The init lr is 1e-3.

STEP2: run

```
python3 train.py --snapshot xxx.h5 --phi {0, 1, 2, 3, 4, 5, 6} --gpu 0 --random-transform --compute-val-loss --freeze-bn --batch-size 4 --steps 10000 pascal|coco datasets/VOC2012|datasets/coco
```

to start training when val mAP cannot increase during STEP1. The init lr is 1e-4 and decays to 1e-5 when val mAP keeps dropping.
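As an aside, the decay described for STEP2 maps onto a stock Keras callback. A minimal sketch; the monitored metric and patience are my assumptions for illustration, not necessarily what the repo's train.py wires up:

```python
from tensorflow import keras

# Sketch of the STEP2 schedule: start at 1e-4 and drop to 1e-5 when
# validation stops improving. Patience and monitored metric are assumed.
reduce_lr = keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss',  # the README speaks of val mAP; val_loss is the stock proxy
    factor=0.1,          # 1e-4 -> 1e-5
    patience=2,
    min_lr=1e-5,
    verbose=1,
)
# Passed to model.fit(..., callbacks=[reduce_lr]) during STEP2.
```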

Here is what I did.

Step 1:

```
python3 train.py --snapshot imagenet \
    --phi 3 \
    --gpu 0 \
    --random-transform \
    --compute-val-loss \
    --freeze-backbone \
    --batch-size 8 \
    --steps 1000 \
    --weighted-bifpn \
    coco ../mscoco
```

Here is the result that I got:

```
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.158
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.263
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.166
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.061
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.140
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.243
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.212
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.363
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.393
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.185
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.430
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.561
Epoch 00050: saving model to checkpoints/2020-01-21/coco_50_1.8167_1.7424.h5
1000/1000 [==============================] - 2968s 3s/step - loss: 1.8167 - regression_loss: 1.3582 - classification_loss: 0.4585 - val_loss: 1.7424 - val_regression_loss: 1.3111 - val_classification_loss: 0.4313
```

Then I started step 2:

```
python3 train.py \
    --snapshot ./checkpoints/2020-01-21/coco_50_1.8167_1.7424.h5 \
    --phi 3 \
    --gpu 0 \
    --random-transform \
    --compute-val-loss \
    --freeze-bn \
    --batch-size 2 \
    --steps 10000 \
    --weighted-bifpn \
    coco ../mscoco
```

The training is still ongoing, but it does not seem to be working well:

```
Epoch 00008: saving model to checkpoints/2020-02-01/coco_08_3.7574_3.7659.h5
10000/10000 [==============================] - 10078s 1s/step - loss: 3.7574 - regression_loss: 2.7862 - classification_loss: 0.9711 - val_loss: 3.7659 - val_regression_loss: 2.7827 - val_classification_loss: 0.9831
Epoch 9/50
COCO evaluation: 100%|██████████| 5000/5000 [10:57<00:00, 7.88it/s]
Loading and preparing results...
DONE (t=10.29s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type bbox
DONE (t=67.78s).
Accumulating evaluation results...
DONE (t=24.15s).
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.001
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.001
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.019
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.034
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.036
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.001
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.066

Epoch 00009: saving model to checkpoints/2020-02-01/coco_09_3.7721_3.8325.h5
```

Did I do anything wrong, or set any parameters incorrectly?

Thanks.

liminghuiv commented 4 years ago

I am following the solution from issue "mAP=0 in step2" #52 and trying again: lr=1e-3 -> lr=1e-4 in step 2. Hope it helps.
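For reference, the gist of that fix is just lowering the optimizer's learning rate before resuming. A minimal sketch, assuming `model` is the compiled training model built by the repo's train.py:

```python
from tensorflow.keras import backend as K

# Issue #52's suggestion: resume STEP2 with lr=1e-4 instead of the STEP1
# default of 1e-3. `model` is assumed to be the compiled EfficientDet
# training model; the optimizer attribute is `lr` in Keras 2.x of this era.
K.set_value(model.optimizer.lr, 1e-4)
```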

liminghuiv commented 4 years ago

@xuannianz I noticed that the optimization method (Adam) is different from the paper:

> We evaluate EfficientDet on COCO 2017 detection datasets [18]. Each model is trained using SGD optimizer with momentum 0.9 and weight decay 4e-5. Learning rate is first linearly increased from 0 to 0.08 in the initial 5% warm-up training steps and then annealed down using cosine decay rule. Batch normalization is added after every convolution with batch norm decay 0.997 and epsilon 1e-4. We use exponential moving average with decay 0.9998. We also employ commonly-used focal loss [17] with α = 0.25 and γ = 1.5, and aspect ratio {1/2, 1, 2}. Our models are trained with batch size 128 on 32 TPUv3 chips. We use RetinaNet [17] preprocessing for EfficientDet-D0/D1/D3/D4, but for fair comparison, we use the same auto-augmentation for EfficientDet-D5/D6/D7 when comparing with the prior art of AmoebaNet-based NAS-FPN detectors [37].
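For comparison, the paper's warm-up plus cosine schedule can be written in a few lines of plain Python. This is a rough sketch of the quoted description (function and parameter names are mine, not from the official implementation):

```python
import math

def paper_lr(step, total_steps, peak_lr=0.08, warmup_frac=0.05):
    """Linear warm-up from 0 to peak_lr over the first 5% of steps,
    then cosine decay to 0, per the EfficientDet paper quote above."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
```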

liminghuiv commented 4 years ago

And I noticed that the EfficientNet training method described in its paper is different as well:

> We train our EfficientNet models on ImageNet using similar settings as (Tan et al., 2019): RMSProp optimizer with decay 0.9 and momentum 0.9; batch norm momentum 0.99; weight decay 1e-5; initial learning rate 0.256 that decays by 0.97 every 2.4 epochs. We also use swish activation (Ramachandran et al., 2018; Elfwing et al., 2018), fixed AutoAugment policy (Cubuk et al., 2019), and stochastic depth (Huang et al., 2016) with survival probability 0.8. As commonly known that bigger models need more regularization, we linearly increase dropout (Srivastava et al., 2014) ratio from 0.2 for EfficientNet-B0 to 0.5 for EfficientNet-B7.
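For what it's worth, that classification recipe maps onto stock tf.keras pieces roughly as follows (a sketch of the quote, not the official TPU implementation):

```python
from tensorflow import keras

# RMSProp with decay (rho) 0.9 and momentum 0.9, as in the quote.
optimizer = keras.optimizers.RMSprop(learning_rate=0.256, rho=0.9, momentum=0.9)

def efficientnet_lr(epoch, lr):
    # "initial learning rate 0.256 that decays by 0.97 every 2.4 epochs"
    return 0.256 * 0.97 ** (epoch / 2.4)

lr_schedule = keras.callbacks.LearningRateScheduler(efficientnet_lr)
```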

xuannianz commented 4 years ago

Hi @liminghuiv. Yes, the optimizers are different. I use Adam to get faster training.
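For anyone who wants to try the paper's optimizer here instead, the difference at compile time is small. A sketch, with the Adam lr taken from the README and the SGD settings from the paper quote above (note stock Keras SGD has no weight-decay term, so the paper's 4e-5 is not reproduced):

```python
from tensorflow import keras

repo_optimizer = keras.optimizers.Adam(learning_rate=1e-3)                # this repo
paper_optimizer = keras.optimizers.SGD(learning_rate=0.08, momentum=0.9)  # the paper
```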

aindrei commented 4 years ago

I have the same problem. On a custom dataset, step 1 works fine, but when I try step 2 it doesn't learn at all and drops to 0 mAP.

gjy1992 commented 4 years ago

I have a similar problem. The classification loss is hard to decrease and stays at around 0.4.
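For anyone sanity-checking that number, the focal loss from the paper quote above looks roughly like this. A generic sketch with the paper's α/γ, not the repo's losses.py (which also handles anchor states):

```python
import tensorflow as tf

def focal_loss(y_true, y_pred, alpha=0.25, gamma=1.5):
    # Standard binary focal loss (Lin et al.) with the paper's alpha/gamma.
    y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)
    p_t = y_true * y_pred + (1.0 - y_true) * (1.0 - y_pred)
    alpha_t = y_true * alpha + (1.0 - y_true) * (1.0 - alpha)
    return -alpha_t * tf.pow(1.0 - p_t, gamma) * tf.math.log(p_t)
```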

liminghuiv commented 4 years ago

I noticed that keras-retinanet integrated Efficientnet: https://github.com/fizyr/keras-retinanet/search?q=efficientnet&unscoped_q=efficientnet

Has anybody tried it?

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.