sayakpaul / E2E-Object-Detection-in-TFLite

This repository shows how to train a custom detection model with the TFOD API, optimize it with TFLite, and perform inference with the optimized model.
Apache License 2.0

Fine-tuning with another model #3

Open alexjaw opened 2 years ago

alexjaw commented 2 years ago

Thanks for a great tutorial! I have tried to do a similar thing (with the fruits dataset) but with the following models from the TF1 OD model zoo: ssdlite_mobilenet_v2_coco_2018_05_09 and ssd_inception_v2_coco_2018_01_28 (also training with TF 1.15.2). However, I have problems getting any decent learning and detection from the training. When running your Colab, it's remarkable how fast the training converges: already after 1000 steps the IoU has increased substantially and the loss is around 1. I have run 11000 steps with ssdlite_mobilenet_v2_coco_2018_05_09 without coming near such figures. Another thing that I do not understand is the settings in your pipeline.config file. Normally, for fine-tuning, the config file includes "fine_tune_checkpoint" as well as "fine_tune_checkpoint_type", but your config does not include these params. Isn't that rather non-traditional for a fine-tuning pipeline.config?

my colab
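
For reference, this is roughly the kind of train_config block I would normally expect in a fine-tuning pipeline.config (a sketch only; the paths and step count are placeholders, not taken from this repo):

 train_config {
   batch_size: 24
   num_steps: 11000
   fine_tune_checkpoint: "pre-trained/ssdlite_mobilenet_v2_coco_2018_05_09/model.ckpt"  # placeholder path
   fine_tune_checkpoint_type: "detection"
 }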

sayakpaul commented 2 years ago

Thanks for reaching out.

But your config does not include these params. Isn't that rather non-traditional for a fine-tuning pipeline.config?

Absolutely, it is. But if you look at the original config file for the MobileDet variant (CPU) I used in the Colab, you'd notice that it too does not expose any such parameters. I should have checked this with the library authors, but at the time I didn't. However, @khanhlvg did review the notebook briefly and everything seemed okay at that point.

Regarding your training not converging, I think it has to do with the dataset and the complexity of the model. As far as I know, the MobileDet variant I used in the notebook is simpler than the SSD_MobileNet variant you are using. MobileDet uses Tucker convolutions, which are likely very suitable for the kind of objects we're dealing with here.

However, if using SSD_MobileNet is a necessity for your project, I would start by tweaking the anchor_generator parameters in the config file to find a sweet spot.
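
For example, the ssd_anchor_generator block in the model config is where I would experiment. The values below are just the common SSD defaults, not tuned recommendations:

 anchor_generator {
   ssd_anchor_generator {
     num_layers: 6
     min_scale: 0.2
     max_scale: 0.95
     aspect_ratios: 1.0
     aspect_ratios: 2.0
     aspect_ratios: 0.5
     aspect_ratios: 3.0
     aspect_ratios: 0.3333
   }
 }

A typical first move is lowering min_scale (and possibly dropping the extreme aspect ratios) if your fruits occupy only a small fraction of the image.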

Sorry I could not be more helpful.

alexjaw commented 2 years ago

Tested a training round (2000 steps) with your config but added

From what I can see, the mAP is comparable with the results obtained with the original config:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.466
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.776
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.457
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.209
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.480
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.372
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.576
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.630
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.625
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.637
...
Loss for final step: 0.9625086

Results for the apple image... it gets detected as a banana. However, it does detect apples on other images in the test dataset.

{'bounding_box': array([0.16773474, 0.21534976, 0.9148209 , 0.7250782 ], dtype=float32), 'class_id': 1.0, 'score': 0.55632144}
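
For context, that dict is just how I collect the interpreter outputs. A minimal sketch, assuming the standard TFLite detection postprocess output order (boxes, classes, scores, count), a placeholder model path, and a 320x320 uint8 input:

 import numpy as np
 import tensorflow as tf

 interpreter = tf.lite.Interpreter(model_path="model.tflite")  # placeholder path
 interpreter.allocate_tensors()
 input_index = interpreter.get_input_details()[0]["index"]
 output_details = interpreter.get_output_details()

 # Stand-in for a preprocessed test image matching the assumed input spec.
 image = np.zeros((320, 320, 3), dtype=np.uint8)
 interpreter.set_tensor(input_index, image[np.newaxis, ...])
 interpreter.invoke()

 boxes = interpreter.get_tensor(output_details[0]["index"])[0]    # [ymin, xmin, ymax, xmax], normalized
 classes = interpreter.get_tensor(output_details[1]["index"])[0]
 scores = interpreter.get_tensor(output_details[2]["index"])[0]

 results = [
     {"bounding_box": boxes[i], "class_id": classes[i], "score": scores[i]}
     for i in range(len(scores))
     if scores[i] > 0.5
 ]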

A question: where does the number 117 come from for the num_examples parameter in the config?

mAP results for original config in the tutorial:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.489
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.785
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.521
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.355
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.502
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.358
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.611
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.627
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.600
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.633

sayakpaul commented 2 years ago

A question: Where does the number 117 comes from for num_examples parameter in the config?

It's likely the number of evaluation examples as you can see here: https://github.com/sayakpaul/E2E-Object-Detection-in-TFLite/blob/master/colab_training/Fruits_Detection_Data_Prep.ipynb
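
If you want to verify it against your own split, counting the records in the eval TFRecord should reproduce the value used for num_examples. A quick sketch with a placeholder filename:

 import tensorflow as tf

 # Count the examples in the evaluation TFRecord (placeholder filename)
 # to sanity-check the num_examples value in eval_config.
 num_examples = sum(1 for _ in tf.compat.v1.python_io.tf_record_iterator("val.record"))
 print(num_examples)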

alexjaw commented 2 years ago

It's likely the number of evaluation examples as you can see here: https://github.com/sayakpaul/E2E-Object-Detection-in-TFLite/blob/master/colab_training/Fruits_Detection_Data_Prep.ipynb

The number of jpg files in the test folder is 60, and 240 in the train folder... (for the dataset from https://www.kaggle.com/mbkinaci/fruit-images-for-object-detection)