@vivian-wong first of all, congratulations on your analysis; you've done a great job of investigating the performance here in different settings.
I think you've inadvertently discovered the same result I've seen earlier: that transfer learning produces worse results than normal training. What you are seeing in your second set of plots is that all the layers that were frozen before become unfrozen, and are optimized just as in regular training, when you manually --resume at epoch 273.
There are no LR issues, the LR is behaving as expected in both cases, reducing by 10x at epochs 218 and 245, which roughly correspond to 400k and 500k batches in darknet. This LR scheduler is set for COCO, you probably want to tune it to your custom dataset.
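For reference, that COCO schedule corresponds to a standard step-decay scheduler. Below is a minimal PyTorch sketch assuming a MultiStepLR; the actual scheduler construction in train.py may differ in its base LR and other hyperparameters:

import torch.nn as nn
from torch import optim

model = nn.Linear(10, 10)  # placeholder for the YOLOv3 model
optimizer = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
# LR drops by 10x at epochs 218 and 245 (~400k and 500k darknet batches)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[218, 245], gamma=0.1)

for epoch in range(273):
    # ... train one epoch ...
    scheduler.step()  # lr: 1e-3, then 1e-4 after epoch 218, then 1e-5 after epoch 245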
If past experience is correct, you will obtain the best results by simply training normally, and not messing around with transfer learning. By definition, when you freeze layers, they can not adapt to new data, so transfer learning will never produce results as accurate as normal learning.
@vivian-wong I investigated further, with fascinating results. I ran our coco_100img.data tutorial with and without transfer learning. When running with transfer learning, the wh losses diverged, so I reduced the wh loss multiplier from 4 to 1 and clamped the prediction outputs to a range of -4 to 4:
lwh += (k * 1) * MSE(pi[..., 2:4].clamp(min=-4, max=4), twh[i]) # wh yolo loss
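To make the effect of the clamp concrete, here is a small self-contained sketch (hypothetical tensors, not the actual compute_loss code) showing how bounding the raw wh predictions keeps the MSE term from being dominated by extreme outputs:

import torch

MSE = torch.nn.MSELoss()
pred_wh = torch.tensor([[0.5, -12.0], [9.0, 1.2]])   # raw wh predictions, some far out of range
target_wh = torch.tensor([[0.4, -0.3], [0.7, 1.0]])

print(MSE(pred_wh, target_wh))                        # large loss dominated by the outliers
print(MSE(pred_wh.clamp(min=-4, max=4), target_wh))   # bounded loss after clamping to [-4, 4]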
The run now converged (transfer learning from yolov3-spp.pt), and I saved the results as results2_100img_tl. Of course, since I modified the loss function, I reran the original tutorial using the modified loss (as results2_100img) and plotted the two results against our original tutorial result, results_100img. The two conclusions are:
The same plot zoomed in to the first 50 epochs (you can see here that transfer learning, in orange, does initially perform better).
To recreate these results simply modify the loss as above and run:
git pull # Update to latest
rm results*.txt # WARNING: removes old results
python3 train.py --nosave --data data/coco_100img.data --transfer && mv results.txt results2_100img_tl.txt
python3 train.py --nosave --data data/coco_100img.data && mv results.txt results2_100img.txt
python3 -c "from utils import utils; utils.plot_results()"
@glenn-jocher Thank you for this clear explanation, but may I ask what the advantage of using transfer learning over training from scratch is? Previously, I thought transfer learning also helped to improve the mAP.
@rlgalvez transfer learning should get you decent results quickly, as it only needs to develop gradients for the unfrozen layers, so training requires fewer resources (i.e. it may be feasible on an edge device).
For training, you will always get the best results by training all layers normally. See the custom training tutorials.
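As a rough illustration of what freezing means in PyTorch (a generic sketch, not the actual --transfer code in train.py, which selects the YOLO output layers by name):

import torch.nn as nn
from torch import optim

model = nn.Sequential(nn.Conv2d(3, 32, 3), nn.ReLU(), nn.Conv2d(32, 255, 1))  # stand-in for YOLOv3

for p in model.parameters():        # freeze the pretrained layers
    p.requires_grad = False
for p in model[-1].parameters():    # leave only the output layer trainable
    p.requires_grad = True

# only the unfrozen parameters receive gradients, so backprop and optimizer state are much cheaper
optimizer = optim.SGD((p for p in model.parameters() if p.requires_grad), lr=1e-3)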
This shows the coco_64img.data tutorial starting from a few different options, including transfer learning. Transfer learning as shown below typically freezes the main pretrained weights, which constrains its performance. You can replicate these results by running the code below and looking at the resulting results.png file.
python3 train.py --data data/coco_64img.data --batch-size 16 --accumulate 1 --nosave --weights weights/yolov3-spp.weights --transfer --name yolov3-spp_transfer # TRANSFER LEARNING COMPARISON
python3 train.py --data data/coco_64img.data --batch-size 16 --accumulate 1 --nosave --weights '' --name from_scratch
python3 train.py --data data/coco_64img.data --batch-size 16 --accumulate 1 --nosave --weights weights/darknet53.conv.74 --name darknet53_backbone
python3 train.py --data data/coco_64img.data --batch-size 16 --accumulate 1 --nosave --weights weights/yolov3-spp.weights --name yolov3-spp_backbone
Describe the bug
When I train my custom dataset (via transfer learning) for more than 273 epochs (for example, 840 epochs), I get different results if I stop at 273 epochs and then do "--resume --epochs 840", versus if I do "--epochs 840" from the start.
This is what happens if I set epochs = 840 from the beginning of training. This is what happens if I resume after epoch 273.
To Reproduce
Steps to reproduce the behavior:
python train.py --data-cfg data/*mydata* --transfer
python train.py --data-cfg data/*mydata.data* --resume --epochs 840
python train.py --data-cfg data/*mydata* --transfer --epochs 840
Expected behavior
Theoretically both should return the same results; the second scenario should not have converged that early. My conjecture is that it has something to do with the change in learning rate.
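If that conjecture is right, a resume needs to restore the optimizer and LR scheduler state along with the weights, so the schedule continues rather than restarting. A minimal sketch, assuming a MultiStepLR like the COCO schedule above (hypothetical, not the actual train.py resume code):

from torch import nn, optim

model = nn.Linear(10, 10)  # placeholder model
optimizer = optim.SGD(model.parameters(), lr=1e-3)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[218, 245], gamma=0.1)

# ... after training to epoch 273, save a checkpoint ...
ckpt = {'epoch': 272, 'model': model.state_dict(),
        'optimizer': optimizer.state_dict(), 'scheduler': scheduler.state_dict()}

# ... on --resume, rebuild the objects and restore their state ...
model.load_state_dict(ckpt['model'])
optimizer.load_state_dict(ckpt['optimizer'])   # momentum buffers come back too
scheduler.load_state_dict(ckpt['scheduler'])   # LR continues where it left off instead of restarting at 1e-3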