ultralytics / yolov5

YOLOv5 πŸš€ in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Transfer Learning with Frozen Layers #1314

Open glenn-jocher opened 3 years ago

glenn-jocher commented 3 years ago

πŸ“š This guide explains how to freeze YOLOv5 πŸš€ layers when transfer learning. Transfer learning is a useful way to quickly retrain a model on new data without having to retrain the entire network. Instead, part of the initial weights are frozen in place, and the rest of the weights are used to compute loss and are updated by the optimizer. This requires fewer resources than normal training and allows for faster training times, though it may also result in a slightly lower final trained accuracy. UPDATED 28 March 2023.

Before You Start

Clone repo and install requirements.txt in a Python>=3.7.0 environment, including PyTorch>=1.7. Models and datasets download automatically from the latest YOLOv5 release.

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Freeze Backbone

All layers that match the freeze list in train.py will be frozen by setting their gradients to zero before training starts. https://github.com/ultralytics/yolov5/blob/771ac6c53ded79c408ed8bd99f7604b7077b7d77/train.py#L119-L126
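The freezing logic itself is just gradient disabling by parameter name; a minimal standalone sketch of the same idea (the `Toy` wrapper below is illustrative, not a YOLOv5 class):

```python
import torch.nn as nn

def freeze_layers(model: nn.Module, freeze: int) -> None:
    """Disable gradients for parameters in modules 'model.0.' .. 'model.{freeze-1}.'."""
    freeze_names = [f"model.{x}." for x in range(freeze)]
    for k, v in model.named_parameters():
        v.requires_grad = True  # train all layers by default
        if any(k.startswith(s) for s in freeze_names):
            v.requires_grad = False  # frozen: the optimizer will never update it

# toy stand-in whose parameter names begin with 'model.N.', like YOLOv5
class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))

net = Toy()
freeze_layers(net, 1)  # freeze 'model.0.' only
```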

To see a list of module names:

# model loaded e.g. via: model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
for k, v in model.named_parameters():
    print(k)

# Output
model.0.conv.conv.weight
model.0.conv.bn.weight
model.0.conv.bn.bias
model.1.conv.weight
model.1.bn.weight
model.1.bn.bias
model.2.cv1.conv.weight
model.2.cv1.bn.weight
...
model.23.m.0.cv2.bn.weight
model.23.m.0.cv2.bn.bias
model.24.m.0.weight
model.24.m.0.bias
model.24.m.1.weight
model.24.m.1.bias
model.24.m.2.weight
model.24.m.2.bias

Looking at the model architecture we can see that the model backbone is layers 0-9: https://github.com/ultralytics/yolov5/blob/58f8ba771e3712b525ca93a1ee66bc2b2df2092f/models/yolov5s.yaml#L12-L48

So we can define the freeze list to contain all modules with 'model.0.' - 'model.9.' in their names:

python train.py --freeze 10

Freeze All Layers

To freeze the full model except for the final output convolution layers in Detect(), we set the freeze list to contain all modules with 'model.0.' - 'model.23.' in their names:

python train.py --freeze 24
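To sanity-check the effect of --freeze, you can count trainable vs. frozen parameters on the loaded model; a small helper sketch (any PyTorch nn.Module works):

```python
def count_parameters(model):
    """Return (trainable, frozen) parameter counts for a PyTorch model."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
    return trainable, frozen
```

With --freeze 24, only the Detect() output convolutions should contribute to the trainable count.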

Results

We train YOLOv5m on VOC on both of the above scenarios, along with a default model (no freezing), starting from the official COCO pretrained --weights yolov5m.pt:

$ python train.py --batch 48 --weights yolov5m.pt --data voc.yaml --epochs 50 --cache --img 512 --hyp hyp.finetune.yaml

Accuracy Comparison

The results show that freezing speeds up training, but reduces final accuracy slightly.

GPU Utilization Comparison

Interestingly, the more modules are frozen, the less GPU memory is required to train and the lower the GPU utilization. This suggests that larger models, or models trained at a larger --img-size, may benefit from freezing in order to train faster.

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

YOLOv5 CI

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

bryanbocao commented 1 year ago

I just want to detect the person and motorcycle classes. How can I reduce the number of parameters?

The most straightforward way would be to use a smaller existing model: nano instead of the small (YOLOv5s) shown in your terminal output, i.e. --cfg yolov5n.yaml

You can reduce these multiples further if you want a model smaller than nano:

depth_multiple: 0.33  # model depth multiple
width_multiple: 0.25  # layer channel multiple

https://github.com/ultralytics/yolov5/blob/master/models/yolov5n.yaml#L5-L6
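For intuition on width_multiple: YOLOv5 scales each layer's channel count by it and rounds up to a multiple of 8. This is a simplified sketch of that make_divisible-style logic, not the exact model parser:

```python
import math

def make_divisible(x: float, divisor: int = 8) -> int:
    """Round a channel count up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor

def scaled_channels(c: int, width_multiple: float) -> int:
    """Channel count after applying a width multiple (YOLOv5-style)."""
    return make_divisible(c * width_multiple, 8)

# e.g. a 64-channel layer under the nano width multiple of 0.25:
print(scaled_channels(64, 0.25))  # 16
```

Because parameter counts in a convolution grow roughly with the product of input and output channels, halving width_multiple shrinks the model by roughly 4x.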

glenn-jocher commented 9 months ago

@bryanbocao hi there! It's great that you're looking to optimize the model for your specific use case. To reduce the number of parameters, you can consider using a smaller existing model such as nano instead of small in the YOLOv5 model configuration. Additionally, you can adjust the depth_multiple and width_multiple parameters in the configuration file to further reduce the size of the model. You can find these settings in the yolov5n.yaml file at lines 5-6. I hope this helps!

bryanbocao commented 9 months ago

@glenn-jocher Thanks for your reply!

Additionally, you can adjust the depth_multiple and width_multiple parameters in the configuration file to further reduce the size of the model.

That's what I did eventually :)

glenn-jocher commented 9 months ago

@bryanbocao you're welcome! Great to hear that you found a solution by adjusting the depth_multiple and width_multiple parameters. If you have any more questions or need further assistance, feel free to ask. Good luck with your YOLOv5 project!

skyprince999 commented 8 months ago

I am not sure if this is the right place to ask, but I have a YOLOv5x6 model that I want to "convert" to YOLOv5n or YOLOv5s model weights. Is there some technique to do that without having to retrain the model from scratch?

glenn-jocher commented 8 months ago

@skyprince999 hi there! The process you're referring to is known as model distillation or compression, where a larger model (teacher) is used to guide the training of a smaller model (student). However, directly converting weights from a larger model like YOLOv5x6 to a smaller architecture like YOLOv5n or YOLOv5s isn't straightforward because the architectures differ significantly in terms of layer depth and width.

To achieve a smaller model with the knowledge of the larger one, you would typically perform knowledge distillation, which involves training the smaller model using the larger model's outputs as guidance. This process still requires training from scratch but can be faster and result in a more accurate small model than training it directly on the dataset.

For now, YOLOv5 does not support direct weight conversion between different model sizes. You would need to train the smaller model using the standard training procedures, potentially using the larger model's weights for initializing the training process, or you could explore knowledge distillation techniques.

If you're looking to maintain as much performance as possible without retraining from scratch, you might consider fine-tuning the smaller model on your dataset using the larger model's weights as a starting point. This would involve using the --weights flag with the pre-trained larger model's weights and training on your dataset with the smaller model configuration.
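Initializing a smaller model from a larger checkpoint amounts to copying only the state-dict entries whose names and shapes match in both models; a hedged sketch of that intersection (this is not a built-in YOLOv5 utility):

```python
def transfer_matching_weights(model, ckpt_state_dict):
    """Copy checkpoint tensors into model wherever name and shape both match."""
    model_sd = model.state_dict()
    matched = {k: v for k, v in ckpt_state_dict.items()
               if k in model_sd and v.shape == model_sd[k].shape}
    model.load_state_dict(matched, strict=False)  # tolerate missing/extra keys
    return len(matched), len(model_sd)
```

In practice few tensors match between YOLOv5x6 and YOLOv5n because the layer widths differ, which is why fine-tuning afterwards is still required.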

skyprince999 commented 8 months ago

Thanks @glenn-jocher for the update. I am aware of the knowledge distillation process, but was wondering if YOLO had some inbuilt mechanism for it. Not sure how it's done, but I like the idea of initializing the weights of the smaller model with those from the larger model and then training it on the dataset. I'll explore that option.

glenn-jocher commented 8 months ago

@skyprince999 You're welcome! Indeed, YOLOv5 doesn't have an inbuilt mechanism for model distillation, but initializing the smaller model with weights from the larger one and then fine-tuning on your dataset is a practical approach. This method leverages the pre-trained knowledge and can lead to better performance than training from scratch. If you need further guidance on fine-tuning or have any other questions, feel free to reach out. Happy coding! πŸ˜ŠπŸš€

sriram-dsl commented 3 months ago

@glenn-jocher Might be interesting to do a final step by unfreezing and training the complete network again with a differentiated learning rate. The complete training process would then be (the default method in fast.ai):

  1. Freeze the backbone
  2. (optional reset the head weights)
  3. Train the head for a while
  4. Unfreeze the complete network
  5. Train the complete network with lower learning rate for backbone

Hi @ramonhollands, I am doing the same thing. First, I train the model on a dataset with everything except the detection head frozen (--freeze 24); then I run a second training on the same data, initializing from the previously trained weights (--weights runs/exp/weights/last.pt) and passing a --hyp file where lr0 and lrf are reduced by 10x.

In this process, I am training twice on the same dataset. @glenn-jocher @ramonhollands, can you help me reduce this training time by making it the default, where the head will be trained with a low learning rate and the body will be trained with a high learning rate?

glenn-jocher commented 3 months ago

Hi @sriram-dsl! Your approach of using a differentiated learning rate after unfreezing the network is indeed a solid strategy, often leading to better fine-tuning of the model. To implement this in YOLOv5, you can adjust the learning rates directly in the hyp.yaml file used during training. Here’s a quick example of how you might set this up:

  1. Freeze the backbone and train:

    python train.py --freeze 24 --weights yolov5s.pt --data yourdata.yaml --epochs 10
  2. Unfreeze and train with differentiated learning rates:

    python train.py --weights runs/train/exp/weights/last.pt --data yourdata.yaml --epochs 30 --hyp yourhyp.yaml

In your yourhyp.yaml, specify lower learning rates for earlier layers (backbone) and higher for later layers (head):

lr0: 0.001  # lower base learning rate for backbone
lr1: 0.01   # higher base learning rate for head

This setup should help streamline the process and potentially reduce total training time by more effectively leveraging the initial frozen training phase. πŸ˜ŠπŸ‘
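As a note, per-layer learning rates aren't expressible in the hyp file alone; in plain PyTorch a backbone/head split is done with optimizer parameter groups. A hedged sketch (the prefix list and learning rates are illustrative):

```python
import torch
import torch.nn as nn

def split_lr_optimizer(model, backbone_prefixes, lr_backbone=1e-4, lr_head=1e-3):
    """SGD optimizer with a lower learning rate for backbone parameters."""
    backbone, head = [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue  # frozen parameters need no optimizer group
        group = backbone if any(name.startswith(s) for s in backbone_prefixes) else head
        group.append(p)
    return torch.optim.SGD(
        [{"params": backbone, "lr": lr_backbone},
         {"params": head, "lr": lr_head}],
        momentum=0.937,  # YOLOv5's default momentum
    )
```

Integrating this into train.py would mean replacing its optimizer construction, so treat it as a starting point rather than a drop-in patch.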

sriram-dsl commented 3 months ago

you mean lr0 and lrf right in hyp.scratch-low.yaml


glenn-jocher commented 3 months ago

@sriram-dsl hi there! Yes, you're correct, the keys in hyp.scratch-low.yaml are lr0 and lrf. lr0 is the initial learning rate, and lrf is the final learning rate expressed as a fraction of lr0, so the scheduler decays the LR from lr0 to lr0 * lrf over the course of training. Here's a quick example:

lr0: 0.001  # initial learning rate
lrf: 0.1    # final learning rate fraction (final LR = lr0 * lrf)

Note that these hyperparameters apply to the whole model; YOLOv5 does not natively support separate learning rates for the backbone and head, so the two-stage freeze-then-fine-tune approach is the practical way to approximate that. Thanks for pointing that out! πŸ˜ŠπŸ‘
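For reference, YOLOv5's default linear scheduler decays the learning rate from lr0 down to lr0 * lrf over training; a tiny sketch of that ramp (the epoch count and values below are illustrative):

```python
def linear_lr(epoch: int, epochs: int, lr0: float, lrf: float) -> float:
    """YOLOv5-style linear decay from lr0 at epoch 0 to lr0 * lrf at the end."""
    return lr0 * ((1 - epoch / epochs) * (1.0 - lrf) + lrf)
```

With lr0=0.01 and lrf=0.1 over 100 epochs, the LR starts at 0.01 and ends at 0.001.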