Transfer learn a model trained on custom data to new custom data

Can I transfer learn a model trained on custom data to new custom data?

I have trained a model on dataset A starting from the yolov5l.pt weights. The resulting model is stored in a best.pt file. Now I want to use best.pt weights to transfer learn to another custom dataset B. The issue seems to be that dataset A has 13 classes and dataset B has 14.

The commands I used

Training on dataset A:

python train.py --data ../datasetA.yaml --weights yolov5l.pt

This successfully created the best.pt weights.

Training on dataset B:

python train.py --data ../datasetB.yaml --weights runs/train/exp1/weights/best.pt

This results in the following error:

Transferred 644/650 items from runs/train/exp1/weights/best.pt
Scaled weight_decay = 0.0005
Optimizer groups: 110 .bias, 110 conv.weight, 107 other
Traceback (most recent call last):
  File "train.py", line 660, in <module>
    main(opt)
  File "train.py", line 558, in main
    train(opt.hyp, opt, device)
  File "train.py", line 189, in train
    ema.ema.load_state_dict(ckpt['ema'].float().state_dict())
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1407, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Model:
        size mismatch for model.24.m.0.weight: copying a param with shape torch.Size([54, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([57, 256, 1, 1]).
        size mismatch for model.24.m.0.bias: copying a param with shape torch.Size([54]) from checkpoint, the shape in current model is torch.Size([57]).
        size mismatch for model.24.m.1.weight: copying a param with shape torch.Size([54, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([57, 512, 1, 1]).
        size mismatch for model.24.m.1.bias: copying a param with shape torch.Size([54]) from checkpoint, the shape in current model is torch.Size([57]).
        size mismatch for model.24.m.2.weight: copying a param with shape torch.Size([54, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([57, 1024, 1, 1]).
        size mismatch for model.24.m.2.bias: copying a param with shape torch.Size([54]) from checkpoint, the shape in current model is torch.Size([57]).

Note that dataset A has 13 classes and dataset B has 14. If I change nc: to 13 in datasetB.yaml and remove a class, I get past this error, but I obviously can't train with a class missing, so that is not an option.

In previous issues I read that the number of classes should not matter when transfer learning, but it's not working for me so my question is what am I doing wrong?
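
The channel counts in the traceback follow from how a YOLO-style detection head is laid out: each anchor predicts 4 box coordinates, 1 objectness score, and one score per class. A minimal sketch of that arithmetic, assuming 3 anchors per detection layer; the function name is hypothetical and not part of YOLOv5:

```python
def detect_head_channels(num_classes: int, anchors_per_layer: int = 3) -> int:
    """Output channels of one detection-head conv: anchors * (4 box + 1 objectness + classes)."""
    return anchors_per_layer * (num_classes + 5)


# Dataset A (13 classes) versus dataset B (14 classes)
for nc in (13, 14):
    print(nc, detect_head_channels(nc))
```

Under that assumption a 13-class checkpoint has 54-channel head convolutions while a 14-class model expects 57, which is the mismatch reported for model.24.m.0, model.24.m.1 and model.24.m.2.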

👋 Hello @Stjev, thank you for your interest in 🚀 YOLOv5! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://www.ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7. To install run:

$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Google Colab and Kaggle notebooks with free GPU
Google Cloud Deep Learning VM. See GCP Quickstart Guide
Amazon Deep Learning AMI. See AWS Quickstart Guide
Docker Image. See Docker Quickstart Guide

@Stjev with YOLOv5 you can start training any pretrained model on any dataset, there are no constraints.

For example, to train on the 20-class VOC dataset starting from the official 80-class models:

python train.py --data VOC.yaml --weights yolov5s.pt
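
The "Transferred 644/650 items" lines in the logs in this thread suggest that only checkpoint tensors whose names and shapes match the new model are copied, so the class-dependent head layers keep their fresh initialization. A rough sketch of that filtering idea, using plain dicts of shapes instead of real tensors; the function and the toy layer shapes are illustrative assumptions, not the YOLOv5 implementation:

```python
from typing import Dict, Tuple

Shape = Tuple[int, ...]


def matching_entries(checkpoint: Dict[str, Shape], model: Dict[str, Shape]) -> Dict[str, Shape]:
    """Keep checkpoint entries whose name exists in the new model with an identical shape."""
    return {name: shape for name, shape in checkpoint.items()
            if name in model and model[name] == shape}


# Toy example: one backbone layer matches, the 13-class head does not fit a 14-class model.
checkpoint = {
    "model.0.conv.weight": (64, 12, 3, 3),
    "model.24.m.0.weight": (54, 256, 1, 1),
    "model.24.m.0.bias": (54,),
}
new_model = {
    "model.0.conv.weight": (64, 12, 3, 3),
    "model.24.m.0.weight": (57, 256, 1, 1),
    "model.24.m.0.bias": (57,),
}

transferable = matching_entries(checkpoint, new_model)
print(f"Transferred {len(transferable)}/{len(new_model)} items")
```

In the logs here, 6 of the 650 items are not transferred, which is consistent with the three weight and three bias tensors of model.24 being skipped.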

@glenn-jocher Yes, I thought so, but what is causing the error then? I am running the command you provided without success. All weights were created by the train.py script, so I was convinced it would work the way I tried it, but I can't get rid of the error about the mismatched tensor sizes.

@glenn-jocher After running:

python train.py --data ../datasetB.yaml --weights runs/train/exp1/weights/best.pt

I get the following output:

train: weights=runs/train/exp1/weights/best.pt, cfg=, data=../dataset.yaml, hyp=data/hyps/hyp.scratch.yaml, epochs=300, batch_size=16, img_size=[640, 640], rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, evolve=None, bucket=, cache_images=False, image_weights=False, device=, multi_scale=False, single_cls=False, adam=False, sync_bn=False, workers=8, project=runs/train, entity=None, exist_ok=False, quad=False, linear_lr=False, label_smoothing=0.0, upload_dataset=False, bbox_interval=-1, save_period=-1, artifact_alias=latest, local_rank=-1
github: up to date with https://github.com/ultralytics/yolov5 ✅
YOLOv5 🚀 v5.0-290-g62409ee torch 1.9.0+cu102 CUDA:0 (Tesla T4, 15109.75MB)

hyperparameters: lr0=0.01, lrf=0.2, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.0, mosaic=1.0, mixup=0.0, copy_paste=0.0
tensorboard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
wandb: Install Weights & Biases for YOLOv5 logging with 'pip install wandb' (recommended)
Overriding model.yaml nc=13 with nc=14

                 from  n    params  module                                  arguments                     
  0                -1  1      7040  models.common.Focus                     [3, 64, 3]                    
  1                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  2                -1  1    156928  models.common.C3                        [128, 128, 3]                 
  3                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  4                -1  1   1611264  models.common.C3                        [256, 256, 9]                 
  5                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]              
  6                -1  1   6433792  models.common.C3                        [512, 512, 9]                 
  7                -1  1   4720640  models.common.Conv                      [512, 1024, 3, 2]             
  8                -1  1   2624512  models.common.SPP                       [1024, 1024, [5, 9, 13]]      
  9                -1  1   9971712  models.common.C3                        [1024, 1024, 3, False]        
 10                -1  1    525312  models.common.Conv                      [1024, 512, 1, 1]             
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 12           [-1, 6]  1         0  models.common.Concat                    [1]                           
 13                -1  1   2757632  models.common.C3                        [1024, 512, 3, False]         
 14                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 16           [-1, 4]  1         0  models.common.Concat                    [1]                           
 17                -1  1    690688  models.common.C3                        [512, 256, 3, False]          
 18                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 19          [-1, 14]  1         0  models.common.Concat                    [1]                           
 20                -1  1   2495488  models.common.C3                        [512, 512, 3, False]          
 21                -1  1   2360320  models.common.Conv                      [512, 512, 3, 2]              
 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 23                -1  1   9971712  models.common.C3                        [1024, 1024, 3, False]        
 24      [17, 20, 23]  1    102315  models.yolo.Detect                      [14, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [256, 512, 1024]]
/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  /pytorch/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
Model Summary: 499 layers, 46701355 parameters, 46701355 gradients, 114.5 GFLOPs

Transferred 644/650 items from runs/train/exp1/weights/best.pt
Scaled weight_decay = 0.0005
Optimizer groups: 110 .bias, 110 conv.weight, 107 other
Traceback (most recent call last):
  File "train.py", line 660, in <module>
    main(opt)
  File "train.py", line 558, in main
    train(opt.hyp, opt, device)
  File "train.py", line 189, in train
    ema.ema.load_state_dict(ckpt['ema'].float().state_dict())
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1407, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Model:
        size mismatch for model.24.m.0.weight: copying a param with shape torch.Size([54, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([57, 256, 1, 1]).
        size mismatch for model.24.m.0.bias: copying a param with shape torch.Size([54]) from checkpoint, the shape in current model is torch.Size([57]).
        size mismatch for model.24.m.1.weight: copying a param with shape torch.Size([54, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([57, 512, 1, 1]).
        size mismatch for model.24.m.1.bias: copying a param with shape torch.Size([54]) from checkpoint, the shape in current model is torch.Size([57]).
        size mismatch for model.24.m.2.weight: copying a param with shape torch.Size([54, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([57, 1024, 1, 1]).
        size mismatch for model.24.m.2.bias: copying a param with shape torch.Size([54]) from checkpoint, the shape in current model is torch.Size([57]).

I also tried VOC.yaml as in your command, with the same result:

train: weights=runs/train/exp1/weights/best.pt, cfg=, data=data/VOC.yaml, hyp=data/hyps/hyp.scratch.yaml, epochs=300, batch_size=16, img_size=[640, 640], rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, evolve=None, bucket=, cache_images=False, image_weights=False, device=, multi_scale=False, single_cls=False, adam=False, sync_bn=False, workers=8, project=runs/train, entity=None, name=exp, exist_ok=False, quad=False, linear_lr=False, label_smoothing=0.0, upload_dataset=False, bbox_interval=-1, save_period=-1, artifact_alias=latest, local_rank=-1
github: up to date with https://github.com/ultralytics/yolov5 ✅
YOLOv5 🚀 v5.0-290-g62409ee torch 1.9.0+cu102 CUDA:0 (Tesla T4, 15109.75MB)

hyperparameters: lr0=0.01, lrf=0.2, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.0, mosaic=1.0, mixup=0.0, copy_paste=0.0
tensorboard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
wandb: Install Weights & Biases for YOLOv5 logging with 'pip install wandb' (recommended)
Overriding model.yaml nc=13 with nc=20

                 from  n    params  module                                  arguments                     
  0                -1  1      7040  models.common.Focus                     [3, 64, 3]                    
  1                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  2                -1  1    156928  models.common.C3                        [128, 128, 3]                 
  3                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  4                -1  1   1611264  models.common.C3                        [256, 256, 9]                 
  5                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]              
  6                -1  1   6433792  models.common.C3                        [512, 512, 9]                 
  7                -1  1   4720640  models.common.Conv                      [512, 1024, 3, 2]             
  8                -1  1   2624512  models.common.SPP                       [1024, 1024, [5, 9, 13]]      
  9                -1  1   9971712  models.common.C3                        [1024, 1024, 3, False]        
 10                -1  1    525312  models.common.Conv                      [1024, 512, 1, 1]             
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 12           [-1, 6]  1         0  models.common.Concat                    [1]                           
 13                -1  1   2757632  models.common.C3                        [1024, 512, 3, False]         
 14                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 16           [-1, 4]  1         0  models.common.Concat                    [1]                           
 17                -1  1    690688  models.common.C3                        [512, 256, 3, False]          
 18                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 19          [-1, 14]  1         0  models.common.Concat                    [1]                           
 20                -1  1   2495488  models.common.C3                        [512, 512, 3, False]          
 21                -1  1   2360320  models.common.Conv                      [512, 512, 3, 2]              
 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 23                -1  1   9971712  models.common.C3                        [1024, 1024, 3, False]        
 24      [17, 20, 23]  1    134625  models.yolo.Detect                      [20, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [256, 512, 1024]]
/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  /pytorch/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
Model Summary: 499 layers, 46733665 parameters, 46733665 gradients, 114.6 GFLOPs

Transferred 644/650 items from runs/train/exp1/weights/best.pt

WARNING: Dataset not found, nonexistent paths: ['/home/ubuntu/TrafficSignDetection/yoloset/datasets/VOC/images/test2007']
Downloading https://github.com/ultralytics/yolov5/releases/download/v1.0/VOCtrainval_06-Nov-2007.zip to ../datasets/VOC/images/VOCtrainval_06-Nov-2007.zip...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 425M/425M [00:10<00:00, 42.8MB/s]
Unzipping ../datasets/VOC/images/VOCtrainval_06-Nov-2007.zip...
Downloading https://github.com/ultralytics/yolov5/releases/download/v1.0/VOCtest_06-Nov-2007.zip to ../datasets/VOC/images/VOCtest_06-Nov-2007.zip...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 418M/418M [00:08<00:00, 52.0MB/s]
Unzipping ../datasets/VOC/images/VOCtest_06-Nov-2007.zip...
Downloading https://github.com/ultralytics/yolov5/releases/download/v1.0/VOCtrainval_11-May-2012.zip to ../datasets/VOC/images/VOCtrainval_11-May-2012.zip...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.82G/1.82G [00:57<00:00, 34.1MB/s]
Unzipping ../datasets/VOC/images/VOCtrainval_11-May-2012.zip...
train2012: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5717/5717 [00:02<00:00, 2046.56it/s]
val2012: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5823/5823 [00:02<00:00, 2080.10it/s]
train2007: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2501/2501 [00:01<00:00, 1795.14it/s]
val2007: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2510/2510 [00:01<00:00, 1945.86it/s]
test2007: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4952/4952 [00:02<00:00, 1964.64it/s]
Dataset autodownload success

Scaled weight_decay = 0.0005
Optimizer groups: 110 .bias, 110 conv.weight, 107 other
Traceback (most recent call last):
  File "train.py", line 660, in <module>
    main(opt)
  File "train.py", line 558, in main
    train(opt.hyp, opt, device)
  File "train.py", line 189, in train
    ema.ema.load_state_dict(ckpt['ema'].float().state_dict())
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1407, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Model:
        size mismatch for model.24.m.0.weight: copying a param with shape torch.Size([54, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([75, 256, 1, 1]).
        size mismatch for model.24.m.0.bias: copying a param with shape torch.Size([54]) from checkpoint, the shape in current model is torch.Size([75]).
        size mismatch for model.24.m.1.weight: copying a param with shape torch.Size([54, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([75, 512, 1, 1]).
        size mismatch for model.24.m.1.bias: copying a param with shape torch.Size([54]) from checkpoint, the shape in current model is torch.Size([75]).
        size mismatch for model.24.m.2.weight: copying a param with shape torch.Size([54, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([75, 1024, 1, 1]).
        size mismatch for model.24.m.2.bias: copying a param with shape torch.Size([54]) from checkpoint, the shape in current model is torch.Size([75]).

@Stjev the error is with your dataset. If you can produce a reproducible bug with official datasets and models then please let us know. Otherwise the error is on your side.

We've created a few short guidelines below to help users provide what we need in order to get started investigating a possible problem.

How to create a Minimal, Reproducible Example

When asking a question, people will be better able to provide help if you provide code that they can easily understand and use to reproduce the problem. This is referred to by community members as creating a minimum reproducible example. Your code that reproduces the problem should be:

✅ Minimal – Use as little code as possible that still produces the same problem
✅ Complete – Provide all parts someone else needs to reproduce your problem in the question itself
✅ Reproducible – Test the code you're about to provide to make sure it reproduces the problem

In addition to the above requirements, for Ultralytics to provide assistance your code should be:

✅ Current – Verify that your code is up-to-date with current GitHub master, and if necessary git pull or git clone a new copy to ensure your problem has not already been resolved by previous commits.
✅ Unmodified – Your problem must be reproducible without any modifications to the codebase in this repository. Ultralytics does not provide support for custom code ⚠️.

If you believe your problem meets all of the above criteria, please close this issue and raise a new one using the 🐛 Bug Report template and providing a minimum reproducible example to help us better understand and diagnose your problem.

Thank you! 😃

@glenn-jocher @Stjev Hi, I ran into this problem too. I trained yolov5-l on object365 and used the resulting model as pretrained weights for my own data, which has 3 classes. Training then failed with: size mismatch for model.24.m.0.weight: copying a param with shape torch.Size([1110, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([24, 256, 1, 1]). Using the official yolov5l.pt model works fine. I tried modifying the code as shown below:

Then the line ema.ema.load_state_dict(ckpt['ema'].float().state_dict()) raises the same error.

Finally, I uncommented the code above, but a different bug appears:

In the end, I guess something is wrong with the model we trained, but I don't know what. Resuming training of the YOLOv5 model trained on object365 works fine. I'm confused ...

@XiaoJiNu any YOLOv5 model can be used as pretrained weights for any other dataset, class count is irrelevant.

Hey @XiaoJiNu @Stjev this has an easy fix. The problem is that the custom model you're using isn't a fully finished checkpoint. At the end of training, the optimizer is stripped from the saved weights. If you use an intermediate checkpoint for transfer learning on another dataset, you get this error.

Follow these steps:

$ cd yolov5/
$ python3

>>> from utils.general import strip_optimizer
>>> strip_optimizer('/path/to/model.pt')

That's it. Your model file will be replaced and ready to train on another dataset.
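
To see why this can help, note that the tracebacks above fail while loading ckpt['ema'], a training-state entry that an unfinished checkpoint still carries. The sketch below illustrates the idea of stripping on a plain dict. The key names other than 'ema' are assumptions about the checkpoint layout, the function is hypothetical, and the real strip_optimizer works on the saved .pt file rather than on a dict like this:

```python
def strip_training_state(ckpt: dict) -> dict:
    """Return a copy of a checkpoint dict with training-only state cleared."""
    stripped = dict(ckpt)
    if stripped.get("ema") is not None:
        stripped["model"] = stripped["ema"]  # keep the averaged weights as the model
    for key in ("optimizer", "ema", "updates"):  # assumed training-only keys
        stripped[key] = None
    stripped["epoch"] = -1  # mark the checkpoint as not resumable
    return stripped


# Stand-in values; a real checkpoint would hold modules and optimizer state.
intermediate = {
    "model": "raw weights",
    "ema": "ema weights",
    "optimizer": {"lr": 0.01},
    "updates": 1200,
    "epoch": 42,
}
final = strip_training_state(intermediate)
print(final["ema"], final["optimizer"], final["epoch"])
```

Once the 'ema' entry is empty, a loader that restores EMA state only when it is present has no mismatched head tensors left to load from it.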
