ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

RuntimeError while customizing Yolo model #2991

Closed debparth closed 3 years ago

debparth commented 3 years ago

🐛 Bug

I'm creating a custom YOLOv5m model. I've added an FPN to the backbone and made a few customizations in the head, including one extra layer, just like in YOLOv5m6, and it's giving me the following error:

RuntimeError: Sizes of tensors must match except in dimension 1. Got 64 and 32 in dimension 2 (The offending index is 1)

To Reproduce (REQUIRED)

Created model.yaml

head:
  [
   [-1, 3, BottleneckCSP, [1024, False]],  # 10

   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  
   [-1, 1, Conv, [512, 1, 1]],
   [-1, 3, BottleneckCSP, [512, False]],  # 14 

   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]], 
   [-1, 1, Conv, [256, 1, 1]],
   [-1, 3, BottleneckCSP, [256, False]],  # 18 

   [-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  
   [-1, 3, C3, [512, False]],  # 22

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  
   [-1, 3, C3, [256, False]],  # 26 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 20], 1, Concat, [1]],  
   [-1, 3, C3, [512, False]],  # 29 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 16], 1, Concat, [1]],  
   [-1, 3, C3, [768, False]],  # 32 (P5/32-large)

   [-1, 1, Conv, [768, 3, 2]],
   [[ -1, 12 ], 1, Concat, [ 1]], 
   [-1, 3, C3, [1024, False]],  # 35 (P6/64-xlarge)

   [[ 10, 14, 18, 22, 26, 29, 32, 35 ], 1, Detect, [ nc, anchors ]],
  ]

Expected behavior

The model should have been created and training should have started.

Environment


Additional context

Model Param Log:


                 from  n    params  module                                  arguments                     
  0                -1  1      7040  models.common.Focus                     [3, 64, 3]                    
  1                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  2                -1  3    246912  models.common.Bottleneck                [128, 128]                    
  3                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  4                -1  1   1627904  models.common.BottleneckCSP             [256, 256, 9]                 
  5                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]              
  6                -1  1   6499840  models.common.BottleneckCSP             [512, 512, 9]                 
  7                -1  1   4720640  models.common.Conv                      [512, 1024, 3, 2]             
  8                -1  1   2624512  models.common.SPP                       [1024, 1024, [5, 9, 13]]      
  9                -1  1  18105344  models.common.BottleneckCSP             [1024, 1024, 6]               
 10                -1  1  10234880  models.common.BottleneckCSP             [1024, 1024, 3, False]        
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 12           [-1, 6]  1         0  models.common.Concat                    [1]                           
 13                -1  1    787456  models.common.Conv                      [1536, 512, 1, 1]             
 14                -1  1   2561536  models.common.BottleneckCSP             [512, 512, 3, False]          
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 16           [-1, 4]  1         0  models.common.Concat                    [1]                           
 17                -1  1    197120  models.common.Conv                      [768, 256, 1, 1]              
 18                -1  1    641792  models.common.BottleneckCSP             [256, 256, 3, False]          
 19                -1  1    132096  models.common.Conv                      [256, 512, 1, 1]              
 20                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 21           [-1, 4]  1         0  models.common.Concat                    [1]                           
 22                -1  1   2626560  models.common.C3                        [768, 512, 3, False]          
 23                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
 24                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 25           [-1, 4]  1         0  models.common.Concat                    [1]                           
 26                -1  1    690688  models.common.C3                        [512, 256, 3, False]          
 27                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 28          [-1, 20]  1         0  models.common.Concat                    [1]                           
 29                -1  1   2626560  models.common.C3                        [768, 512, 3, False]          
 30                -1  1   2360320  models.common.Conv                      [512, 512, 3, 2]              
 31          [-1, 16]  1         0  models.common.Concat                    [1]                           
 32                -1  1   6004224  models.common.C3                        [1280, 768, 3, False]         
 33                -1  1   5309952  models.common.Conv                      [768, 768, 3, 2]              
 34          [-1, 12]  1         0  models.common.Concat                    [1]                           
 35                -1  1  11282432  models.common.C3                        [2304, 1024, 3, False]        
 36[10, 14, 18, 22, 26, 29, 32, 35]  1     87696  models.yolo.Detect                      [1, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [1024, 512, 256, 512, 256, 512, 768, 1024]]

glenn-jocher commented 3 years ago

@debparth I'm removing the bug label since, from your description, your issue is not related to the repository but rather to your own customizations.

If you are going to build your own YAMLs, you must ensure that the input and output shapes of connected layers correspond correctly; otherwise you will see this error.
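For context, the Concat module boils down to torch.cat along dimension 1 (channels), which requires every other dimension, including spatial H and W, to match. A minimal sketch of the failure and the fix in plain PyTorch (the example sizes are illustrative, not taken from the model above):

```python
import torch
import torch.nn.functional as F

# Two feature maps at different pyramid levels: channel counts may differ
# (that is what concatenating on dim 1 is for), but H and W must match.
a = torch.zeros(1, 256, 64, 64)  # e.g. a P3-level map
b = torch.zeros(1, 512, 32, 32)  # e.g. a P4-level map

try:
    torch.cat([a, b], dim=1)
except RuntimeError as e:
    print(e)  # size-mismatch message, like the one in this issue

# Upsampling b to a's spatial size first (what nn.Upsample does in the
# head) makes the concat legal:
b_up = F.interpolate(b, scale_factor=2, mode="nearest")
out = torch.cat([a, b_up], dim=1)
print(out.shape)  # torch.Size([1, 768, 64, 64])
```

So when a Concat layer's two inputs sit at different strides, an upsample (or strided conv) has to bring them to the same spatial size first.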

debparth commented 3 years ago

@glenn-jocher Ok. Is there a way to check every layer's input and output shapes when we pass a custom model YAML file?
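One generic way to do this, independent of any YOLOv5-specific tooling, is to register PyTorch forward hooks and record each leaf module's input and output shapes during a single dummy forward pass. A minimal sketch on a toy model (the helper name `shape_report` is made up for illustration):

```python
import torch
import torch.nn as nn

def shape_report(model: nn.Module, x: torch.Tensor):
    """Run one forward pass and record input/output shapes of every leaf module."""
    records, hooks = [], []

    def make_hook(name):
        def hook(module, inputs, output):
            in_shapes = [tuple(t.shape) for t in inputs if torch.is_tensor(t)]
            out_shape = tuple(output.shape) if torch.is_tensor(output) else None
            records.append((name, in_shapes, out_shape))
        return hook

    for name, module in model.named_modules():
        if not list(module.children()):  # leaf modules only
            hooks.append(module.register_forward_hook(make_hook(name)))
    try:
        with torch.no_grad():
            model(x)
    finally:
        for h in hooks:
            h.remove()
    for name, ins, out in records:
        print(f"{name:12s} in={ins} out={out}")
    return records

# Toy example: two strided convs halve the spatial size twice.
m = nn.Sequential(nn.Conv2d(3, 16, 3, 2, 1), nn.ReLU(), nn.Conv2d(16, 32, 3, 2, 1))
shape_report(m, torch.zeros(1, 3, 64, 64))
```

Running this on a model built from a custom YAML would show exactly which Concat receives mismatched spatial sizes.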

Lg955 commented 3 years ago

@glenn-jocher I want to use 3 models (yolov5x, yolov5x6, yolov5l6) to predict the images, but got an error:

Traceback (most recent call last):
  File "detect.py", line 199, in <module>
    detect(opt=opt)
  File "detect.py", line 78, in detect
    pred = model(img, augment=opt.augment)[0]
  File "/home/l/anaconda3/envs/yolo5/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/dataset/code/code/yolov5/models/experimental.py", line 106, in forward
    y.append(module(x, augment)[0])
  File "/home/l/anaconda3/envs/yolo5/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/dataset/code/code/yolov5/models/yolo.py", line 113, in forward
    yi = self.forward_once(xi)[0]  # forward
  File "/dataset/code/code/yolov5/models/yolo.py", line 139, in forward_once
    x = m(x)  # run
  File "/home/l/anaconda3/envs/yolo5/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/dataset/code/code/yolov5/models/common.py", line 210, in forward
    return torch.cat(x, self.d)
RuntimeError: Sizes of tensors must match except in dimension 2. Got 23 and 24 (The offending index is 0)

So must all 3 models be the same (yolov5x6, yolov5x6, yolov5x6)?

glenn-jocher commented 3 years ago

@Lg955 you can ensemble any models that were trained on the same dataset. Here is YOLOv5s and YOLOv5s6 ensembled together with detect.py:

[Screenshot, 2021-05-07: detect.py output from a YOLOv5s + YOLOv5s6 ensemble]

If you believe you have a reproducible issue, we suggest you close this issue and raise a new one using the 🐛 Bug Report template, providing screenshots and a minimum reproducible example to help us better understand and diagnose your problem. Thank you!
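For readers wondering how such an ensemble works mechanically: each model's raw predictions are concatenated along the box dimension so a single NMS pass afterwards merges them. A sketch in the spirit of YOLOv5's models/experimental.py (not the exact implementation; the ToyDetector stand-ins are hypothetical):

```python
import torch
import torch.nn as nn

class Ensemble(nn.ModuleList):
    """Output-level ensemble: run every model on the same input and
    concatenate their raw predictions along the box dimension."""
    def forward(self, x):
        ys = [m(x) for m in self]    # each: (batch, n_boxes_i, 5 + nc)
        return torch.cat(ys, dim=1)  # (batch, sum of n_boxes_i, 5 + nc)

# Hypothetical stand-ins for trained detectors (real models would be loaded
# from checkpoints). All members must share the same last dimension, i.e.
# the same class count, which is why ensembled models must be trained on
# the same dataset.
class ToyDetector(nn.Module):
    def __init__(self, n_boxes, nc=1):
        super().__init__()
        self.n_boxes, self.no = n_boxes, 5 + nc

    def forward(self, x):
        return torch.zeros(x.shape[0], self.n_boxes, self.no)

ens = Ensemble([ToyDetector(100), ToyDetector(250)])
pred = ens(torch.zeros(2, 3, 64, 64))
print(pred.shape)  # torch.Size([2, 350, 6])
```

Note that the box counts (dim 1) may differ between members, but every other dimension must match, which is the same torch.cat constraint behind the error in this thread.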

Lg955 commented 3 years ago


Thank you for your reply. My GPU is busy training right now; I will try the ensemble again and raise a new issue in about 15 hours.

github-actions[bot] commented 3 years ago

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.


Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!