ultralytics / yolov3

YOLOv3 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

RuntimeError: Sizes of tensors must match except in dimension 2. Got 17 and 85 (The offending index is 1) #1509

Closed MaxwellHogan closed 3 years ago

MaxwellHogan commented 3 years ago

❔Question

Can you assist in tracking down the root of this problem?

Additional context

I am reporting this as a question rather than a bug, since I am fairly sure I am causing the error myself somehow. Any advice would be much appreciated.

I am running the code in Google Colab; the torch version is 1.6.0+cu101.

I followed the tutorial below to prepare my custom dataset. I even resized the images in my dataset so they are all the same size, and made sure that I normalised my labels correctly to account for the resized images: https://github.com/ultralytics/yolov3/wiki/Train-Custom-Data

I have 12 classes and have modified the cfg file, which I downloaded straight from the darknet GitHub repository, so that the final conv layer has 51 outputs: https://github.com/pjreddie/darknet/blob/master/cfg/yolov3.cfg

The error is shown below. I believe it occurs in the YOLO layer of the network; however, it tends to occur at the end of the first epoch, so I am unsure of this.

Traceback (most recent call last):
  File "train.py", line 431, in <module>
    train(hyp) # train normally
  File "train.py", line 333, in train
    multi_label=ni > n_burn)
  File "/gdrive/My Drive/Aerial_classification_v1/yolov3/test.py", line 76, in test
    _ = model(torch.zeros((1, 3, imgsz, imgsz), device=device)) if device.type != 'cpu' else None # run once
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/gdrive/My Drive/Aerial_classification_v1/yolov3/models.py", line 244, in forward
    return self.forward_once(x)
  File "/gdrive/My Drive/Aerial_classification_v1/yolov3/models.py", line 312, in forward_once
    x = torch.cat(x, 1) # cat yolo outputs
RuntimeError: Sizes of tensors must match except in dimension 2. Got 17 and 85 (The offending index is 1)

github-actions[bot] commented 3 years ago

Hello @MaxwellHogan, thank you for your interest in our work! Ultralytics has open-sourced YOLOv5 at https://github.com/ultralytics/yolov5, featuring faster, lighter and more accurate object detection. YOLOv5 is recommended for all new projects.



To continue with this repo, please visit our Custom Training Tutorial to get started, and see our Google Colab Notebook, Docker Image, and GCP Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:

For more information please visit https://www.ultralytics.com.

MaxwellHogan commented 3 years ago

Embarrassingly, I missed changing one of the values in the file. I doubt anyone else will have this problem, but if you are reading this, check your .cfg file twice.

MALLI7622 commented 3 years ago

Hey @MaxwellHogan, I am also getting the same error. I have cross-checked my file but still can't find the problem. Can you please tell me which value we have to change?

MaxwellHogan commented 3 years ago

@MALLI7622 The image below shows the two values that you need to change in the .cfg file.

They appear three times in the file (in mine, around lines 607, 693 and 780); you can find them easily with Ctrl+F 'yolo'.

As per the tutorial, you need to adapt them to the number of classes you have. In my case it is 12, so the final convolutional layer before each yolo layer needs to have (5 + 12) * 3 = 51 filters.

filters = (5 + n) * 3
classes = n

I had made the mistake of not updating one of the filters values, so the final convolutions produced tensors of different sizes going into the yolo layers.

[Image: Untitled]
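The rule above can be checked mechanically. What follows is only a minimal sketch, not code from this repository: `parse_cfg`, `check_yolo_heads` and `SAMPLE_CFG` are hypothetical names, and the parser assumes the usual darknet layout of `[section]` headers followed by `key=value` lines, with the `[convolutional]` block that feeds a head sitting directly before its `[yolo]` block. The `anchors_per_head=3` default is the 3 in the formula. In the error from this issue, 17 is 5 + 12 and 85 is 5 + 80, which is what a head left at the default 80 classes would produce.

```python
def parse_cfg(text):
    """Split darknet-style cfg text into a list of (section_name, options) pairs."""
    sections = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1], {}))
        elif "=" in line and sections:
            key, value = line.split("=", 1)
            sections[-1][1][key.strip()] = value.strip()
    return sections


def check_yolo_heads(text, anchors_per_head=3):
    """Return a list of human-readable problems found in the yolo heads."""
    problems = []
    class_counts = []
    last_conv = None
    head = 0
    for name, opts in parse_cfg(text):
        if name == "convolutional":
            last_conv = opts
        elif name == "yolo":
            head += 1
            classes = int(opts["classes"])
            class_counts.append(classes)
            expected = (5 + classes) * anchors_per_head
            actual = None
            if last_conv is not None and "filters" in last_conv:
                actual = int(last_conv["filters"])
            if actual != expected:
                problems.append(
                    f"yolo head {head}: filters={actual}, "
                    f"expected {expected} for classes={classes}"
                )
    # Every head must predict the same number of classes.
    if len(set(class_counts)) > 1:
        problems.append(f"yolo heads disagree on classes: {class_counts}")
    return problems


# Illustrative cfg fragment (not a full model): head 2 has a stale filters
# value, head 3 was left entirely at the 80-class defaults.
SAMPLE_CFG = """
[convolutional]
filters=51
activation=linear

[yolo]
mask = 6,7,8
classes=12

[convolutional]
filters=255
activation=linear

[yolo]
mask = 3,4,5
classes=12

[convolutional]
filters=255
activation=linear

[yolo]
mask = 0,1,2
classes=80
"""

for problem in check_yolo_heads(SAMPLE_CFG):
    print(problem)
```

To check a real file, read it with `open(path).read()` and pass the text to `check_yolo_heads`; an empty list means all heads agree with the formula and with each other.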

glenn-jocher commented 3 years ago

@MALLI7622 @MaxwellHogan cfg/darknet complications are entirely eliminated in YOLOv5, and performance out of the box is increased greatly: https://github.com/ultralytics/yolov5

MaxwellHogan commented 3 years ago

fancy

jas-nat commented 3 years ago

@MaxwellHogan Hi, I observed another unusual bug. I tried to deepen the YOLO layers by adding one more YOLO output with one more residual block after the last residual block in the original YOLO. I could train and validate it perfectly for 200 epochs. However, when I extracted the best model, it threw the same error that you encountered. I have checked all the config files but found nothing to debug. Could you help me?

I attached a screenshot of my terminals. On the left is the training run: I reran training for 1 epoch and it ran perfectly. On the right, validation using the test.py file could not run.

[Image: anydesk00001]

Any help will be appreciated. Thank you!

MaxwellHogan commented 3 years ago

@jas-nat We'd probably need to see your config file to help you.

Otherwise, if I'm reading your .cfg filename correctly, you are using 4 colour channels instead of 3, which may be partially responsible. You are also using different batch sizes (4 on the left, 16 on the right), so you may want to make sure the extra channel and the batch size aren't getting muddled.

That's the only thing I can think of, since the test set would have been loaded during the first training run.

However, I'd suggest you start using @glenn-jocher's YOLOv5. I have been using it and it feels easier to work with; you can also download the model directly from PyTorch Hub if that makes things easier for you.

https://github.com/ultralytics/yolov5

jas-nat commented 3 years ago

@MaxwellHogan Thank you so much for your reply! Actually if you don't mind, here is the config file I used. I modified it based on the yolov3-spp-1cls config file in this repository.

I have already tried changing the batch size to 4, but it still threw the same error.

I would really like to try YOLOv5, but unfortunately I have been using this repository for my final thesis project, which is coming to an end. I hope I will have more time to give YOLOv5 a try. Thank you once more.

nickhward commented 3 years ago

@jas-nat I'm getting the same error as your image on the right. Were you able to solve your problem?

GMN23362 commented 2 years ago

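@MALLI7622 The image below shows the two values that you need to change in the .cfg file.

They appear three times in the file (in mine, around lines 607, 693 and 780); you can find them easily with Ctrl+F 'yolo'.

As per the tutorial, you need to adapt them to the number of classes you have. In my case it is 12, so the final convolutional layer before the yolo layer needs to have (5 + 12) * 3 filters.

filters = (5 + n) * 3
classes = n

I had made the mistake of not updating one of the filters, so the final convolutions and the tensors going into the yolo layer would be different sizes.

[Image: Untitled]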

Same question! But I have checked my cfg, and there is nothing wrong with it. So what other problems could be causing this?

glenn-jocher commented 7 months ago

@GMN23362 hello everyone,

It seems there's a bit of confusion regarding the configuration of the YOLO layers and the tensor sizes. If you're encountering size mismatch errors, it's crucial to ensure that the number of filters in the convolutional layers before each YOLO layer is correctly set. For a YOLOv3 model with n classes, the number of filters should be (5 + n) * 3.

If you've already checked your .cfg file and the filters are set correctly, but you're still facing issues, consider the following:

  1. Batch Size: Ensure that your batch size is consistent during both training and testing. Mismatches in batch size should not cause tensor size errors, but it's good practice to keep them consistent.

  2. Model Architecture: If you've modified the architecture by adding layers, ensure that the output of the new layers matches the expected input of subsequent layers. Any additional layers should be properly integrated into the architecture.

  3. Channel Mismatch: If you're using a different number of channels (e.g., 4 instead of 3), make sure that the model is correctly modified to handle the extra channel throughout the entire network.

  4. Weight Loading: When loading weights for testing, ensure that the model architecture in the code matches the architecture for which the weights were trained. Mismatches here can cause size errors.

  5. Code Changes: If you've made any changes to the model's code, double-check those changes to ensure they're not causing the issue.

  6. PyTorch Version: Ensure that you're using a compatible version of PyTorch. Sometimes, differences in PyTorch versions can lead to unexpected behavior.
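Point 4 in the list above (weight loading) can be narrowed down by comparing parameter shapes before loading anything. The following is a rough sketch under stated assumptions, not part of this repository: `diff_shapes` is a hypothetical helper that works on plain `{parameter_name: shape}` dictionaries, and the parameter names in the example are invented for illustration.

```python
def diff_shapes(model_shapes, checkpoint_shapes):
    """Compare two {parameter_name: shape} mappings and report the differences."""
    report = {"missing": [], "unexpected": [], "mismatched": []}
    for name, shape in model_shapes.items():
        if name not in checkpoint_shapes:
            # The model expects this parameter but the checkpoint lacks it.
            report["missing"].append(name)
        elif tuple(checkpoint_shapes[name]) != tuple(shape):
            report["mismatched"].append(
                (name, tuple(shape), tuple(checkpoint_shapes[name]))
            )
    # Parameters stored in the checkpoint that the model does not define.
    report["unexpected"] = [n for n in checkpoint_shapes if n not in model_shapes]
    return report


# Hypothetical example: a model built for 12 classes (51 output filters)
# versus a checkpoint saved from an 80-class model (255 output filters).
model_shapes = {
    "backbone.conv.weight": (32, 3, 3, 3),
    "head.conv.weight": (51, 1024, 1, 1),
    "extra.conv.weight": (64, 32, 3, 3),
}
checkpoint_shapes = {
    "backbone.conv.weight": (32, 3, 3, 3),
    "head.conv.weight": (255, 1024, 1, 1),
}

print(diff_shapes(model_shapes, checkpoint_shapes))
```

With PyTorch, the two dictionaries can be built from state dicts, for example `{k: tuple(v.shape) for k, v in model.state_dict().items()}`. Where the state dict sits inside a saved checkpoint file depends on how the checkpoint was written, so inspect the loaded object first rather than assuming a key.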

If you're still unable to resolve the issue, consider providing more details or the exact error message you're encountering. This can help in diagnosing the problem more effectively.

For those working on a thesis or a project with a deadline, I understand the reluctance to switch to a different version like YOLOv5. However, if you're early enough in the project or have the flexibility to switch, YOLOv5 does offer some advantages in terms of ease of use and performance.

Best of luck with your debugging, and remember that the community is here to help! 🛠️😊