ultralytics / yolov3

YOLOv3 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0
10.22k stars 3.45k forks source link

Dimension Error during Testing #671

Closed russellave closed 4 years ago

russellave commented 4 years ago

I am training a custom dataset. During training, when testing is done after an epoch, I receive a dimension error. The only things I changed in the config file were the number of filters in the last conv layer (from 255 to 48) and the number of classes (11)

I run this command: python3 train.py --data data/custom_data.data --cfg cfg/custom_cfg.cfg --weights '' --batch-size 16

This is my output:

Reading labels (100 found, 0 missing, 0 empty for 100 images): 100% 100/100 [00:00<00:00, 716.85it/s] Model Summary: 222 layers, 6.1896e+07 parameters, 6.1896e+07 gradients Starting training for 273 epochs...

 Epoch   gpu_mem      GIoU       obj       cls     total   targets  img_size
 0/272     4.39G      1.45      0.83      19.2      21.5         4       416: 100% 13/13 [00:08<00:00,  2.29it/s]
           Class    Images   Targets         P         R   mAP@0.5        F1:   0% 0/13 [00:00<?, ?it/s]

Traceback (most recent call last): File "train.py", line 448, in train() # train normally File "train.py", line 320, in train save_json=final_epoch and epoch > 0 and 'coco.data' in data) File "/content/drive/My Drive/Yolo/yolov3/test.py", line 74, in test inf_out, train_out = model(imgs) # inference and training outputs File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 541, in call result = self.forward(*input, **kwargs) File "/content/drive/My Drive/Yolo/yolov3/models.py", line 273, in forward return torch.cat(io, 1), p RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 16 and 85 in dimension 2 at /pytorch/aten/src/THC/generic/THCTensorMath.cu:71

Do you have an idea on where I can look to fix this?

FranciscoReveriano commented 4 years ago

Possibly. By default, the program will call the spp.weights file if you try using "--weights". You should make sure that your .cfg file is configured for that weights. Or you should be more specific and call weights/yolov3.weights The only other possible reason is that you didn't properly change the filters in the last convolutional layer. The layer right before each Yolo layer should be changed. Please let us know if neither of these two things helps.

russellave commented 4 years ago

I missed one of the convolutional layers in the cfg file. This problem is now gone. Thanks!

glenn-jocher commented 12 months ago

@russellave you're welcome! I'm glad to hear that the issue has been resolved. If you have any more questions or need further assistance, feel free to ask. Good luck with your training!