ultralytics / yolov3

YOLOv3 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Train: RuntimeError: shape '[16, 3, 85, 16, 16]' is invalid for input of size 135168 #1371

Closed: DennisFaucher closed this issue 4 years ago

DennisFaucher commented 4 years ago

Before submitting a bug report, please be aware that your issue must be reproducible; otherwise it is non-actionable and we cannot help you.

If this is a custom dataset/training question, you must include your train*.jpg, test*.jpg, and results.png figures, or we cannot help you. You can generate results.png with utils.plot_results().

🐛 Bug

Could not train a custom object detection model by following the instructions in the README.

To Reproduce (REQUIRED)

Input:

 python3 train.py --cfg 6chix-spp.cfg --data 6chix.data --nosave

Output:

$ python3 train.py --cfg 6chix-spp.cfg --data 6chix.data --nosave
Apex recommended for faster mixed precision training: https://github.com/NVIDIA/apex
Namespace(adam=False, batch_size=16, bucket='', cache_images=False, cfg='./cfg/6chix-spp.cfg', data='6chix.data', device='', epochs=300, evolve=False, freeze_layers=False, img_size=[320, 640], multi_scale=False, name='', nosave=True, notest=False, rect=False, resume=False, single_cls=False, weights='weights/yolov3-spp-ultralytics.pt')
Using CUDA device0 _CudaDeviceProperties(name='Xavier', total_memory=15814MB)

Start Tensorboard with "tensorboard --logdir=runs", view at http://localhost:6006/
Model Summary: 225 layers, 6.26003e+07 parameters, 6.26003e+07 gradients
Optimizer groups: 76 .bias, 76 Conv2d.weight, 73 other
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   408    0   408    0     0   1402      0 --:--:-- --:--:-- --:--:--  1402
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  240M    0  240M    0     0  4867k      0 --:--:--  0:00:50 --:--:-- 5233k
Downloading https://drive.google.com/uc?export=download&id=1UcR-zVoMs7DH5dj3N1bswkiQTA4dmKF4 as weights/yolov3-spp-ultralytics.pt... Done (51.2s)
Reading image shapes: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 720/720 [00:03<00:00, 210.26it/s]
Caching labels train.txt (719 found, 1 missing, 0 empty, 0 duplicate, for 720 images): 100%|█████████████████████████████████████████████████████████████████████████████| 720/720 [00:01<00:00, 607.56it/s]
Reading image shapes: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 79/79 [00:00<00:00, 227.33it/s]
Caching labels test.txt (79 found, 0 missing, 0 empty, 0 duplicate, for 79 images): 100%|██████████████████████████████████████████████████████████████████████████████████| 79/79 [00:00<00:00, 565.51it/s]
Image sizes 320 - 640 train, 640 test
Using 8 dataloader workers
Starting training for 300 epochs...

     Epoch   gpu_mem      GIoU       obj       cls     total   targets  img_size
  0%|                                                                                                                                                                                | 0/45 [00:22<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 431, in <module>
    train(hyp)  # train normally
  File "train.py", line 279, in train
    pred = model(imgs)
  File "/home/dennis/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dennis/external/yolov3/models.py", line 244, in forward
    return self.forward_once(x)
  File "/home/dennis/external/yolov3/models.py", line 296, in forward_once
    yolo_out.append(module(x, out))
  File "/home/dennis/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dennis/external/yolov3/models.py", line 197, in forward
    p = p.view(bs, self.na, self.no, self.ny, self.nx).permute(0, 1, 3, 4, 2).contiguous()  # prediction
RuntimeError: shape '[16, 3, 85, 16, 16]' is invalid for input of size 135168
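The sizes in this error can be checked by hand. The `view` on models.py line 197 reshapes each YOLO head output to [bs, na, no, ny, nx] with no = classes + 5 (box x, y, w, h, objectness, then one score per class). With the default classes=80 it requests 16 * 3 * 85 * 16 * 16 = 1,044,480 elements, but the tensor only holds 135,168 = 16 * 33 * 16 * 16, so the preceding convolution produced 33 channels. Since 33 = 3 * (6 + 5), this is consistent with the filters already being set for 6 classes while classes was still 80. A minimal plain-Python sketch of that arithmetic (`yolo_view_check` is a hypothetical helper, not part of the repo):

```python
from math import prod

def yolo_view_check(bs, na, classes, ny, nx, conv_filters):
    """Compare the element count p.view() requests with what the conv produced."""
    no = classes + 5                            # x, y, w, h, objectness + one score per class
    requested = prod((bs, na, no, ny, nx))      # shape passed to p.view(...)
    available = bs * conv_filters * ny * nx     # conv output: [bs, filters, ny, nx]
    return requested, available

# Batch 16, 3 anchors, classes=80 and a 16x16 grid come from the traceback;
# 33 filters is inferred from 135168 / (16 * 16 * 16).
requested, available = yolo_view_check(bs=16, na=3, classes=80, ny=16, nx=16, conv_filters=33)
print(requested, available)          # 1044480 135168 -> mismatch, so view() raises
print(available // (16 * 16 * 16))   # 33 channels per cell = 3 * (6 + 5)
```

Setting classes=6 makes both numbers equal (16 * 3 * 11 * 16 * 16 = 135,168), which is why the view succeeds once the cfg is consistent.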

Expected behavior

A new model is trained.

Environment

Additional context

Thanks in advance.

github-actions[bot] commented 4 years ago

Hello @DennisFaucher, thank you for your interest in our work! Ultralytics has open-sourced YOLOv5 at https://github.com/ultralytics/yolov5, featuring faster, lighter and more accurate object detection. YOLOv5 is recommended for all new projects.

To continue with this repo, please visit our Custom Training Tutorial to get started, and see our Google Colab Notebook, Docker Image, and GCP Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue; otherwise we cannot help you.

If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients.

For more information please visit https://www.ultralytics.com.

DennisFaucher commented 4 years ago

Closing, as I am an idiot: I forgot to edit the "classes" entries in the .cfg. Training is running now.
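For a YOLOv3 cfg the usual rule is that every [yolo] block needs classes set to the dataset's class count, and the [convolutional] block directly before it needs filters = (classes + 5) * number of anchors in that block's mask (normally 3). A minimal sketch of a checker, assuming standard Darknet cfg syntax ([section] headers, key=value lines, '#' comments); `parse_cfg` and `check_yolo_heads` are hypothetical helpers, not functions from this repo:

```python
def parse_cfg(text):
    """Parse Darknet-style cfg text into a list of (section_name, {key: value})."""
    blocks = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            blocks.append((line[1:-1].strip(), {}))
        elif "=" in line and blocks:
            key, value = line.split("=", 1)
            blocks[-1][1][key.strip()] = value.strip()
    return blocks

def check_yolo_heads(text, nc):
    """Return a list of problems; empty if every [yolo] head matches nc classes."""
    problems = []
    blocks = parse_cfg(text)
    for i, (name, opts) in enumerate(blocks):
        if name != "yolo":
            continue
        classes = int(opts.get("classes", -1))
        if classes != nc:
            problems.append(f"[yolo] block {i}: classes={classes}, expected {nc}")
        # Assumption: 3 anchors per head if the block has no mask entry.
        na = len(opts["mask"].split(",")) if "mask" in opts else 3
        want = (nc + 5) * na
        prev_name, prev_opts = blocks[i - 1] if i > 0 else ("", {})
        if prev_name != "convolutional" or int(prev_opts.get("filters", -1)) != want:
            problems.append(f"block {i - 1} before [yolo]: filters should be {want}")
    return problems

# Example head with filters already set for 6 classes but classes left at 80.
example = """
[convolutional]
size=1
filters=33
activation=linear

[yolo]
mask = 6,7,8
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes=80
num=9
"""
print(check_yolo_heads(example, nc=6))  # ['[yolo] block 1: classes=80, expected 6']
```

Because the check works on the raw cfg text, it can be run on a custom cfg before launching train.py, rather than waiting for the reshape to fail inside the first forward pass.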

glenn-jocher commented 4 years ago

@DennisFaucher yes, this looks like a cfg issue. I'd recommend YOLOv5, as it has a significantly simpler custom training setup.