Train: RuntimeError: shape '[16, 3, 85, 16, 16]' is invalid for input of size 135168

DennisFaucher commented 4 years ago

Before submitting a bug report, please be aware that your issue must be reproducible with all of the following, otherwise it is non-actionable, and we can not help you:

Current repository: run git fetch && git status -uno to check and git pull to update your repo
Common dataset: coco2017.data or coco64.data
Common environment: Colab, Google Cloud, or Docker image. See https://github.com/ultralytics/yolov3#reproduce-our-environment

If this is a custom dataset/training question you must include your train*.jpg, test*.jpg and results.png figures, or we can not help you. You can generate results.png with utils.plot_results().

🐛 Bug

A clear and concise description of what the bug is.

Could not create a custom object detection model using instructions in Readme

To Reproduce (REQUIRED)

Input:

 python3 train.py --cfg 6chix-spp.cfg --data 6chix.data --nosave

Output:

$ python3 train.py --cfg 6chix-spp.cfg --data 6chix.data --nosave
Apex recommended for faster mixed precision training: https://github.com/NVIDIA/apex
Namespace(adam=False, batch_size=16, bucket='', cache_images=False, cfg='./cfg/6chix-spp.cfg', data='6chix.data', device='', epochs=300, evolve=False, freeze_layers=False, img_size=[320, 640], multi_scale=False, name='', nosave=True, notest=False, rect=False, resume=False, single_cls=False, weights='weights/yolov3-spp-ultralytics.pt')
Using CUDA device0 _CudaDeviceProperties(name='Xavier', total_memory=15814MB)

Start Tensorboard with "tensorboard --logdir=runs", view at http://localhost:6006/
Model Summary: 225 layers, 6.26003e+07 parameters, 6.26003e+07 gradients
Optimizer groups: 76 .bias, 76 Conv2d.weight, 73 other
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   408    0   408    0     0   1402      0 --:--:-- --:--:-- --:--:--  1402
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  240M    0  240M    0     0  4867k      0 --:--:--  0:00:50 --:--:-- 5233k
Downloading https://drive.google.com/uc?export=download&id=1UcR-zVoMs7DH5dj3N1bswkiQTA4dmKF4 as weights/yolov3-spp-ultralytics.pt... Done (51.2s)
Reading image shapes: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 720/720 [00:03<00:00, 210.26it/s]
Caching labels train.txt (719 found, 1 missing, 0 empty, 0 duplicate, for 720 images): 100%|█████████████████████████████████████████████████████████████████████████████| 720/720 [00:01<00:00, 607.56it/s]
Reading image shapes: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 79/79 [00:00<00:00, 227.33it/s]
Caching labels test.txt (79 found, 0 missing, 0 empty, 0 duplicate, for 79 images): 100%|██████████████████████████████████████████████████████████████████████████████████| 79/79 [00:00<00:00, 565.51it/s]
Image sizes 320 - 640 train, 640 test
Using 8 dataloader workers
Starting training for 300 epochs...

     Epoch   gpu_mem      GIoU       obj       cls     total   targets  img_size
  0%|                                                                                                                                                                                | 0/45 [00:22<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 431, in <module>
    train(hyp)  # train normally
  File "train.py", line 279, in train
    pred = model(imgs)
  File "/home/dennis/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dennis/external/yolov3/models.py", line 244, in forward
    return self.forward_once(x)
  File "/home/dennis/external/yolov3/models.py", line 296, in forward_once
    yolo_out.append(module(x, out))
  File "/home/dennis/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dennis/external/yolov3/models.py", line 197, in forward
    p = p.view(bs, self.na, self.no, self.ny, self.nx).permute(0, 1, 3, 4, 2).contiguous()  # prediction
RuntimeError: shape '[16, 3, 85, 16, 16]' is invalid for input of size 135168

Expected behavior

A clear and concise description of what you expected to happen.

New model trained

Environment

If applicable, add screenshots to help explain your problem.

OS: [e.g. Ubuntu] Ubuntu 16.04
GPU [e.g. 2080 Ti] NVIDIA Xavier

Additional context

Add any other context about the problem here.

Thanks in advance

github-actions[bot] commented 4 years ago

Hello @DennisFaucher, thank you for your interest in our work! Ultralytics has open-sourced YOLOv5 at https://github.com/ultralytics/yolov5, featuring faster, lighter and more accurate object detection. YOLOv5 is recommended for all new projects.

To continue with this repo, please visit our Custom Training Tutorial to get started, and see our Google Colab Notebook, Docker Image, and GCP Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:

Cloud-based AI systems operating on hundreds of HD video streams in realtime.
Edge AI integrated into custom iOS and Android apps for realtime 30 FPS video inference.
Custom data training, hyperparameter evolution, and model exportation to any destination.

For more information please visit https://www.ultralytics.com.

DennisFaucher commented 4 years ago

Closing as I am an idiot. Forgot to edit the "classes" section of the .cfg. Training is running now.

glenn-jocher commented 4 years ago

@DennisFaucher yes this looks like a cfg issue. I'd recommend yolov5, as it has a significantly simpler custom training setup.

ultralytics / yolov3