Training on custom single-class data

starsky68 commented 4 years ago

❔Question

Training on single-class data produces the following error.

Additional context

  File "D:\Python\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\learning\pythonWorkspace\PycharmProjects\yolov5-master\models\yolo.py", line 26, in forward
    x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()
RuntimeError: shape '[1, 3, 6, 48, 80]' is invalid for input of size 983040
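The numbers in this error can be checked by hand. A minimal sketch in plain Python, using only the values from the message. The variable names follow the `view` call in the traceback, and the reading of the result is an inference, not something confirmed in this thread:

```python
from math import prod

# Values copied from the RuntimeError above.
target_shape = (1, 3, 6, 48, 80)   # (bs, na, no, ny, nx)
input_numel = 983040               # elements actually present in x[i]

bs, na, no, ny, nx = target_shape
target_numel = prod(target_shape)                # elements the view() call needs
expected_channels = na * no                      # channels the detection head expects
actual_channels = input_numel // (bs * ny * nx)  # channels it actually received

print(target_numel, expected_channels, actual_channels)
```

Here `no = 6` is consistent with a single class (five box/objectness outputs plus one class score), so the head expects 3 x 6 = 18 channels, while the incoming feature map appears to carry 256. That would point at the layer feeding the detection head rather than at the dataset, but the thread does not confirm the cause.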

github-actions[bot] commented 4 years ago

Hello @starsky68, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom model or data training question, please note Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:

Cloud-based AI systems operating on hundreds of HD video streams in realtime.
Edge AI integrated into custom iOS and Android apps for realtime 30 FPS video inference.
Custom data training, hyperparameter evolution, and model exportation to any destination.

For more information please visit https://www.ultralytics.com.

glenn-jocher commented 4 years ago

@starsky68 I tried to reproduce this. I was not able to reproduce your error message, but I did see a leaf variable error message regarding lcls. I've just pushed a fix for this, and single-class training now operates correctly. Please git pull and try again.

starsky68 commented 4 years ago

thanks for your help

treszkai commented 4 years ago

Dear @glenn-jocher,

I also have a similar problem with single-class training (with no preset weights):

Scanning labels ../data/labels.cache (284 found, 0 missing,
Scanning labels ../data/labels.cache (284 found, 0 missing,

Analyzing anchors... anchors/target = 0.85, Best Possible Recall (BPR) = 0.8521. Attempting to generate improved anchors, please wait...
Running kmeans for 9 anchors on 284 points...
thr=0.25: 1.0000 best possible recall, 9.00 anchors past thr
n=9, img_size=640, metric_all=0.735/0.994-mean/best, past_thr=0.735-mean: 390,14,  390,23,  391,24,  390,28,  390,31,  390,40
Evolving anchors with Genetic Algorithm: fitness = 0.9950: 100%|█| 1000/1000 [00
thr=0.25: 1.0000 best possible recall, 9.00 anchors past thr
n=9, img_size=640, metric_all=0.735/0.995-mean/best, past_thr=0.735-mean: 390,14,  390,23,  391,24,  391,28,  390,31,  390,40
Traceback (most recent call last):
  File "train.py", line 448, in <module>
    train(hyp, opt, device, tb_writer)
  File "train.py", line 192, in train
    check_anchors(dataset, model=model, thr=hyp['anchor_t'], imgsz=imgsz)
  File "/home/laszlo/dev/yolov5/utils/general.py", line 101, in check_anchors
    m.anchor_grid[:] = new_anchors.clone().view_as(m.anchor_grid)  # for inference
RuntimeError: shape '[3, 1, 3, 1, 1, 2]' is invalid for input of size 12

... where the last number, 12, is sometimes 14 or 16, depending on the number of images. I'll try the workaround of defining a dummy class, to have a multiclass problem.
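The sizes in this error map directly onto anchor counts. A small sketch in plain Python, assuming each anchor contributes two values (width and height), which matches the trailing 2 in the target shape:

```python
from math import prod

target_shape = (3, 1, 3, 1, 1, 2)    # shape of m.anchor_grid from the error
values_needed = prod(target_shape)   # total values the reshape requires
anchors_needed = values_needed // 2  # two values (w, h) per anchor

# Input sizes reported above: 12, sometimes 14 or 16.
for numel in (12, 14, 16):
    print(f"size {numel}: {numel // 2} anchors supplied, {anchors_needed} needed")
```

This agrees with the log, which reports n=9 but lists only six width,height pairs.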

glenn-jocher commented 4 years ago

@treszkai that's pretty funny, you have some pretty uniform objects in your dataset. Unfortunately we can only act on reproducible errors, so if you can write up a short Google Colab notebook that we can run to reproduce this, we can get started debugging it. Otherwise there's nothing we can do. I'll paste you some additional debugging information below:

Please note that most technical problems are due to:

Your changes to the default repository. If your issue is not reproducible in a new git clone version of this repository we can not debug it. Before going further run this code and ensure your issue persists:
```
sudo rm -rf yolov5  # remove existing
git clone https://github.com/ultralytics/yolov5 && cd yolov5 # clone latest
python detect.py  # verify detection
# CODE TO REPRODUCE YOUR ISSUE HERE
```
Your custom data. If your issue is not reproducible with COCO or COCO128 data we can not debug it. Visit our Custom Training Tutorial for guidelines on training your custom data. Examine train_batch0.jpg and test_batch0.jpg for a sanity check of training and testing data.
Your environment. If your issue is not reproducible in one of the verified environments below we can not debug it. If you are running YOLOv5 locally, ensure your environment meets all of the requirements.txt dependencies specified below.

If none of these apply to you, we suggest you close this issue and raise a new one using the Bug Report template, providing screenshots and minimum viable code to reproduce your issue. Thank you!

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.6. To install run:

$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Google Colab Notebook with free GPU
Kaggle Notebook with free GPU: https://www.kaggle.com/models/ultralytics/yolov5
Google Cloud Deep Learning VM. See GCP Quickstart Guide
Docker Image https://hub.docker.com/r/ultralytics/yolov5. See Docker Quickstart Guide

Status

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are passing. These tests evaluate proper operation of basic YOLOv5 functionality, including training (train.py), testing (test.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu.

treszkai commented 4 years ago

Thanks for the pointer, the data is indeed very uniform. I'll try again when I have some more variety.

glenn-jocher commented 4 years ago

Another easy option: the shapes of your objects are clearly visible in the output you posted, so you can just put those into your model yaml by hand.

You can also skip autoanchor entirely with python train.py --noautoanchor, but with the default anchors giving you a BPR of 0.85, your mAP will never exceed 0.85.
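For reference, a best-possible-recall figure of this kind can be sketched in plain Python. This is a simplified illustration, not the repository's implementation: `best_possible_recall` is a hypothetical helper, the threshold of 4.0 is an assumption chosen so that 1/thr matches the thr=0.25 printed in the log above, and the anchors are arbitrary illustrative values.

```python
def best_possible_recall(boxes, anchors, thr=4.0):
    """Fraction of boxes whose best anchor is within a factor of thr in both width and height."""
    hits = 0
    for bw, bh in boxes:
        best = 0.0
        for aw, ah in anchors:
            rw, rh = bw / aw, bh / ah
            # Ratio metric: agreement in the worst dimension, 1.0 means a perfect match.
            score = min(rw, 1 / rw, rh, 1 / rh)
            best = max(best, score)
        hits += best > 1 / thr  # box counts as recallable if its best anchor passes
    return hits / len(boxes)


# Two wide, flat box shapes in the style of the log output.
boxes = [(390, 14), (390, 40)]

poor_fit = best_possible_recall(boxes, anchors=[(116, 90)])  # squarish anchor
good_fit = best_possible_recall(boxes, anchors=[(390, 23)])  # anchor shaped like the data
print(poor_fit, good_fit)
```

With the squarish anchor only one of the two boxes passes the threshold, while the data-shaped anchor covers both, which is the gap that regenerated or hand-written anchors are meant to close.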

treszkai commented 4 years ago

Thanks for the quick and detailed response.

you have some pretty uniform objects in your dataset.

And wow, you have good eyes for this!

glenn-jocher commented 4 years ago

@treszkai I just spotted the problem. It looks like the scipy kmeans function we use for an initial evolution starting point will return fewer points than requested when the data is very similar. So you asked it for 9, and it returned 6, for example. That's just the immediate cause of your bug though; a much deeper issue is how to handle anchors correctly for varying or non-varying receptive fields.
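The mechanism can be illustrated with a toy k-means in plain Python. This is not scipy's code: `toy_kmeans` is a hypothetical stand-in that discards any centroid that ends up with no points assigned to it, and the box shapes are illustrative values in the style of the log.

```python
def toy_kmeans(points, k, iters=10):
    """Toy Lloyd's k-means that drops empty clusters (illustration only)."""
    centroids = [tuple(p) for p in points[:k]]  # deterministic init: first k points
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            # Ties go to the lowest index, so duplicate centroids receive nothing.
            clusters[dists.index(min(dists))].append(p)
        # Recompute centroids, discarding any cluster that received no points.
        centroids = [
            tuple(sum(col) / len(members) for col in zip(*members))
            for members in clusters
            if members
        ]
    return centroids


# Twelve boxes, but only three distinct shapes.
points = [(390, 14)] * 4 + [(390, 23)] * 4 + [(390, 40)] * 4

centroids = toy_kmeans(points, k=9)
print(f"requested 9, got {len(centroids)}: {centroids}")
```

Nine centroids were requested, but only as many survive as there are distinct shapes, so any code that later reshapes the result to a fixed nine-anchor layout fails in the way reported above.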

Like I said, your immediate solution is probably just to turn autoanchor off and plug those values into your model.yaml file. Though I would set the small anchors (P3) to all zeros, since they have low receptive field overlap with your objects.
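One way to lay out such hand-written anchors, sketched in Python so the grouping is explicit. Assumptions: the six width,height pairs are the ones printed in the log above, the model has three detection levels with three anchors each, and the split of the six pairs across P4 and P5 is an illustrative choice, not something specified in this thread. `format_anchor_rows` is a hypothetical helper; check the printed rows against the anchors section of your own model yaml before using them.

```python
def format_anchor_rows(levels):
    """Render {level_name: [(w, h), ...]} as YAML-style list rows."""
    rows = []
    for name, pairs in levels.items():
        flat = ", ".join(f"{w},{h}" for w, h in pairs)
        rows.append(f"  - [{flat}]  # {name}")
    return "\n".join(rows)


# Anchor pairs as printed at the end of the autoanchor log.
logged = [(390, 14), (390, 23), (391, 24), (391, 28), (390, 31), (390, 40)]

levels = {
    "P3": [(0, 0)] * 3,  # small anchors zeroed, as suggested above
    "P4": logged[:3],    # assumed split: first three pairs
    "P5": logged[3:],    # assumed split: last three pairs
}

print("anchors:")
print(format_anchor_rows(levels))
```

Training would then be started with --noautoanchor so that these values are not overwritten.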

github-actions[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
