Validation with val.py fails with an indexing error

Search before asking

[X] I have searched the YOLOv5 issues and found no similar bug report.

YOLOv5 Component

Validation

Bug

When I train a YOLO model on a custom dataset, everything works as expected: the model is saved, and the results are plotted and synced to wandb. But as soon as I run validation separately with val.py, it fails with an index error:

YOLOv5m summary: 369 layers, 21190557 parameters, 0 gradients, 49.1 GFLOPs
WARNING: --img-size 1360 must be multiple of max stride 32, updating to 1376
val: Scanning '/home/leon/studienarbeit/EvalFramework/data/datasets/gtsdb/yolo/val/labels.cache' images and labels..
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95:  38%|███▊      | 23/60 [00:0
Traceback (most recent call last):
  File "/home/leon/studienarbeit/EvalFramework/train_yolo.py", line 13, in <module>
    yoloV5.val(dataset=gtsdb_dataset, batch_size=2, weights='yolov5m.pt', img_size=1360)
  File "/home/leon/studienarbeit/EvalFramework/data/models/yolov5/__init__.py", line 25, in val
    yolo_val.run(data=yml, batch_size=batch_size, imgsz=img_size, device=device, weights=weights, task=task)
  File "/home/leon/miniconda3/envs/evalFramework/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/leon/studienarbeit/EvalFramework/data/models/yolov5/yolov5_git/val.py", line 240, in run
    confusion_matrix.process_batch(predn, labelsn)
  File "/home/leon/studienarbeit/EvalFramework/data/models/yolov5/yolov5_git/utils/metrics.py", line 156, in process_batch
    self.matrix[detection_classes[m1[j]], gc] += 1  # correct
IndexError: index 74 is out of bounds for axis 0 with size 44

For reference, the dataset YAML:

# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../../../datasets/gtsdb  # dataset root dir
train: yolo/train/images/  # train images (relative to 'path') 118287 images
test: yolo/test/images/  # 20288 of 40670 images, submit to https://competitions.codalab.org/competitions/20794
val: yolo/val/images/

# Classes
nc: 43  # number of classes
names: [ 'Geschwindigkeitsbegrenzung 20 km/h', 'Geschwindigkeitsbegrenzung 30 km/h', 'Geschwindigkeitsbegrenzung 50 km/h', 'Geschwindigkeitsbegrenzung 60 km/h', 'Geschwindigkeitsbegrenzung 70 km/h', 'Geschwindigkeitsbegrenzung 80 km/h', 'Geschwindigkeitsbegrenzung aufgehoben', 'Geschwindigkeitsbegrenzung 100 km/h', 'Geschwindigkeitsbegrenzung 120 km/h',  'Überholverbot',  'Überholverbot für LKW', 'Vorfahrt',  'Vorfahrtsstraße', 'Vorfahrt gewähren', 'Stopschild',  'Verbot für Fahrzeuge aller Art', 'Verbot für LKW',  'Verbot der Einfahrt',  'Gefahrstelle',  'Kurve links',  'Kurve rechts',  'Doppelkuve',  'Unebene Fahrbahn', 'Schleudergefahr',  'Einseitig verengte Fahrbahn', 'Baustelle', 'Lichtzeichenanlage',  'Fußgängerüberweg', 'spielende Kinder',  'Radfahrer kreuzen', 'Schnee- und Eisglätte', 'Achtung Wildwechsel', 'Alle Streckenverbote aufgehoben', 'Vorgeschriebene Fahrtrichtung rechts',  'Vorgeschriebene Fahrtrichtung links',  'Vorgeschriebene Fahrtrichtung geradeaus', 'Vorgeschriebene Fahrtrichtung geradeaus oder rechts', 'Vorgeschriebene Fahrtrichtung geradeaus oder links', 'Rechts vorbei', 'links vorbei', 'Kreisverkehr',  'Überholverbot aufgehoben',  'Überholverbot für LKW aufgehoben' ]  # class names

I'm calling val.py from a Python file as follows:

yml = f"{self.__location__}/yolov5_git/data/{dataset.dataset_id.lower()}.yaml"
batch_size = 2
weights = 'yolov5m.pt'
img_size = 1360
device = 0
task = 'val'
yolo_val.run(data=yml, batch_size=batch_size, imgsz=img_size, device=device, weights=weights, task=task)
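The warning in the log above ("--img-size 1360 must be multiple of max stride 32, updating to 1376") is unrelated to the IndexError. A minimal sketch of that adjustment, assuming the size is simply rounded up to the next multiple of the stride (the helper name is hypothetical and is not YOLOv5's own function):

```python
import math


def round_up_to_stride(img_size, stride=32):
    """Round img_size up to the nearest multiple of stride (hypothetical helper)."""
    return math.ceil(img_size / stride) * stride


# 1360 / 32 = 42.5, so the next multiple is 43 * 32
adjusted = round_up_to_stride(1360)
print(adjusted)
```

Passing a size that is already a multiple of 32 (for example 1376) leaves it unchanged under this assumption and would avoid the warning.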

The prepared dataset looks as follows: [screenshots: Screenshot_20220405_125253, Screenshot_20220405_125356, Screenshot_20220405_125433]

Every label text file has the following structure:

4 0.6889705882352941 0.5975 0.01764705882352935 0.030000000000000027

If there are multiple objects in the image, each further annotation is written on its own line:

23 0.5422794117647058 0.5549999999999999 0.02132352941176474 0.032500000000000084
2 0.5422794117647058 0.58375 0.015441176470588291 0.02499999999999991
9 0.5426470588235295 0.608125 0.016176470588235348 0.026249999999999996
23 0.32389705882352937 0.551875 0.022794117647058798 0.03374999999999995
2 0.3242647058823529 0.5800000000000001 0.016176470588235292 0.02749999999999997
9 0.32499999999999996 0.60625 0.01470588235294118 0.025000000000000022

The train split includes 720 images; the val and test splits include 120 images. My first guess was a faulty annotation, so I checked all of them, but all class IDs are correct.
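A check like the one described above can be scripted. The following is a rough sketch, assuming one .txt file per image with the class ID as the first field on each line; the function name and the demo files are made up for illustration:

```python
import tempfile
from pathlib import Path


def find_out_of_range_labels(label_dir, nc):
    """Return (file name, line number, class id) for each class id outside [0, nc)."""
    problems = []
    for label_file in sorted(Path(label_dir).glob("*.txt")):
        lines = label_file.read_text().splitlines()
        for line_no, line in enumerate(lines, start=1):
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            class_id = int(fields[0])
            if not 0 <= class_id < nc:
                problems.append((label_file.name, line_no, class_id))
    return problems


# Demo on throwaway files: one valid label file, one with an invalid class id.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "good.txt").write_text("4 0.69 0.60 0.02 0.03\n23 0.54 0.55 0.02 0.03\n")
    Path(tmp, "bad.txt").write_text("2 0.54 0.58 0.02 0.02\n74 0.32 0.55 0.02 0.03\n")
    problems = find_out_of_range_labels(tmp, nc=43)

print(problems)
```

An empty result for the real label directory would support the conclusion that the labels themselves are not the source of the out-of-range index.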

Environment

YoloV5: YOLOv5 🚀 v6.0-392-g0a20c80 torch 1.10.0+cu102 CUDA:0 (NVIDIA GeForce RTX 2080 Ti, 11019MiB)
Available GPUs: 2x NVIDIA GeForce RTX 2080 Ti, 11019MiB
CPU: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
RAM: 64GB
OS:
- Distributor ID: Ubuntu
- Description: Ubuntu 20.04.4 LTS
- Release: 20.04
- Codename: focal
Nvidia driver Version: 495.29.05
Cuda Version: 11.5
Python Environment: Conda:
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=4.5=1_gnu
- ca-certificates=2021.10.26=h06a4308_2
- certifi=2021.10.8=py39h06a4308_2
- ld_impl_linux-64=2.35.1=h7274673_9
- libffi=3.3=he6710b0_2
- libgcc-ng=9.3.0=h5101ec6_17
- libgomp=9.3.0=h5101ec6_17
- libstdcxx-ng=9.3.0=hd4cf53a_17
- ncurses=6.3=h7f8727e_2
- openssl=1.1.1m=h7f8727e_0
- pip=21.2.4=py39h06a4308_0
- python=3.9.7=h12debd9_1
- readline=8.1.2=h7f8727e_1
- setuptools=58.0.4=py39h06a4308_0
- sqlite=3.37.2=hc218d9a_0
- tk=8.6.11=h1ccaba5_0
- tzdata=2021e=hda174b7_0
- wheel=0.37.1=pyhd3eb1b0_0
- xz=5.2.5=h7b6447c_0
- zlib=1.2.11=h7f8727e_4
- pip:
- absl-py==1.0.0
- cachetools==5.0.0
- charset-normalizer==2.0.7
- click==8.0.4
- cycler==0.11.0
- docker-pycreds==0.4.0
- fonttools==4.28.2
- gitdb==4.0.9
- gitpython==3.1.27
- google-auth==2.6.0
- google-auth-oauthlib==0.4.6
- grpcio==1.44.0
- idna==3.3
- importlib-metadata==4.11.1
- kaggle==1.5.12
- kiwisolver==1.3.2
- markdown==3.3.6
- matplotlib==3.5.0
- numpy==1.21.4
- oauthlib==3.2.0
- opencv-python==4.5.4.60
- packaging==21.3
- pandas==1.3.4
- pathtools==0.1.2
- pillow==8.4.0
- promise==2.3
- protobuf==3.19.4
- psutil==5.9.0
- pyasn1==0.4.8
- pyasn1-modules==0.2.8
- pycocotools==2.0.4
- pyparsing==3.0.6
- python-dateutil==2.8.2
- python-slugify==5.0.2
- pytz==2021.3
- pyyaml==6.0
- requests==2.26.0
- requests-oauthlib==1.3.1
- rsa==4.8
- scipy==1.7.2
- seaborn==0.11.2
- sentry-sdk==1.5.6
- setuptools-scm==6.3.2
- shortuuid==1.0.8
- six==1.16.0
- smmap==5.0.0
- tensorboard==2.8.0
- tensorboard-data-server==0.6.1
- tensorboard-plugin-wit==1.8.1
- termcolor==1.1.0
- text-unidecode==1.3
- thop==0.0.31-2005241907
- tomli==1.2.2
- torch==1.10.0
- torchvision==0.11.1
- tqdm==4.62.3
- typing-extensions==4.0.0
- urllib3==1.26.7
- wandb==0.12.10
- werkzeug==2.0.3
- yaspin==2.1.0
- zipp==3.7.0

Minimal Reproducible Example

See bug report above

Additional

Any help would be appreciated! If more information is needed, feel free to ask.

Are you willing to submit a PR?

[ ] Yes I'd like to help by submitting a PR!

👋 Hello @lkno0705, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email support@ultralytics.com.

Requirements

Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Google Colab and Kaggle notebooks with free GPU
Google Cloud Deep Learning VM. See GCP Quickstart Guide
Amazon Deep Learning AMI. See AWS Quickstart Guide
Docker Image. See Docker Quickstart Guide

Status

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.

@lkno0705 we don't assist in debugging custom code, but you can start from the official usage example shown in val.py. If you encounter any reproducible errors following the official usage example please let us know!

https://github.com/ultralytics/yolov5/blob/5f97001ed4e5deb5c92eb200a79b5cb9da861130/val.py#L5-L18

@glenn-jocher That's understandable. However, the problem also occurs when using the example command in val.py:

python val.py --weights yolov5m.pt --data gtsdb.yaml --img 1360
wandb: Currently logged in as: ***** (use `wandb login --relogin` to force relogin)
val: data=/home/leon/studienarbeit/EvalFramework/data/models/yolov5/yolov5_git/data/gtsdb.yaml, weights=['yolov5m.pt'], batch_size=32, imgsz=1360, conf_thres=0.001, iou_thres=0.6, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=False, project=runs/val, name=exp, exist_ok=False, half=False, dnn=False
YOLOv5 🚀 v6.0-392-g0a20c80 torch 1.10.0+cu102 CUDA:0 (NVIDIA GeForce RTX 2080 Ti, 11019MiB)

Fusing layers... 
YOLOv5m summary: 369 layers, 21190557 parameters, 0 gradients, 49.1 GFLOPs
WARNING: --img-size 1360 must be multiple of max stride 32, updating to 1376
val: Scanning '/home/leon/studienarbeit/EvalFramework/data/datasets/gtsdb/yolo/val/labels.cache' images and labels..
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95:  25%|██▌       | 1/4 [00:04<
Traceback (most recent call last):
  File "/home/leon/studienarbeit/EvalFramework/val.py", line 390, in <module>
    main(opt)
  File "/home/leon/studienarbeit/EvalFramework/val.py", line 363, in main
    run(**vars(opt))
  File "/home/leon/miniconda3/envs/evalFramework/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/leon/studienarbeit/EvalFramework/val.py", line 240, in run
    confusion_matrix.process_batch(predn, labelsn)
  File "/home/leon/studienarbeit/EvalFramework/data/models/yolov5/yolov5_git/utils/metrics.py", line 156, in process_batch
    self.matrix[detection_classes[m1[j]], gc] += 1  # correct
IndexError: index 74 is out of bounds for axis 0 with size 44

@lkno0705 thanks for the update! It looks like you are passing an incompatible combination of --weights and --data. yolov5m.pt is trained on the COCO dataset, so you cannot validate it on anything other than the COCO dataset.
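The numbers in the traceback fit this explanation. The dataset YAML declares nc: 43, and the size 44 in the error is consistent with a confusion matrix that has one extra row and column for background, while a COCO-trained model can predict any of 80 class indices (0 to 79), including 74. A minimal NumPy sketch of that mismatch, not the actual code in utils/metrics.py (the record helper and the constants are hypothetical names):

```python
import numpy as np

NC_DATASET = 43  # nc from the dataset YAML in this report
NC_COCO = 80     # number of classes in the COCO dataset

# Assumed layout: one extra row/column for background, giving a 44 x 44 matrix.
matrix = np.zeros((NC_DATASET + 1, NC_DATASET + 1))


def record(matrix, predicted_class, true_class):
    """Count one matched detection (hypothetical stand-in for the real update)."""
    matrix[predicted_class, true_class] += 1


record(matrix, 4, 4)  # a prediction within the dataset's class range works

try:
    record(matrix, 74, 4)  # class 74 exists in COCO but not in this dataset
    error_message = None
except IndexError as exc:
    error_message = str(exc)

print(error_message)
```

Any predicted class index of 44 or higher triggers the same failure, which is why the crash appears only once such a prediction is matched to a label.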

@glenn-jocher Yikes, that was a dumb mistake. Thanks for the hint, it works now! It makes sense: with the correct model selected for validation, the predicted classes match the dataset's classes and no index error occurs. Thanks a lot! It seems I've been looking at my screen for too long today. Have a great day!

@lkno0705 good news 😃! Your original issue may now be improved ✅ in PR #7292. This PR adds better error handling for more informative error messages to help users self-diagnose the problem better.

To receive this update:

Git – git pull from within your yolov5/ directory or git clone https://github.com/ultralytics/yolov5 again
PyTorch Hub – Force-reload model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
Notebooks – View updated notebooks
Docker – sudo docker pull ultralytics/yolov5:latest to update your image

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!
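For anyone wrapping val.py in their own code, a pre-flight check along these lines can surface the mismatch before validation starts. This is only an illustrative sketch and not the code from the PR; the function name and message are hypothetical, and reading the class counts from the checkpoint and the dataset YAML is left to the caller:

```python
def check_class_counts(model_nc, data_nc, weights="weights", data="data"):
    """Raise a descriptive error when model and dataset class counts differ."""
    if model_nc != data_nc:
        raise ValueError(
            f"{weights} predicts {model_nc} classes but {data} defines {data_nc}; "
            "pass weights and data that were trained together."
        )


# The combination from this issue: COCO-trained weights (80 classes)
# against a dataset YAML with nc: 43.
try:
    check_class_counts(80, 43, weights="yolov5m.pt", data="gtsdb.yaml")
    mismatch = None
except ValueError as exc:
    mismatch = str(exc)

print(mismatch)
```

Matching counts pass silently, so the check can run unconditionally ahead of the validation call.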
