two problems during training

xuk997 commented 2 years ago

I ran into two errors during training:

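1. RuntimeError: cannot perform reduction function max on tensor with no elements because the operation does not have an identity. This error is not triggered on every run: I had previously trained twice with the same parameters and dataset without any problem. The cause is that in the non_max_suppression function in general.py, the line i, j = (x[:, 5:5+nc] > conf_thres).nonzero(as_tuple=False).T produces empty tensors for i and j. The full error output is: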

Starting training for 500 epochs...
     Epoch   gpu_mem       box       obj       cls     angle    labels  img_size
     0/499     3.03G   0.08573   0.00518   0.01223    0.1266        18       640: 100%|█| 147/147
     Epoch   gpu_mem       box       obj       cls     angle    labels  img_size
     1/499     3.03G   0.07211  0.004031 0.0007491   0.08157        18       640: 100%|███████████████████████████| 147/147 [01:23<00:00,  1.77it/s]
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95:  95%|█████████████████████▊ | 18/19 [00:05<00:00,  3.25it/s]
Traceback (most recent call last):
  File "train.py", line 601, in <module>
    main(opt)
  File "train.py", line 499, in main
    train(opt.hyp, opt, device)
  File "train.py", line 353, in train
    results, maps, _ = val.run(data_dict,
  File "D:\Anaconda\envs\yolo-dota\lib\site-packages\torch\autograd\grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "D:\Rotation-Detect-yolov5_poly-master\val.py", line 185, in run
    out = non_max_suppression(out, conf_thres, iou_thres, labels=lb, multi_label=True, agnostic=single_cls)
  File "D:\Rotation-Detect-yolov5_poly-master\utils\general.py", line 715, in non_max_suppression
    conf_angle, j_angle = x[i, 5+nc:].max(1, keepdim=True)
RuntimeError: cannot perform reduction function max on tensor with no elements because the operation does not have an identity
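
A minimal sketch of one possible guard, not the project's actual fix: when no prediction passes conf_thres, i is empty and x[i, 5+nc:] has zero rows, so the row-wise max can be skipped. The helper name row_max_or_empty, the class count nc = 2, and the 180 angle bins are assumptions made for illustration only.

```python
import torch


def row_max_or_empty(t: torch.Tensor):
    """Row-wise max that tolerates a tensor with zero rows."""
    if t.shape[0] == 0:
        # No candidates survived the threshold: return empty (0, 1) results
        # instead of calling .max() on a tensor with no elements.
        values = t.new_zeros((0, 1))
        indices = torch.zeros((0, 1), dtype=torch.long, device=t.device)
        return values, indices
    return t.max(1, keepdim=True)


nc = 2                 # hypothetical number of classes
num_angle_bins = 180   # hypothetical number of angle outputs
conf_thres = 0.25

# All class scores are 0, so nothing exceeds conf_thres.
x = torch.zeros(4, 5 + nc + num_angle_bins)
i, j = (x[:, 5:5 + nc] > conf_thres).nonzero(as_tuple=False).T

conf_angle, j_angle = row_max_or_empty(x[i, 5 + nc:])
print(i.numel(), tuple(conf_angle.shape), tuple(j_angle.shape))
```

An alternative with the same effect would be to check whether i is empty right after the nonzero call and skip the image for that iteration.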

2. RuntimeError: The expanded size of the tensor (4) must match the existing size (2) at non-singleton dimension 0. Target sizes: [4, 6]. Tensor sizes: [2, 6]. I previously trained with your hyp.finetune_objects365.yaml configuration file. After I adjusted the hyperparameters, this error occurs on every run. The cause of the error is:

  θ calculation anomaly, current data: 296.6229553222656250, 614.3365478515625000, 0.0000000000000000, 29.1324348449707031, 180.0; outside the range of the OpenCV representation: [-90,0)
  θ calculation anomaly, current data: 432.7497558593750000, 615.0087890625000000, 0.0000000000000000, 20.7464981079101562, 180.0; outside the range of the OpenCV representation: [-90,0)
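
Both logged boxes have a third value of 0.0 and an angle of 180.0. Assuming the five values are (cx, cy, w, h, θ), these look like zero-width boxes, which might appear when augmentation clips or shears an object down to a sliver. The sketch below separates such boxes from valid ones; the helper names and the min_side threshold are hypothetical and not part of the repository.

```python
from typing import List, Tuple

# Assumed field order: (cx, cy, w, h, theta_deg)
RBox = Tuple[float, float, float, float, float]


def is_valid_rbox(box: RBox, min_side: float = 1e-3) -> bool:
    """True if both sides are non-degenerate and theta lies in [-90, 0)."""
    _, _, w, h, theta = box
    return w > min_side and h > min_side and -90.0 <= theta < 0.0


def split_rboxes(boxes: List[RBox]) -> Tuple[List[RBox], List[RBox]]:
    """Split boxes into (kept, dropped) according to is_valid_rbox."""
    kept = [b for b in boxes if is_valid_rbox(b)]
    dropped = [b for b in boxes if not is_valid_rbox(b)]
    return kept, dropped


boxes = [
    # The two boxes reported in the log above.
    (296.622955322265625, 614.3365478515625, 0.0, 29.1324348449707031, 180.0),
    (432.749755859375, 615.0087890625, 0.0, 20.7464981079101562, 180.0),
    # A hypothetical well-formed box for comparison.
    (100.0, 120.0, 40.0, 20.0, -30.0),
]

kept, dropped = split_rboxes(boxes)
print(len(kept), len(dropped))
```

If boxes are dropped this way, any buffer that was sized from the original label count has to be resized to the number of boxes kept.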

Parameter configuration: lr0=0.001, lrf=0.17, momentum=0.779, weight_decay=0.00036, warmup_epochs=2, warmup_momentum=0.5, warmup_bias_lr=0.05, box=0.0296, cls=0.243, cls_pw=0.631, obj=0.301, obj_pw=0.911, angle=0.266, angle_pw=0.333, iou_t=0.2, anchor_t=3.44, anchors=3.2, fl_gamma=0.0, hsv_h=0.0188, hsv_s=0.704, hsv_v=0.36, degrees=0.0, translate=0.245, scale=0.898, shear=0.602, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.243, copy_paste=0.0. The opencv-python version is 4.1.2.30.

Starting training for 500 epochs...
     Epoch   gpu_mem       box       obj       cls     angle    labels  img_size
     0/499     3.05G   0.04776  0.004455    0.0354    0.2496        25       640:  20%|█████▌                      | 29/147 [00:25<01:43,  1.14it/s]
Traceback (most recent call last):
  File "train.py", line 601, in <module>
    main(opt)
  File "train.py", line 499, in main
    train(opt.hyp, opt, device)
  File "train.py", line 290, in train
    for i, (imgs, targets, paths, _) in pbar:  # batch -------------------------------------------------------------
  File "D:\Anaconda\envs\yolo-dota\lib\site-packages\tqdm\std.py", line 1180, in __iter__
    for obj in iterable:
  File "D:\Rotation-Detect-yolov5_poly-master\utils\datasets.py", line 314, in __iter__
    yield next(self.iterator)
  File "D:\Anaconda\envs\yolo-dota\lib\site-packages\torch\utils\data\dataloader.py", line 363, in __next__
    data = self._next_data()
  File "D:\Anaconda\envs\yolo-dota\lib\site-packages\torch\utils\data\dataloader.py", line 989, in _next_data
    return self._process_data(data)
  File "D:\Anaconda\envs\yolo-dota\lib\site-packages\torch\utils\data\dataloader.py", line 1014, in _process_data
    data.reraise()
  File "D:\Anaconda\envs\yolo-dota\lib\site-packages\torch\_utils.py", line 395, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "D:\Anaconda\envs\yolo-dota\lib\site-packages\torch\utils\data\_utils\worker.py", line 185, in _worker_loop
    data = fetcher.fetch(index)
  File "D:\Anaconda\envs\yolo-dota\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "D:\Anaconda\envs\yolo-dota\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "D:\Rotation-Detect-yolov5_poly-master\utils\datasets.py", line 820, in __getitem__
    labels_out[:, 1:] = torch.from_numpy(labels_new[:, 0:6])
RuntimeError: The expanded size of the tensor (4) must match the existing size (2) at non-singleton dimension 0.  Target sizes: [4, 6].  Tensor sizes: [2, 6]
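
One plausible reading of this traceback, offered as an assumption rather than a confirmed diagnosis: labels_out was allocated for 4 labels, but only 2 rows remained in labels_new, possibly because labels that fail the θ range check are discarded. Sizing the buffer from the rows that actually remain avoids the mismatch. The helper name to_label_tensor and the 7-column layout (inferred from the target size [4, 6] of the slice [:, 1:]) are hypothetical.

```python
import numpy as np
import torch


def to_label_tensor(labels_new: np.ndarray) -> torch.Tensor:
    """Build an (n, 7) tensor: column 0 is left free, columns 1..6 hold the label."""
    n = labels_new.shape[0]  # rows that remain after filtering, not the original count
    labels_out = torch.zeros((n, 7))
    if n:
        labels_out[:, 1:] = torch.from_numpy(labels_new[:, 0:6])
    return labels_out


# Hypothetical case matching the error: 4 labels loaded, 2 left after filtering.
original_count = 4
labels_new = np.ones((2, 6), dtype=np.float32)

labels_out = to_label_tensor(labels_new)
print(original_count, tuple(labels_out.shape))
```

Allocating with original_count rows and then assigning the 2 remaining rows is the pattern that would reproduce the size mismatch reported above.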

As a beginner I can only spot these problems, not fix them yet. I would appreciate any guidance.

root12321 commented 2 years ago

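The project does currently have some problems, and I am debugging them. In the meantime, you could try the other two rotated object detection projects mentioned in the README.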

xuk997 commented 2 years ago

OK, thanks~

root12321 / Rotation-Detect-yolov5_poly