ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

/yolov5/utils/general.py", line 520, in compute_loss pxy = ps[:, :2].sigmoid() * 2. - 0.5 IndexError: too many indices for tensor of dimension 1 0%| #774

Closed lucasjinreal closed 4 years ago

lucasjinreal commented 4 years ago
/yolov5/utils/general.py", line 520, in compute_loss
    pxy = ps[:, :2].sigmoid() * 2. - 0.5
IndexError: too many indices for tensor of dimension 1

glenn-jocher commented 4 years ago

@jinfagang thanks for the bug report. Do you have any guidance for reproducing this?

lucasjinreal commented 4 years ago

@glenn-jocher I just pulled the code at the v3.0 release, and it broke something. It seems related to the compute_loss function, though the root cause may be a behavior change in how targets are loaded.

glenn-jocher commented 4 years ago

@jinfagang you may need to update your environment then, torch 1.6 is a requirement for v3.0 training and inference. All CI tests are passing, so all main functions are operating correctly. I'll paste you the default info below for this.

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.6. To install run:

$ pip install -r requirements.txt

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are passing. These tests evaluate proper operation of basic YOLOv5 functionality, including training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu.

lucasjinreal commented 4 years ago

@glenn-jocher I just cloned the newest version of the code and I still get this error:

Analyzing anchors... anchors/target = 5.41, Best Possible Recall (BPR) = 0.9991
Image sizes 800 train, 800 test
Using 8 dataloader workers
Starting training for 300 epochs...

     Epoch   gpu_mem      GIoU       obj       cls     total   targets  img_size
  0%|          | 0/4182 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 458, in <module>
    train(hyp, opt, device, tb_writer)
  File "train.py", line 272, in train
    loss, loss_items = compute_loss(pred, targets.to(device), model)  # scaled by batch_size
  File "/yolov5/utils/general.py", line 505, in compute_loss
    pxy = ps[:, :2].sigmoid() * 2. - 0.5
IndexError: too many indices for tensor of dimension 1
  0%|          | 0/4182 [00:04<?, ?it/s]

lucasjinreal commented 4 years ago

@glenn-jocher The error comes from here:

    balance = [4.0, 1.0, 0.4] if np == 3 else [4.0, 1.0, 0.4, 0.1]  # P3-5 or P3-6
    for i, pi in enumerate(p):  # layer index, layer predictions
        b, a, gj, gi = indices[i]  # image, anchor, gridy, gridx
        tobj = torch.zeros_like(pi[..., 0], device=device)  # target obj

        n = b.shape[0]  # number of targets
        if n:
            nt += n  # cumulative targets
            ps = pi[b, a, gj, gi]  # prediction subset corresponding to targets
            print('ps shape: ', ps.shape)  # debug print added while investigating
            # Regression
            pxy = ps[:, :2].sigmoid() * 2. - 0.5  # <-- IndexError raised here: printed ps shape is [747], i.e. 1-D
            pwh = (ps[:, 2:4].sigmoid() * 2) ** 2 * anchors[i]

p is the prediction passed to compute_loss. Do you have any idea why the latest code behaves differently from previous versions? It's not a problem with my dataset, since the same data trains on v1 and v2.0; the error only appears with your latest update in 3.0.
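To illustrate the shape reasoning, here is a minimal sketch of how `ps = pi[b, a, gj, gi]` behaves for two input shapes. It is not a diagnosis of this issue: the tensor sizes and the `select_targets` helper are assumptions made up for the example. It only shows that a layer output with a trailing per-anchor output dimension yields a 2-D `ps`, while one without it yields a 1-D `ps` and raises the same IndexError.

```python
import torch


def select_targets(pi, b, a, gj, gi):
    """Hypothetical helper mirroring `ps = pi[b, a, gj, gi]` from the snippet."""
    return pi[b, a, gj, gi]


n = 4  # assumed number of matched targets
b = torch.zeros(n, dtype=torch.long)  # image index per target
a = torch.zeros(n, dtype=torch.long)  # anchor index per target
gj = torch.arange(n)                  # grid y per target
gi = torch.arange(n)                  # grid x per target

# Assumed 5-D layer output: (batch, anchors, grid_y, grid_x, outputs)
pi_5d = torch.zeros(2, 3, 8, 8, 85)
ps_ok = select_targets(pi_5d, b, a, gj, gi)
print(ps_ok.shape)  # torch.Size([4, 85])
pxy = ps_ok[:, :2].sigmoid() * 2. - 0.5  # works: ps is 2-D

# Assumed 4-D tensor with no trailing output dimension
pi_4d = torch.zeros(2, 3, 8, 8)
ps_bad = select_targets(pi_4d, b, a, gj, gi)
print(ps_bad.shape)  # torch.Size([4])

try:
    ps_bad[:, :2]
    raised = False
except IndexError as err:
    raised = True
    print("IndexError:", err)
```

Printing `pi.shape` for each layer just before the indexing line would show which of these two cases applies in a failing run.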

glenn-jocher commented 4 years ago

@jinfagang if this error is not reproducible there is nothing for us to do. If you can supply exact code to reproduce, we can get started looking at it.

lucasjinreal commented 4 years ago

@glenn-jocher I am not sure what the problem is, but these steps reproduce it. Can you test them?

1. Clone the latest code:
git clone repo

2. Download coco128, and train with the command shown in the output:

Output is:

yolov5 on  master ● ●
❯ python3 train.py --cfg models/yolov5s.yaml  --data data/coco128.yaml --img-size 800
Using CUDA device0 _CudaDeviceProperties(name='GeForce GTX 1080 Ti', total_memory=11168MB)

Namespace(adam=False, batch_size=16, bucket='', cache_images=False, cfg='models/yolov5s.yaml', data='data/coco128.yaml', device='', epochs=300, evolve=False, global_rank=-1, hyp='data/hyp.finetune.yaml', img_size=[800, 800], local_rank=-1, logdir='runs/', multi_scale=False, name='', noautoanchor=False, nosave=False, notest=False, rect=False, resume=False, single_cls=False, sync_bn=False, total_batch_size=16, weights='yolov5s.pt', workers=8, world_size=1)
Start Tensorboard with "tensorboard --logdir runs/", view at http://localhost:6006/
Hyperparameters {'lr0': 0.01, 'momentum': 0.937, 'weight_decay': 0.0005, 'giou': 0.05, 'cls': 0.5, 'cls_pw': 1.0, 'obj': 1.0, 'obj_pw': 1.0, 'iou_t': 0.2, 'anchor_t': 4.0, 'fl_gamma': 0.0, 'hsv_h': 0.015, 'hsv_s': 0.7, 'hsv_v': 0.4, 'degrees': 0.0, 'translate': 0.1, 'scale': 0.5, 'shear': 0.0, 'perspective': 0.0, 'flipud': 0.0, 'fliplr': 0.5, 'mixup': 0.0}

                 from  n    params  module                                  arguments                     
  0                -1  1      3520  models.common.Focus                     [3, 32, 3]                    
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                
  2                -1  1     19904  models.common.BottleneckCSP             [64, 64, 1]                   
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  4                -1  1    161152  models.common.BottleneckCSP             [128, 128, 3]                 
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  6                -1  1    641792  models.common.BottleneckCSP             [256, 256, 3]                 
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]              
  8                -1  1    656896  models.common.SPP                       [512, 512, [5, 9, 13]]        
  9                -1  1   1248768  models.common.BottleneckCSP             [512, 512, 1, False]          
 10                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 12           [-1, 6]  1         0  models.common.Concat                    [1]                           
 13                -1  1    378624  models.common.BottleneckCSP             [512, 256, 1, False]          
 14                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 16           [-1, 4]  1         0  models.common.Concat                    [1]                           
 17                -1  1     95104  models.common.BottleneckCSP             [256, 128, 1, False]          
 18                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]              
 19          [-1, 14]  1         0  models.common.Concat                    [1]                           
 20                -1  1    313088  models.common.BottleneckCSP             [256, 256, 1, False]          
 21                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 23                -1  1   1248768  models.common.BottleneckCSP             [512, 512, 1, False]          
 24      [17, 20, 23]  1    229245  models.yolo.Detect                      [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
Model Summary: 191 layers, 7.46816e+06 parameters, 7.46816e+06 gradients, 17.5 GFLOPS

Transferred 368/370 items from yolov5s.pt
Optimizer groups: 62 .bias, 70 conv.weight, 59 other
Scanning images: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 128/128 [00:00<00:00, 251.20it/s]
Scanning labels ../coco128/labels/train2017.cache (126 found, 0 missing, 2 empty, 0 duplicate, for 128 images): 128it [00:00, 5209.51it/s]
Scanning labels ../coco128/labels/train2017.cache (126 found, 0 missing, 2 empty, 0 duplicate, for 128 images): 128it [00:00, 13821.92it/s]

Analyzing anchors... anchors/target = 4.30, Best Possible Recall (BPR) = 0.9957
Image sizes 800 train, 800 test
Using 8 dataloader workers
Starting training for 300 epochs...

     Epoch   gpu_mem      GIoU       obj       cls     total   targets  img_size
  0%|          | 0/8 [00:00<?, ?it/s]
ps shape:  torch.Size([491])
Traceback (most recent call last):
  File "train.py", line 458, in <module>
    train(hyp, opt, device, tb_writer)
  File "train.py", line 272, in train
    loss, loss_items = compute_loss(pred, targets.to(device), model)  # scaled by batch_size
  File "/yolov5/utils/general.py", line 505, in compute_loss
    pxy = ps[:, :2].sigmoid() * 2. - 0.5
IndexError: too many indices for tensor of dimension 1

lucasjinreal commented 4 years ago

@glenn-jocher I am using pytorch 1.6 BTW:

>>> torch.__version__
'1.6.0'

Have you tested compatibility with PyTorch 1.6, since it's the stable version now? Also, HardSwish was introduced in the latest PyTorch.

lucasjinreal commented 4 years ago

@glenn-jocher I can confirm there must be something wrong with latest pytorch...

glenn-jocher commented 4 years ago

@jinfagang I don't understand then. I tried your command in the notebook and everything works fine. It may be a simple environment difference. You may want to create a new Python 3.8 venv, install all requirements, and go from there.

By the way, datasets are downloaded automatically now on first use, so you do not need to manually download coco128 anymore :)

joh2nor commented 7 months ago

    loss, loss_items = compute_loss(pred, targets.to(device), model)  # scaled by batch_size
  File "D:\Myprogramme\horizon_yolov5-main\yolov5_v2.0_leaf\utils\utils.py", line 474, in compute_loss
    pxy = ps[:, :2].sigmoid() * 2. - 0.5
IndexError: too many indices for tensor of dimension 1

glenn-jocher commented 7 months ago

@joh2nor it appears you're encountering an indexing error in the compute_loss function, suggesting that the ps tensor does not have the expected dimensions. This often happens if your predictions or dataset are not correctly formatted or if there's an unexpected interaction with model outputs.

A quick fix you can try is to ensure your dataset is correctly loaded and matches the expected input format for YOLOv5. Also, verify that your model architecture is correctly set up and aligns with the inputs you're providing.
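As one way to check the dataset side, the sketch below validates a single label row. It assumes the usual YOLOv5 text label layout of `class x_center y_center width height`, with box values normalized to [0, 1]. The `check_label_line` function is a hypothetical helper and not part of the repository.

```python
def check_label_line(line, num_classes):
    """Return a list of problems found in one label row (empty if none).

    Assumes the row layout `class x_center y_center width height`
    with box values normalized to [0, 1].
    """
    fields = line.split()
    if len(fields) != 5:
        return [f"expected 5 fields, found {len(fields)}"]
    try:
        cls = int(fields[0])
        box = [float(v) for v in fields[1:]]
    except ValueError:
        return ["non-numeric field"]

    problems = []
    if not 0 <= cls < num_classes:
        problems.append(f"class {cls} outside 0..{num_classes - 1}")
    if any(not 0.0 <= v <= 1.0 for v in box):
        problems.append("box values not normalized to [0, 1]")
    if box[2] <= 0 or box[3] <= 0:
        problems.append("non-positive width or height")
    return problems


print(check_label_line("0 0.5 0.5 0.2 0.3", num_classes=80))  # []
print(check_label_line("0 0.5 0.5", num_classes=80))
```

Running this over every line of every label file and printing only the non-empty results gives a quick list of rows worth inspecting by hand.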

If the issue persists, it's possible there could be a version mismatch or an environmental issue. Make sure you're running a compatible version of PyTorch and have followed the setup instructions closely. You might also want to try updating to the latest version of YOLOv5 and see if the issue has been addressed in a more recent release.

# Ensure your setup matches the requirements
import torch
print(torch.__version__)  # Check PyTorch version, make sure it's compatible
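The version check can also be made explicit. This sketch compares the installed torch version against the torch>=1.6 requirement quoted earlier in the thread; `parse_version` and `meets_minimum` are hypothetical helper names, and only the major and minor numbers are compared.

```python
import re
from importlib import metadata

MIN_TORCH = (1, 6)  # from the torch>=1.6 requirement quoted in this thread


def parse_version(version):
    """Return (major, minor) from strings like '1.6.0' or '1.7.0+cu101'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version, minimum=MIN_TORCH):
    """True if the (major, minor) pair is at least the minimum."""
    return parse_version(version) >= minimum


try:
    installed = metadata.version("torch")
except metadata.PackageNotFoundError:
    print("torch is not installed in this environment")
else:
    status = "OK" if meets_minimum(installed) else "too old"
    print(f"torch {installed}: {status}")
```

Comparing integer tuples avoids the string-comparison pitfall where "1.10" would sort before "1.6".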

If you're still stuck, could you please provide more details about your setup and the steps leading to this error? A snippet of code that includes how you're loading your dataset and initializing your model might help pinpoint the issue. 🤓