@jinfagang thanks for the bug report. Do you have any guidance for reproducing this?
@glenn-jocher I just pulled the code at the v3.0 release and it broke something. It seems related to the compute_loss function, though the root cause may be a behavior change in how targets are loaded?
@jinfagang you may need to update your environment then; torch 1.6 is a requirement for v3.0 training and inference. All CI tests are passing, so all main functions are operating correctly. I'll paste the default environment info below.
Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.6. To install run:
$ pip install -r requirements.txt
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are passing. These tests evaluate proper operation of basic YOLOv5 functionality, including training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu.
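As a quick sanity check that an environment meets these requirements, something like the following can be run (a minimal sketch; the version expectations come from the requirements above):

import sys
import torch

print(sys.version)                # should report Python 3.8 or later
print(torch.__version__)          # should report 1.6.0 or later
print(torch.cuda.is_available())  # True if a CUDA device is visible to torch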
@glenn-jocher I just cloned the newest version of the code and I still get this error:
Analyzing anchors... anchors/target = 5.41, Best Possible Recall (BPR) = 0.9991
Image sizes 800 train, 800 test
Using 8 dataloader workers
Starting training for 300 epochs...
Epoch gpu_mem GIoU obj cls total targets img_size
0%| | 0/4182 [00:00<?, ?it/s]Traceback (most recent call last):
File "train.py", line 458, in <module>
train(hyp, opt, device, tb_writer)
File "train.py", line 272, in train
loss, loss_items = compute_loss(pred, targets.to(device), model) # scaled by batch_size
File "/yolov5/utils/general.py", line 505, in compute_loss
pxy = ps[:, :2].sigmoid() * 2. - 0.5
IndexError: too many indices for tensor of dimension 1
0%| | 0/4182 [00:04<?, ?it/s]
@glenn-jocher The error comes from here:
balance = [4.0, 1.0, 0.4] if np == 3 else [4.0, 1.0, 0.4, 0.1] # P3-5 or P3-6
for i, pi in enumerate(p): # layer index, layer predictions
b, a, gj, gi = indices[i] # image, anchor, gridy, gridx
tobj = torch.zeros_like(pi[..., 0], device=device) # target obj
n = b.shape[0] # number of targets
if n:
nt += n # cumulative targets
ps = pi[b, a, gj, gi] # prediction subset corresponding to targets
print('ps shape: ', ps.shape)
# Regression
pxy = ps[:, :2].sigmoid() * 2. - 0.5  # <----- ps index error here; the printed shape is [747], a 1-D tensor, so [:, :2] is of course an index error
pwh = (ps[:, 2:4].sigmoid() * 2) ** 2 * anchors[i]
p comes from the predictions passed into compute_loss, so do you have any idea why the latest code behaves differently from the previous version? It's not a problem with my dataset, since the same data trains fine on v1.0 and v2.0; only with the v3.0 update do I get this error.
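For what it's worth, the message itself just says that ps arrived as a 1-D tensor. A minimal standalone illustration (not YOLOv5 code; the shapes are made up to match the printout above):

import torch

ps = torch.randn(747)                     # 1-D tensor, like the printed shape [747]
try:
    pxy = ps[:, :2].sigmoid() * 2. - 0.5  # needs at least 2 dimensions
except IndexError as e:
    print(e)                              # too many indices for tensor of dimension 1

ps = torch.randn(747, 85)                 # expected shape: (num targets, 5 + num classes)
pxy = ps[:, :2].sigmoid() * 2. - 0.5      # works as intended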
@jinfagang if this error is not reproducible there is nothing for us to do. If you can supply exact code to reproduce, we can get started looking at it.
@glenn-jocher I am not sure what the problem is, but I reproduced it with these steps, can you give them a try?
1. Clone the latest repo (git clone)
2. Download coco128 and train with the command shown below.
Output is:
yolov5 on master ● ●
❯ python3 train.py --cfg models/yolov5s.yaml --data data/coco128.yaml --img-size 800
Using CUDA device0 _CudaDeviceProperties(name='GeForce GTX 1080 Ti', total_memory=11168MB)
Namespace(adam=False, batch_size=16, bucket='', cache_images=False, cfg='models/yolov5s.yaml', data='data/coco128.yaml', device='', epochs=300, evolve=False, global_rank=-1, hyp='data/hyp.finetune.yaml', img_size=[800, 800], local_rank=-1, logdir='runs/', multi_scale=False, name='', noautoanchor=False, nosave=False, notest=False, rect=False, resume=False, single_cls=False, sync_bn=False, total_batch_size=16, weights='yolov5s.pt', workers=8, world_size=1)
Start Tensorboard with "tensorboard --logdir runs/", view at http://localhost:6006/
Hyperparameters {'lr0': 0.01, 'momentum': 0.937, 'weight_decay': 0.0005, 'giou': 0.05, 'cls': 0.5, 'cls_pw': 1.0, 'obj': 1.0, 'obj_pw': 1.0, 'iou_t': 0.2, 'anchor_t': 4.0, 'fl_gamma': 0.0, 'hsv_h': 0.015, 'hsv_s': 0.7, 'hsv_v': 0.4, 'degrees': 0.0, 'translate': 0.1, 'scale': 0.5, 'shear': 0.0, 'perspective': 0.0, 'flipud': 0.0, 'fliplr': 0.5, 'mixup': 0.0}
from n params module arguments
0 -1 1 3520 models.common.Focus [3, 32, 3]
1 -1 1 18560 models.common.Conv [32, 64, 3, 2]
2 -1 1 19904 models.common.BottleneckCSP [64, 64, 1]
3 -1 1 73984 models.common.Conv [64, 128, 3, 2]
4 -1 1 161152 models.common.BottleneckCSP [128, 128, 3]
5 -1 1 295424 models.common.Conv [128, 256, 3, 2]
6 -1 1 641792 models.common.BottleneckCSP [256, 256, 3]
7 -1 1 1180672 models.common.Conv [256, 512, 3, 2]
8 -1 1 656896 models.common.SPP [512, 512, [5, 9, 13]]
9 -1 1 1248768 models.common.BottleneckCSP [512, 512, 1, False]
10 -1 1 131584 models.common.Conv [512, 256, 1, 1]
11 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
12 [-1, 6] 1 0 models.common.Concat [1]
13 -1 1 378624 models.common.BottleneckCSP [512, 256, 1, False]
14 -1 1 33024 models.common.Conv [256, 128, 1, 1]
15 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
16 [-1, 4] 1 0 models.common.Concat [1]
17 -1 1 95104 models.common.BottleneckCSP [256, 128, 1, False]
18 -1 1 147712 models.common.Conv [128, 128, 3, 2]
19 [-1, 14] 1 0 models.common.Concat [1]
20 -1 1 313088 models.common.BottleneckCSP [256, 256, 1, False]
21 -1 1 590336 models.common.Conv [256, 256, 3, 2]
22 [-1, 10] 1 0 models.common.Concat [1]
23 -1 1 1248768 models.common.BottleneckCSP [512, 512, 1, False]
24 [17, 20, 23] 1 229245 models.yolo.Detect [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
Model Summary: 191 layers, 7.46816e+06 parameters, 7.46816e+06 gradients, 17.5 GFLOPS
Transferred 368/370 items from yolov5s.pt
Optimizer groups: 62 .bias, 70 conv.weight, 59 other
Scanning images: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 128/128 [00:00<00:00, 251.20it/s]
Scanning labels ../coco128/labels/train2017.cache (126 found, 0 missing, 2 empty, 0 duplicate, for 128 images): 128it [00:00, 5209.51it/s]
Scanning labels ../coco128/labels/train2017.cache (126 found, 0 missing, 2 empty, 0 duplicate, for 128 images): 128it [00:00, 13821.92it/s]
Analyzing anchors... anchors/target = 4.30, Best Possible Recall (BPR) = 0.9957
Image sizes 800 train, 800 test
Using 8 dataloader workers
Starting training for 300 epochs...
Epoch gpu_mem GIoU obj cls total targets img_size
0%| | 0/8 [00:00<?, ?it/s]ps shape: torch.Size([491])
Traceback (most recent call last):
File "train.py", line 458, in <module>
train(hyp, opt, device, tb_writer)
File "train.py", line 272, in train
loss, loss_items = compute_loss(pred, targets.to(device), model) # scaled by batch_size
File "/yolov5/utils/general.py", line 505, in compute_loss
pxy = ps[:, :2].sigmoid() * 2. - 0.5
IndexError: too many indices for tensor of dimension 1
0%|
@glenn-jocher I am using pytorch 1.6 BTW:
>>> torch.__version__
'1.6.0'
Have you tested PyTorch 1.6 compatibility, since it is the stable version now? Also, Hardswish was introduced in the latest PyTorch.
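For reference, a quick way to confirm that the training environment really picks up torch 1.6 and its new Hardswish activation (a small sketch, run in the same interpreter used for training):

import torch
import torch.nn as nn

print(torch.__version__)  # expect '1.6.0' or later
print(nn.Hardswish())     # nn.Hardswish is only available from torch 1.6 onward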
@glenn-jocher I can confirm there must be something wrong with the latest PyTorch...
@jinfagang I don't understand then. I tried your command in the notebook and everything works fine. It may simply be environment differences. You may want to create a new Python 3.8 venv, install all requirements, and go from there.
By the way, datasets are downloaded automatically now on first use, so you do not need to manually download coco128 anymore :)
loss, loss_items = compute_loss(pred, targets.to(device), model)  # scaled by batch_size
  File "D:\Myprogramme\horizon_yolov5-main\yolov5_v2.0_leaf\utils\utils.py", line 474, in compute_loss
    pxy = ps[:, :2].sigmoid() * 2. - 0.5
IndexError: too many indices for tensor of dimension 1
@joh2nor it appears you're encountering an indexing error in the compute_loss function, suggesting that the ps tensor does not have the expected dimensions. This often happens if your predictions or dataset are not correctly formatted, or if there's an unexpected interaction with the model outputs.
A quick fix you can try is to ensure your dataset is correctly loaded and matches the expected input format for YOLOv5. Also, verify that your model architecture is correctly set up and aligns with the inputs you're providing.
If the issue persists, it's possible there could be a version mismatch or an environmental issue. Make sure you're running a compatible version of PyTorch and have followed the setup instructions closely. You might also want to try updating to the latest version of YOLOv5 and see if the issue has been addressed in a more recent release.
# Ensure your setup matches the requirements
import torch
print(torch.__version__) # Check PyTorch version, make sure it's compatible
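As an additional debugging aid, a hypothetical helper like the one below could be called right after ps = pi[b, a, gj, gi] in compute_loss to fail fast with a readable message; the function name and the nc default are assumptions for illustration only:

import torch

def check_ps(ps: torch.Tensor, nc: int = 80) -> None:
    # Hypothetical guard: the prediction subset should be 2-D with 5 + nc columns
    # (box xywh + objectness + class scores per matched target).
    expected_cols = 5 + nc
    if ps.ndim != 2 or ps.shape[1] != expected_cols:
        raise RuntimeError(f'expected ps of shape (n_targets, {expected_cols}), got {tuple(ps.shape)}')

Calling check_ps(ps) just before the pxy line would replace the opaque IndexError with a shape report you can paste into the issue.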
If you're still stuck, could you please provide more details about your setup and the steps leading to this error? A snippet of code that includes how you're loading your dataset and initializing your model might help pinpoint the issue. 🤓