positive666 / yolo_research

A high-level YOLO-based project (detect/pose/classify/segment): includes the yolov5/yolov7/yolov8 cores, improvement research, SwintransformV2 and the Attention series, training skills, business customization, engineering deployment C
GNU General Public License v3.0
758 stars, 146 forks

RuntimeError: shape mismatch: value tensor of shape [290, 1] cannot be broadcast to indexing result of shape [290] #60

Closed IncubatorShokuhou closed 2 years ago

IncubatorShokuhou commented 2 years ago

loss.py is missing `import torch.nn.functional as F`. After fixing that, there is still a problem (the dataset is coco128):

Traceback (most recent call last):
  File "/mnt/data/yolov5_research/train.py", line 740, in <module>
    main(opt)
  File "/mnt/data/yolov5_research/train.py", line 637, in main
    train(opt.hyp, opt, device, callbacks)
  File "/mnt/data/yolov5_research/train.py", line 421, in train
    loss, loss_items = compute_loss_ota(pred, targets.to(device), imgs) if aux_ota_loss else compute_loss(pred, targets.to(device)) # loss scaled by batch_siz
  File "/mnt/data/yolov5_research/utils/loss.py", line 1002, in __call__
    tobj[b, a, gj, gi] = (1.0 - self.gr) + self.gr * iou.detach().clamp(0).type(tobj.dtype)  # iou ratio
RuntimeError: shape mismatch: value tensor of shape [290, 1] cannot be broadcast to indexing result of shape [290]
positive666 commented 2 years ago

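That change was a bit rushed. The error comes from the tensor dimensions of the IoU. I have reworked the logic so that it is compatible with both v5 and v7 training: for OTA loss you need to add ota-match, and for aux-OTA you need to add aux_ota_loss. Thanks for your feedback, I will fix it as soon as possible. This code also breaks training of the P6 auxiliary head; the author only just changed that as well. There is one more bug in the code for the computation fix, which I will look at tomorrow.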
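A minimal sketch of the dimension problem described here, assuming the IoU arrives with a trailing singleton axis (shape [n, 1]) while the indexed slice of tobj has shape [n]. The helper name `assign_iou_targets` and the toy indices are hypothetical and are not taken from the repository; squeezing the last axis is one possible fix, not necessarily the change that was actually committed.

```python
import torch


def assign_iou_targets(tobj, b, a, gj, gi, iou, gr=1.0):
    """Write IoU-based objectness targets into tobj at the matched cells.

    Hypothetical helper for illustration; not a function from the repository.
    """
    iou = iou.detach().clamp(0).type(tobj.dtype)
    # Indexing with four 1-D index tensors of length n yields a result of shape [n].
    # A value of shape [n, 1] cannot be broadcast to it, so drop the trailing axis.
    if iou.dim() == 2 and iou.shape[-1] == 1:
        iou = iou.squeeze(-1)
    tobj[b, a, gj, gi] = (1.0 - gr) + gr * iou
    return tobj


# Toy indices standing in for the matched image, anchor and grid-cell indices.
b = torch.tensor([0, 1, 1])
a = torch.tensor([0, 1, 2])
gj = torch.tensor([1, 2, 3])
gi = torch.tensor([0, 1, 2])
iou = torch.tensor([[0.5], [0.25], [-0.1]])  # shape [3, 1], like the [290, 1] in the traceback

try:
    torch.zeros(2, 3, 4, 4)[b, a, gj, gi] = iou  # same pattern as the failing line
except RuntimeError as err:
    print("unsqueezed value fails:", err)

tobj = assign_iou_targets(torch.zeros(2, 3, 4, 4), b, a, gj, gi, iou)
print(tobj[0, 0, 1, 0].item())
```

Squeezing only the last axis, rather than calling a bare `squeeze()`, keeps the value one-dimensional even when a single target is matched.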

positive666 commented 2 years ago

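I think it is fixed now. The cause was that some of the newer v5 idioms conflicted with the v7 code, which had been integrated on top of an older v5. I tested it and it should work. It is now almost fully compatible with v7, and since the v5 code is better optimized, the code style follows the current v5. Thanks for your feedback; I will also train and validate it later.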

IncubatorShokuhou commented 2 years ago

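@positive666 Thanks a lot. It runs now. But there are still some problems:

  1. Using the original yolov7-e6e.yaml directly gives:
    RuntimeError: Given groups=1, weight of size [80, 3, 3, 3], expected input[1, 12, 128, 128] to have 3 channels, but got 12 channels instead
  2. It reports that AMP cannot be used. I have confirmed that apex is installed correctly (from apex import amp)
  3. During training: WARNING: NMS time limit 1.060s exceeded

=================Divider====================

The AMP issue in item 2: replacing check_amp with the one from the latest yolov5 fixed it.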
positive666 commented 2 years ago

q1. Are you training? Could you provide the full command?
q2. Updated.
q3. This is a time limit: image
IncubatorShokuhou commented 2 years ago

python train.py  --cfg  models/v7_cfg/training/yolov7e6e_原版.yaml --imgsz 640 --weights 'yolov7_training_weights/yolov7-e6e_training.pt'  --data data/我自己的数据集.yaml  --aux_ota_loss  --hyp data/hyps/hyp.scratch-v7-p6.yaml --device 0 --batch-size 16 --epoch 1000 --multi-scale --cos-lr
positive666 commented 2 years ago

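1. I have not been able to reproduce this error yet. It uses the original V7 yaml, but I removed Reorg from this project.
2. Multi-scale training currently has a bug.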

IncubatorShokuhou commented 2 years ago

Reorg

I added Reorg back into common.py, and then got this error. Without it, nothing runs at all; it complains that Reorg is missing.

positive666 commented 2 years ago

In yolo.py:

    elif m is ReOrg:
        c2 = ch[f] * 4

positive666 commented 2 years ago

Adding the output channel setting in yolo.py fixes it.
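A rough sketch of why the output channel setting matters, assuming ReOrg is the usual space-to-depth module from YOLOv7 (the class below is written out for illustration and may differ from the one that was re-added to common.py). The helper `out_channels` is hypothetical and only mimics the channel bookkeeping done in yolo.py. Without the `* 4`, the next convolution would be built for 3 input channels while ReOrg emits 12, which matches the reported error.

```python
import torch
import torch.nn as nn


class ReOrg(nn.Module):
    """Space-to-depth: stack the four 2x2 sub-grids along the channel axis."""

    def forward(self, x):  # (b, c, h, w) -> (b, 4c, h/2, w/2)
        return torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                          x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1)


def out_channels(module_cls, c_in):
    """Toy stand-in for the channel bookkeeping in yolo.py (hypothetical helper)."""
    if module_cls is ReOrg:
        return c_in * 4  # the rule from the reply: c2 = ch[f] * 4
    return c_in


x = torch.zeros(1, 3, 256, 256)
y = ReOrg()(x)
c2 = out_channels(ReOrg, x.shape[1])
conv = nn.Conv2d(c2, 80, kernel_size=3, padding=1)  # weight shape [80, 12, 3, 3]
print(tuple(y.shape), tuple(conv(y).shape))
```

Building the convolution with `nn.Conv2d(3, 80, ...)` instead reproduces the "expected input[1, 12, 128, 128] to have 3 channels" failure on the same input.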

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Maple1024 commented 1 year ago

Which part of the code was changed to fix this problem?