Closed SpaceView closed 2 years ago
Hello @SpaceView, thank you for your interest in YOLOv5! Please visit our Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.
If this is a Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.
If this is a custom training Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.
For business inquiries or professional support requests please visit https://ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.
Python>=3.6.0 with all requirements.txt installed including PyTorch>=1.7. To get started:
$ git clone https://github.com/ultralytics/yolov5
$ cd yolov5
$ pip install -r requirements.txt
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.
@SpaceView thanks for the bug report. This might just be due to out-of-date code or models. I tested this locally in PyCharm on macOS with Python 3.9 and everything seems fine:
The CI tests regularly run YOLOv5n with all main functions (train, val, detect, export) on Windows also and they are green currently: https://github.com/ultralytics/yolov5/runs/3937706191?check_suite_focus=true
@fcakyon @SpaceView I'm not able to reproduce any error here. The following two examples execute correctly in Colab.
!python train.py --img 640 --batch 16 --epochs 3 --data coco128.yaml --weights yolov5n.pt
!python detect.py --weights runs/train/exp/weights/best.pt
!python train.py --img 640 --batch 16 --epochs 3 --data coco128.yaml --weights '' --cfg yolov5n.yaml
!python detect.py --weights runs/train/exp2/weights/best.pt
Response from detect.py calls is:
detect: weights=['runs/train/exp/weights/best.pt'], source=data/images, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False
YOLOv5 v6.0-23-ga18b0c3 torch 1.9.0+cu111 CUDA:0 (Tesla P100-PCIE-16GB, 16280.875MB)
Fusing layers...
Model Summary: 213 layers, 1867405 parameters, 0 gradients, 4.5 GFLOPs
image 1/2 /content/yolov5/data/images/bus.jpg: 640x480 4 persons, 1 bus, 1 skateboard, Done. (0.015s)
image 2/2 /content/yolov5/data/images/zidane.jpg: 384x640 2 persons, 1 tie, Done. (0.016s)
Speed: 0.4ms pre-process, 15.3ms inference, 1.6ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs/detect/exp
detect: weights=['runs/train/exp2/weights/best.pt'], source=data/images, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False
YOLOv5 v6.0-23-ga18b0c3 torch 1.9.0+cu111 CUDA:0 (Tesla P100-PCIE-16GB, 16280.875MB)
Fusing layers...
Model Summary: 213 layers, 1867405 parameters, 0 gradients, 4.5 GFLOPs
image 1/2 /content/yolov5/data/images/bus.jpg: 640x480 Done. (0.016s)
image 2/2 /content/yolov5/data/images/zidane.jpg: 384x640 Done. (0.017s)
Speed: 0.4ms pre-process, 16.4ms inference, 0.4ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs/detect/exp2
We've created a few short guidelines below to help users provide what we need in order to get started investigating a possible problem.
When asking a question, people will be better able to provide help if you provide code that they can easily understand and use to reproduce the problem. This is referred to by community members as creating a minimum reproducible example. Your code that reproduces the problem should be:
In addition to the above requirements, for Ultralytics to provide assistance your code should be:
git pull or git clone a new copy to ensure your problem has not already been resolved by previous commits.
If you believe your problem meets all of the above criteria, please close this issue and raise a new one using the Bug Report template and providing a minimum reproducible example to help us better understand and diagnose your problem.
Thank you!
I also trained a new model starting from a custom-trained model (exp2/weights/best.pt) and ran detection again with the new exp3/weights/best.pt. Everything worked correctly:
!python train.py --img 640 --batch 16 --epochs 3 --data coco128.yaml --weights runs/train/exp2/weights/best.pt
!python detect.py --weights runs/train/exp3/weights/best.pt
Hi @SpaceView @fcakyon, yes the bug originates from my PR. I have tried to reproduce the error with pre-trained and custom-trained yolov5n from scratch (similar code as @glenn-jocher), but detect.py works correctly with both models.
self.anchor_grid is supposed to be a list of Tensors, but from the error message, it looks like self.anchor_grid is a Tensor (it was a Tensor before my PR was merged) and assigning a Tensor of different shape is raising this error.
You can check this by adding print(type(self.anchor_grid)) in forward() of the Detect module.
This conversion of a Tensor to a list of Tensors is done in attempt_load(), and I see that this function is being called during runtime:
File "D:\vsAI\yolov5-6.0\models\experimental.py", line 96, in attempt_load
Compatibility with models trained before my PR was checked before merging it, so it's quite strange to see this bug. As suggested by Glenn, some more reproducer code/models are needed.
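The Tensor-to-list conversion described above can be sketched as follows. This is only an illustration under stated assumptions: `DetectStandIn` and `ensure_anchor_grid_list` are hypothetical names, the buffer shape is assumed, and the code is not copied from `attempt_load()`.

```python
import torch
import torch.nn as nn


class DetectStandIn(nn.Module):
    # Hypothetical stand-in for a Detect layer restored from an older checkpoint,
    # where anchor_grid is assumed to be one Tensor registered as a buffer.
    def __init__(self, nl=3, na=3):
        super().__init__()
        self.nl = nl  # number of detection layers
        self.register_buffer("anchor_grid", torch.zeros(nl, 1, na, 1, 1, 2))


def ensure_anchor_grid_list(m):
    # Replace a Tensor anchor_grid with a list of placeholder Tensors, one per layer.
    if not isinstance(m.anchor_grid, list):
        delattr(m, "anchor_grid")  # drop the registered buffer first
        setattr(m, "anchor_grid", [torch.zeros(1)] * m.nl)
    return m


m = ensure_anchor_grid_list(DetectStandIn())
print(type(m.anchor_grid), len(m.anchor_grid))
```

If a model reaches forward() without a step like this, `self.anchor_grid` is still a Tensor, which is the situation the suggested `print(type(self.anchor_grid))` check would reveal.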
@glenn-jocher @SamFC10 the error is raised when a model trained on 5.0 source is used with detect.py from 6.0 source. Compatibility addition seems to be not working for some reason.
@fcakyon Please add a link to your trained model if possible. Some edge case is being missed.
I cannot add it for privacy reasons; I will try to train a redundant model for reproducibility.
@glenn-jocher @fcakyon @SamFC10 Many thanks for your attention. I use VS Code on Windows 10. If you don't use it, thop.profile may pass without any warning, so you need to add some additional output to reproduce this bug, as below:
def model_info(model, verbose=False, img_size=640):
    # Model information. img_size may be int or list, i.e. img_size=640 or img_size=[640, 320]
    n_p = sum(x.numel() for x in model.parameters())  # number parameters
    n_g = sum(x.numel() for x in model.parameters() if x.requires_grad)  # number gradients
    if verbose:
        print('%5s %40s %9s %12s %20s %10s %10s' % ('layer', 'name', 'gradient', 'parameters', 'shape', 'mu', 'sigma'))
        for i, (name, p) in enumerate(model.named_parameters()):
            name = name.replace('module_list.', '')
            print('%5g %40s %9s %12g %20s %10.3g %10.3g' %
                  (i, name, p.requires_grad, p.numel(), list(p.shape), p.mean(), p.std()))
    try:  # FLOPs
        from thop import profile
        stride = max(int(model.stride.max()), 32) if hasattr(model, 'stride') else 32
        img = torch.zeros((1, model.yaml.get('ch', 3), stride, stride), device=next(model.parameters()).device)  # input
        print('Now it is time to show the bug, -------------------> for debug purpose \n')
        flops = profile(deepcopy(model), inputs=(img,), verbose=False)[0] / 1E9 * 2  # stride GFLOPs
        print('Can we print this out correctly?--- if NOT, here it is a problem, -------------------> for debug purpose\n')
        img_size = img_size if isinstance(img_size, list) else [img_size, img_size]  # expand if int/float
        fs = ', %.1f GFLOPs' % (flops * img_size[0] / stride * img_size[1] / stride)  # 640x640 GFLOPs
    except (ImportError, Exception):
        fs = ''
    LOGGER.info(f"Model Summary: {len(list(model.modules()))} layers, {n_p} parameters, {n_g} gradients{fs}")
As you can see, I added two print calls to model_info for debugging. If thop.profile works correctly, both lines should be printed.
My output log is given below. Only the first debug line is shown, not the second, which means thop.profile is cut short by an internal error and the remaining lines in the try block are never executed.
(torch) PS D:\vsAI\yolov5-6.0> d:; cd 'd:\vsAI\yolov5-6.0'; & 'D:\Anaconda3\envs\torch\python.exe' 'c:\Users\Administrator\.vscode\extensions\ms-python.python-2021.10.1336267007\pythonFiles\lib\python\debugpy\launcher' '58690' '--' 'd:\vsAI\yolov5-6.0\detect.py'
detect: weights=weights\yolov5n.pt, source=data\images\bus.jpg, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs\detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False
YOLOv5 2021-10-20 torch 1.9.1 CUDA:0 (GeForce GTX 1080 Ti, 11264.0MB)
Fusing layers...
Now it is time to show the bug, -------------------> for debug purpose
Model Summary: 213 layers, 1867405 parameters, 0 gradients
attemp_load_done
image 1/1 D:\vsAI\yolov5-6.0\data\images\bus.jpg: debug
640x480 4 persons, 1 bus, 1 skateboard, Done. (0.026s)
Speed: 2.0ms pre-process, 26.0ms inference, 9.0ms NMS per image at shape (1, 3, 640, 640)
It is easy, you can check it as I did.
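The symptom above, where the second debug line never prints yet no traceback appears, is what a broad `except` clause produces: the exception raised inside the `try` is caught and discarded. A minimal sketch, using a hypothetical `profile_stub` in place of `thop.profile` and hypothetical function and logger names, contrasts swallowing the exception with logging it:

```python
import logging

logger = logging.getLogger("model_info_sketch")  # hypothetical logger name


def profile_stub():
    # Stand-in for thop.profile(); raises the kind of error seen in this thread.
    raise RuntimeError("The expanded size of the tensor (1) must match the existing size (4)")


def flops_silent():
    # Same try/except shape as model_info: any failure silently yields an empty string.
    try:
        flops = profile_stub()
        fs = f", {flops:.1f} GFLOPs"
    except Exception:
        fs = ""
    return fs


def flops_logged():
    # Same control flow, but the caught exception is reported before continuing.
    try:
        flops = profile_stub()
        fs = f", {flops:.1f} GFLOPs"
    except Exception as e:
        logger.warning("FLOPs profiling failed: %s", e)
        fs = ""
    return fs
```

Both functions return an empty string, so a summary line would print without GFLOPs either way; only the second makes the underlying error visible.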
I will look further into this problem in the next couple of days if I have time, from training to evaluation. I suppose it is caused by a mismatch in the anchor_grid setting somewhere; it seems thop.profile can accept tensor expansion, but not direct tensor replacement. I have no idea why this happens. It seems #4833 has caused this problem.
By the way, I use the yolov5-6.0 model and the 5.0 model from your release archive; they give the same results.
I may have found the reason: the error has something to do with PyTorch's tensor expansion mechanism (dimension matching). @fcakyon is right,
@glenn-jocher @SamFC10 the error is raised when a model trained on 5.0 source is used with detect.py from 6.0 source. Compatibility addition seems to be not working for some reason.
I used the latest code and ran a short training; the error disappeared when using my own trained results. If I use a downloaded model (e.g. yolov5n.pt), the error pops up.
the error is raised when a model trained on 5.0 source is used
@SpaceView As I've mentioned above, please add links to your trained model if possible, so that the error can be reproduced from my side and debugged.
I ran into this problem when trying the simple example at https://docs.ultralytics.com/tutorials/pytorch-hub/. I use the 6.0 yolov5s.pt.
@RaZzzyz I cannot reproduce the bug using the simple example mentioned in the link. I'm using Google Colab with the latest branch and model.
@SamFC10 Please read my answer carefully; I have supplied all the info you need. The model is from Ultralytics, e.g.
https://github.com/ultralytics/yolov5/releases/download/v6.0/yolov5s.pt
https://github.com/ultralytics/yolov5/releases/download/v6.0/yolov5n.pt
To reproduce the issue, please read my answer two replies above. You will not see both of those lines printed if you use an old trained model, even though no exception is raised.
I suppose this issue can be closed. If you train the model using the latest code, there will be no problem.
Hi all,
I am getting the same error. All details that @SpaceView and @SamFC10 mentioned are almost the same for me. I did not train my own model. I'm just trying to run the existing model. And torch.load row (self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i)) throws an error like "RuntimeError: The expanded size of the tensor (1) must match the existing size (80) at non-singleton dimension 3. Target sizes: [1, 3, 1, 1, 2]. Tensor sizes: [3, 48, 80, 2]".
By the way, I tried both 5.0 and 6.0 pretrained models.
@yamand16 hi, thanks for letting us know about this possible problem with YOLOv5. We've created a few short guidelines below to help users provide what we need in order to get started investigating a possible problem.
When asking a question, people will be better able to provide help if you provide code that they can easily understand and use to reproduce the problem. This is referred to by community members as creating a minimum reproducible example. Your code that reproduces the problem should be:
For Ultralytics to provide assistance your code should also be:
git pull or git clone a new copy to ensure your problem has not already been solved in master.
If you believe your problem meets all the above criteria, please close this issue and raise a new one using the Bug Report template with a minimum reproducible example to help us better understand and diagnose your problem.
Thank you!
Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.
Access additional YOLOv5 resources:
Access additional Ultralytics resources:
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLOv5 and Vision AI!
I wanted to chime in here that I as well ran into this issue. I wanted to wait until we updated to the most recent set of code hoping it would be resolved but unfortunately not.
We've had to temporarily patch this call:
if self.onnx_dynamic or self.grid[i].shape[2:4] != x[i].shape[2:4]:
    self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i)
to
if self.onnx_dynamic or self.grid[i].shape[2:4] != x[i].shape[2:4]:
    self.grid[i] = self._make_grid(nx, ny).to(x[i].device)
and
if self.inplace:
    y[..., 0:2] = (y[..., 0:2] * 2 - 0.5 + self.grid[i]) * self.stride[i]  # xy
    y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
else:  # for YOLOv5 on AWS Inferentia https://github.com/ultralytics/yolov5/pull/2953
    xy = (y[..., 0:2] * 2 - 0.5 + self.grid[i]) * self.stride[i]  # xy
    wh = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
to
if self.inplace:
    y[..., 0:2] = (y[..., 0:2] * 2 - 0.5 + self.grid[i]) * self.stride[i]  # xy
    y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
else:  # for YOLOv5 on AWS Inferentia https://github.com/ultralytics/yolov5/pull/2953
    xy = (y[..., 0:2] * 2 - 0.5 + self.grid[i]) * self.stride[i]  # xy
    wh = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i].view(1, self.na, 1, 1, 2)
and then revert the _make_grid function back to:
@staticmethod
def _make_grid(nx=20, ny=20):
    yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)])
    return torch.stack((xv, yv), 2).view((1, 1, ny, nx, 2)).float()
And everything works as expected. If not, we get the same error that has been listed before.
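The patched `wh` line relies on broadcasting: an anchor tensor of shape (1, na, 1, 1, 2) multiplies against every grid cell without being expanded to the full grid first. A minimal sketch with made-up sizes and random stand-in tensors (none of these variable names come from the YOLOv5 source) checks that both forms give the same result:

```python
import torch

bs, na, ny, nx = 2, 3, 4, 5  # illustrative batch size, anchors per layer and grid size
wh_raw = torch.rand(bs, na, ny, nx, 2)  # stand-in for y[..., 2:4]
anchors = torch.rand(na, 2)             # stand-in for one layer's anchor sizes

# Option 1: build a full per-cell anchor grid of shape (1, na, ny, nx, 2).
anchor_grid_full = anchors.view(1, na, 1, 1, 2).expand(1, na, ny, nx, 2)
wh_full = (wh_raw * 2) ** 2 * anchor_grid_full

# Option 2: keep the compact (1, na, 1, 1, 2) shape and let broadcasting do the rest.
wh_compact = (wh_raw * 2) ** 2 * anchors.view(1, na, 1, 1, 2)

print(torch.allclose(wh_full, wh_compact))
```

Because the compact form never needs a per-grid-size tensor, it avoids the assignment into `self.anchor_grid[i]` that raises the error in this thread.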
@atremblay-rayhawk hi, thank you for your suggested fix on how to improve YOLOv5!
The fastest and easiest way to incorporate your ideas into the official codebase is to submit a Pull Request (PR) implementing your idea, and if applicable providing before and after profiling/inference/training results to help us understand the improvement your feature provides. This allows us to directly see the changes in the code and to understand how they affect workflows and performance.
Please see our Contributing Guide to get started.
This is probably because it is no longer supported:
model = torch.load('./weights/yolov5s.pt', map_location=device)['model'].float()  # load to FP32
model = DetectMultiBackend('./weights/yolov5s.pt', device=device, dnn=False)  # this is OK!
But I prefer the first one; I don't want such a complex wrapper.
@gg22mm YOLOv5 models can be loaded any way you want. Your problem is not reproducible.
I am getting the same error.
models\yolo.py line 59 self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i) RuntimeError: The expanded size of the tensor (1) must match the existing size (80) at non-singleton dimension 3. Target sizes: [1, 3, 1, 1, 2]. Tensor sizes: [3, 48, 80, 2]
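The "Target sizes" and "Tensor sizes" in this message can be read with a small helper. The function below is an illustrative pure-Python re-implementation of the expand rule (a source dimension must equal the target dimension or be 1, with dimensions aligned from the right). It is not PyTorch's own code, and `first_expand_mismatch` is a hypothetical name.

```python
def first_expand_mismatch(target, source):
    """Return (dim, target_size, source_size) for the first incompatible
    dimension, scanning from the last dimension, or None if `source`
    could be expanded to `target`."""
    offset = len(target) - len(source)
    if offset < 0:
        raise ValueError("source has more dimensions than target")
    for dim in range(len(target) - 1, -1, -1):
        src_dim = dim - offset
        # Missing leading dimensions of the source behave like size 1.
        src = source[src_dim] if src_dim >= 0 else 1
        if src != 1 and src != target[dim]:
            return dim, target[dim], src
    return None


# Shapes taken from the error message above.
print(first_expand_mismatch([1, 3, 1, 1, 2], [3, 48, 80, 2]))  # (3, 1, 80)
```

The result matches the message: at dimension 3 the target slot has size 1 while the incoming tensor has size 80, so a full grid is being copied into a slot that only holds one anchor pair per anchor.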
@deepxiaobai hi, thanks for letting us know about this possible problem with YOLOv5. We've created a few short guidelines to help users provide what we need in order to get started investigating a possible problem.
I cannot help with code or analysis, but here is a model which gives me the same error in the doods2 environment.
Maybe someone needs such a model for further testing?
https://github.com/OlafenwaMoses/DeepStack_OpenLogo/releases/download/v1/openlogo.pt
@ozett hi, thanks for letting us know about this possible problem with YOLOv5. We've created a few short guidelines to help users provide what we need in order to get started investigating a possible problem.
I trained a model with v5.0, saved it, and am now trying to load it with v6.1. I am getting the following error:
  File "/workercode/./yolov5/models/common.py", line 439, in forward
    y = self.model(im, augment=augment, visualize=visualize)[0]
  File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workercode/./yolov5/models/yolo.py", line 137, in forward
    return self._forward_once(x, profile, visualize)  # single-scale inference, train
  File "/workercode/./yolov5/models/yolo.py", line 160, in _forward_once
    x = m(x)  # run
  File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workercode/./yolov5/models/yolo.py", line 65, in forward
    self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i)
RuntimeError: The expanded size of the tensor (1) must match the existing size (20) at non-singleton dimension 3. Target sizes: [1, 3, 1, 1, 2]. Tensor sizes: [3, 20, 20, 2]
Is there any suggestion that can help me?
Train a new model with the latest code.
Yeah, I got the same error, but I corrected it. Here are the steps: 1.) Make sure you cloned the master branch. 2.) Take the model weights from the latest YOLOv5. Never use weights (.pt file) from previous YOLOv5 versions with the latest code; it gives the non-singleton dimension 3 error. This is how I corrected my error. All the best.
@JAYANTH-MOHAN thanks for sharing your solution! This will be helpful for others who encounter similar issues. If you have any other questions or need further assistance, feel free to ask. Good luck with your YOLOv5 project!
This is a bug specific to YOLOv5-6.0; YOLOv5-5.0 doesn't have this problem. How to reproduce the bug:
The error info is given below:
It seems that the following line has a problem:
I used the following equivalent code to debug it
and found that when i==0:
self.anchor_grid[0].shape --> torch.Size([1, 3, 1, 1, 2])
tmp_anchor_grid.shape --> torch.Size([1, 3, 4, 4, 2])
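These two shapes are enough to reproduce the same kind of error outside YOLOv5. A minimal sketch, assuming `anchor_grid` is stored as a single Tensor of shape (nl, 1, na, 1, 1, 2) as in older checkpoints (the variable names and sizes here are illustrative, not taken from the source):

```python
import torch

nl, na = 3, 3  # number of detection layers and anchors per layer (illustrative values)

# Old-style storage: one Tensor for all layers, so anchor_grid_tensor[0] has shape (1, 3, 1, 1, 2).
anchor_grid_tensor = torch.zeros(nl, 1, na, 1, 1, 2)
tmp_anchor_grid = torch.zeros(1, na, 4, 4, 2)

try:
    # Indexed assignment copies into existing storage, so the shapes must be expandable.
    anchor_grid_tensor[0] = tmp_anchor_grid
    error_message = None
except RuntimeError as e:
    error_message = str(e)

# New-style storage: a Python list, so assignment simply rebinds the slot to a new Tensor.
anchor_grid_list = [torch.zeros(1)] * nl
anchor_grid_list[0] = tmp_anchor_grid

print(error_message)
print(anchor_grid_list[0].shape)
```

The first assignment fails because a (1, 3, 4, 4, 2) tensor cannot be copied into a (1, 3, 1, 1, 2) slot, while the list assignment succeeds because no copy into existing storage takes place.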
The problem seems to come from thop.profile,
Currently I have no idea how this comes about. Where is self.anchor_grid[0] coming from?
When I run the script in the Windows PowerShell console, I get no such bug, as below,