ultralytics / yolov3

YOLOv3 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Export to onnx opencv forward assertion (-215:Assertion failed) start <= (int)shape.size() && end <= (int)shape.size() && start <= end in function 'total' #1798

Closed dov84d closed 3 years ago

dov84d commented 3 years ago

🐛 Bug

Hi, I am not sure this is a bug, but I have reason to believe it is. I am trying to export a trained model to ONNX, and I get the following error when running the forward function:

(-215:Assertion failed) start <= (int)shape.size() && end <= (int)shape.size() && start <= end in function 'total'

To Reproduce (REQUIRED)

I trained the model using `train.py` with:

```
--data my_cfg.yaml --cfg yolov3-tiny.yaml --weights yolov3-tiny.pt --img 416 --batch-size 32 --epochs 150
```

Then I exported the model using `export.py` with:

```
--weights best.pt --img-size 416 416 --device cpu --include onnx --opset-version 12
```

Output:

cv2.error: OpenCV(4.5.2) /tmp/pip-req-build-947ayiyu/opencv/modules/dnn/include/opencv2/dnn/shape_utils.hpp:170: error: (-215:Assertion failed) start <= (int)shape.size() && end <= (int)shape.size() && start <= end in function 'total'

```python
img = cv2.imread(img_path, cv2.IMREAD_COLOR)
blob = cv2.dnn.blobFromImage(img, 1.0 / 255, (416, 416), (0, 0, 0), swapRB=True, crop=False)
print(blob.shape)  # (1, 3, 416, 416)
model_file = 'best.onnx'
net = cv2.dnn.readNetFromONNX(model_file)
net.setInput(blob)
pred = net.forward()
```
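For reference, with these arguments `cv2.dnn.blobFromImage` is roughly equivalent to the following NumPy preprocessing (a sketch for an image already at the target size; the resizing and cropping that cv2 performs are omitted, and the function name is illustrative):

```python
import numpy as np

def blob_from_image_sketch(img, scale=1.0 / 255, swap_rb=True):
    """Rough NumPy analogue of cv2.dnn.blobFromImage for a pre-sized image:
    scale pixel values, optionally swap BGR -> RGB, reorder HWC -> NCHW."""
    x = img.astype(np.float32) * scale
    if swap_rb:
        x = x[..., ::-1]            # BGR -> RGB
    x = np.transpose(x, (2, 0, 1))  # HWC -> CHW
    return x[np.newaxis, ...]       # add batch dimension -> NCHW

# Dummy 416x416 BGR image in place of cv2.imread output
img = np.zeros((416, 416, 3), dtype=np.uint8)
blob = blob_from_image_sketch(img)
print(blob.shape)  # (1, 3, 416, 416)
```

This makes it easy to confirm that the network input shape fed to `net.setInput` matches what the exported model expects.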

Expected behavior

The forward pass succeeds.


github-actions[bot] commented 3 years ago

👋 Hello @dov84d, thank you for your interest in YOLOv3 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://www.ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7. To install run:

$ pip install -r requirements.txt

Environments

YOLOv3 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv3 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv3 training (train.py), testing (test.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.

glenn-jocher commented 3 years ago

@dov84d since your error message originates in cv2 you should probably raise the issue directly on the cv2 repository.

dov84d commented 3 years ago

Hi, the bug is indeed in the ultralytics repo.

It seems that ONNX opset version 12 does not support the specific broadcasting used in the forward function in yolo.py, so I changed the code to use an explicit expand instead:

```python
def forward(self, x):
    # x = x.copy()  # for profiling
    z = []  # inference output
    for i in range(self.nl):
        x[i] = self.m[i](x[i])  # conv
        bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
        x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()

        if not self.training:  # inference
            y = x[i].sigmoid()
            if self.grid[i].shape[2:4] != x[i].shape[2:4] or self.onnx_dynamic:
                # explicit expand over the anchor dimension instead of implicit broadcasting
                self.grid[i] = self._make_grid(nx, ny).to(x[i].device).expand(-1, y.shape[1], -1, -1, -1)

            if self.inplace:
                y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy
                y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
            else:  # for YOLOv5 on AWS Inferentia https://github.com/ultralytics/yolov5/pull/2953
                xy = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy
                wh = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i].view(1, self.na, 1, 1, 2).expand(-1, y.shape[1], y.shape[2], y.shape[3], -1)  # wh
                y = torch.cat((xy, wh, y[..., 4:]), -1)
            z.append(y.view(bs, -1, self.no))

    return x if self.training else (torch.cat(z, 1), x)
```
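For readers unfamiliar with the distinction: the patch replaces implicit broadcasting (which the ONNX exporter materializes as shape ops that some cv2.dnn versions reject) with an explicit expand to the full target shape before the arithmetic. A minimal sketch of the shape logic, using NumPy's `broadcast_to` as an analogue of `torch.Tensor.expand` (the dimension values below are illustrative, not taken from the repo):

```python
import numpy as np

na, ny, nx = 3, 20, 20  # anchors per level, grid height/width (illustrative)

# Grid of (x, y) cell offsets with singleton batch/anchor axes,
# shaped (1, 1, ny, nx, 2) as produced by a _make_grid-style helper.
yv, xv = np.meshgrid(np.arange(ny), np.arange(nx), indexing='ij')
grid = np.stack((xv, yv), axis=2).reshape(1, 1, ny, nx, 2).astype(np.float32)

y = np.zeros((1, na, ny, nx, 2), dtype=np.float32)  # stand-in for y[..., 0:2]

# Implicit broadcasting: the singleton anchor axis is expanded on the fly.
implicit = y + grid

# Explicit expand: materialize the anchor axis before the add, mirroring
# the patched .expand(-1, y.shape[1], -1, -1, -1). Like torch's expand,
# broadcast_to returns a view, so no extra memory is copied.
grid_expanded = np.broadcast_to(grid, (1, na, ny, nx, 2))
explicit = y + grid_expanded

print(implicit.shape, explicit.shape)  # both (1, 3, 20, 20, 2)
```

Both paths compute the same values; the explicit form simply fixes all tensor shapes ahead of time, which keeps the exported ONNX graph free of the broadcast pattern that triggered the assertion.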

github-actions[bot] commented 3 years ago

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.


Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv3 🚀 and Vision AI ⭐!

glenn-jocher commented 11 months ago

@dov84d thanks for the update. The changes you made seem to address the issue effectively by replacing the specific broadcasting with explicit expand. Good job on pinpointing and resolving the problem! If you encounter any more issues or have further questions, please feel free to reach out.