cloudrivers commented 4 years ago

🐛 Bug

Hi, I followed the advice to save an ONNX model, but onnx.checker.check_model(model) failed.

To Reproduce

Steps to reproduce the behavior:

Set ONNX_EXPORT to True
Export with torch.onnx.export(model, img, 'weights/michael_export.onnx', verbose=True, opset_version=11)
Run detect.py

Environment

pytorch: 1.3.1
onnx: 1.5.0
python: 3.6.6

Error info

Traceback (most recent call last):
  File "detect.py", line 178, in <module>
    detect()
  File "detect.py", line 52, in detect
    onnx.checker.check_model(model)  # Check that the IR is well formed
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/onnx/checker.py", line 86, in check_model
    C.check_model(model.SerializeToString())
onnx.onnx_cpp2py_export.checker.ValidationError: Node () has input size 4 not in range [min=2, max=2].

==> Context: Bad node spec: input: "675" input: "698" input: "698" input: "697" output: "699" op_type: "Resize" attribute { name: "coordinate_transformation_mode" s: "asymmetric" type: STRING } attribute { name: "cubic_coeff_a" f: -0.75 type: FLOAT } attribute { name: "mode" s: "nearest" type: STRING } attribute { name: "nearest_mode" s: "floor" type: STRING }

glenn-jocher commented 4 years ago

@cloudrivers upgrading to onnx 1.6.0 should fix this.
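
For context, the checker error is consistent with an opset mismatch: the export above requests opset_version=11, where Resize accepts up to four inputs, while onnx 1.5.0 only knows operator schemas up to opset 10, where Resize takes exactly two. A minimal sketch of that mapping follows. The table reflects the ONNX versioning notes as best I recall them, and the helper names (`min_onnx_release_for_opset`, `checker_supports`) are hypothetical, not part of onnx or this repo.

```python
# Highest default-domain opset each onnx release understands.
# Assumption: values taken from the ONNX versioning table; verify against your install.
ONNX_RELEASE_MAX_OPSET = {
    "1.4.1": 9,
    "1.5.0": 10,
    "1.6.0": 11,
    "1.7.0": 12,
}


def _version_key(version):
    """Turn '1.6.0' into (1, 6, 0) so releases compare numerically."""
    return tuple(int(part) for part in version.split("."))


def checker_supports(onnx_version, opset):
    """Return True if the given onnx release has schemas for `opset`."""
    return ONNX_RELEASE_MAX_OPSET[onnx_version] >= opset


def min_onnx_release_for_opset(opset):
    """Return the oldest listed onnx release whose checker knows `opset`."""
    candidates = [
        release
        for release, max_opset in ONNX_RELEASE_MAX_OPSET.items()
        if max_opset >= opset
    ]
    if not candidates:
        raise ValueError(f"no listed onnx release supports opset {opset}")
    return min(candidates, key=_version_key)


if __name__ == "__main__":
    print(checker_supports("1.5.0", 11))   # the environment from this report
    print(min_onnx_release_for_opset(11))  # release needed for opset_version=11
```

With onnx installed, onnx.defs.onnx_opset_version() reports the highest opset the local package supports, which is a quicker check than a hand-kept table.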

cloudrivers commented 4 years ago

@glenn-jocher After pip install onnx==1.6.0, a new error occurred:

Segmentation fault (core dumped)

cloudrivers commented 4 years ago

I have upgraded pytorch to 1.4.0 and onnx to 1.6.0. It still does not work.

Namespace(agnostic_nms=False, cfg='cfg/yolov3.cfg', classes=None, conf_thres=0.3, device='', fourcc='mp4v', half=False, img_size=416, iou_thres=0.5, names='data/myclass.names', output='output', save_txt=False, source='data/samples', view_img=False, weights='weights/best.pt')
Using CPU

Segmentation fault (core dumped)

cloudrivers commented 4 years ago

@glenn-jocher Hi, I used onnxruntime to print the model's inputs and outputs. I think something may have gone wrong when converting to ONNX.

import onnxruntime
import numpy as np
import os

sess = onnxruntime.InferenceSession('./weights/best.onnx', None)

input_name = sess.get_inputs()[0].name
print("Input name :", input_name)
input_shape = sess.get_inputs()[0].shape
print("Input shape :", input_shape)
input_type = sess.get_inputs()[0].type
print("Input type :", input_type)

output_name = sess.get_outputs()[0].name
print("Output name :", output_name)
output_shape = sess.get_outputs()[0].shape
print("Output shape :", output_shape)
output_type = sess.get_outputs()[0].type
print("Output type :", output_type)

output_name = sess.get_outputs()[1].name
print("Output name :", output_name)
output_shape = sess.get_outputs()[1].shape
print("Output shape :", output_shape)
output_type = sess.get_outputs()[1].type
print("Output type :", output_type)

x = np.random.random(input_shape)
x = x.astype(np.float32)

result = sess.run([output_name], {input_name: x})

Input name : input.1
Input shape : [1, 3, 320, 192]
Input type : tensor(float)
Output name : 839
Output shape : [3780, 1]
Output type : tensor(float)
Output name : 842
Output shape : [3780, 4]
Output type : tensor(float)

Fail                                      Traceback (most recent call last)
in ()
     68 x = x.astype(np.float32)
     69
---> 70 result = sess.run([output_name], {input_name: x})
     71
     72 #print(result)

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/onnxruntime/capi/session.py in run(self, output_names, input_feed, run_options)
    140             output_names = [output.name for output in self._outputs_meta]
    141         try:
--> 142             return self._sess.run(output_names, input_feed, run_options)
    143         except C.EPFail as err:
    144             if self._enable_fallback:

Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Conv node. Name:'' Status Message: Input channels C is not equal to kernel channels * group. C: 513 kernel channels: 768 group: 1

glenn-jocher commented 4 years ago

@cloudrivers note that in a conda environment on MacOS and Linux (such as we use), we had to install onnx using https://github.com/onnx/onnx#linux-and-macos:

conda install -c conda-forge protobuf numpy
pip install onnx

rather than directly with conda:

conda install -c conda-forge onnx

cloudrivers commented 4 years ago

Hi @glenn-jocher, I have finished transfer learning from yolov3.weights for 1 class. The mAP is close to 0.98, and both training and inference work very well. Your repo is wonderful! However, I suspect the ONNX export for a 1-class model based on yolov3.weights may not work, because the official conversion sample works well with the current onnx==1.6.0, whereas our code ends in Segmentation fault (core dumped). Below is the minimal test case, which works in my environment.

Build a Mock Model in PyTorch with a convolution and a reduceMean layer

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable
import torch.onnx as torch_onnx
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=(3,3), stride=1, padding=0, bias=False)
    def forward(self, inputs):
        x = self.conv(inputs)
        #x = x.view(x.size()[0], x.size()[1], -1)
        return torch.mean(x, dim=2)
input_shape = (3, 100, 100)
model_onnx_path = "torch_model.onnx"
model = Model()
model.train(False)
dummy_input = Variable(torch.randn(1, *input_shape))
output = torch_onnx.export(model, 
                          dummy_input, 
                          model_onnx_path, 
                          verbose=False)
print("Export of torch_model.onnx complete!")

glenn-jocher commented 4 years ago

@cloudrivers for onnx bugs, post your question on the onnx repo; for pytorch bugs, post your question on the pytorch repo. This repo is not a catch-all for answering pytorch and onnx bugs.

cloudrivers commented 4 years ago

@glenn-jocher Hi, I don't think this is an onnx or pytorch issue. I think something is wrong in the conversion code.

glenn-jocher commented 4 years ago

@cloudrivers but your code to reproduce here https://github.com/ultralytics/yolov3/issues/789#issuecomment-575956814 contains no code from this repo. So if this code is causing problems, you must raise it on the relevant repo, not this one.

Wenbin94 commented 4 years ago

@cloudrivers Have you solved your problem?

BrunoVox commented 4 years ago

@glenn-jocher After pip install onnx==1.6.0, a new error occurred:

Segmentation fault (core dumped)

So, after digging a bit into this issue, which affected me as well, I found this solution: https://github.com/onnx/onnx/issues/2394#issuecomment-581638840. It worked for me. To summarize: apparently there is a dynamic loader issue, so you should import onnx in the first lines of your code, before any import torch is executed. Just edit detect.py accordingly. Hope this helps.
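
The import-order workaround described above can be checked mechanically. The following is a rough sketch using only the standard library: it parses a script's source and reports whether the first import of onnx appears on an earlier line than the first import of torch. The function names (`first_import_line`, `onnx_imported_before_torch`) are made up for illustration. It only inspects the one file it is given, so it will not notice torch being pulled in indirectly through another module.

```python
import ast


def first_import_line(source, module):
    """Return the line of the first import of `module` (or a submodule), else None."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        # Match "onnx" and "onnx.checker", but not "onnxruntime".
        if any(n == module or n.startswith(module + ".") for n in names):
            lines.append(node.lineno)
    return min(lines) if lines else None


def onnx_imported_before_torch(source):
    """True only if both modules are imported and onnx comes on an earlier line."""
    onnx_line = first_import_line(source, "onnx")
    torch_line = first_import_line(source, "torch")
    if onnx_line is None or torch_line is None:
        return False
    return onnx_line < torch_line


if __name__ == "__main__":
    fixed = "import onnx\nimport torch\n"
    broken = "import torch\nimport onnx\n"
    print(onnx_imported_before_torch(fixed))
    print(onnx_imported_before_torch(broken))
```

Because other modules that a script imports may themselves load torch, the safest placement is probably import onnx as the very first import of the entry script.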

hope-yao commented 4 years ago

@glenn-jocher After pip install onnx==1.6.0, a new error occurred: Segmentation fault (core dumped)

So, after digging a bit into this issue, which affected me as well, I found this solution: onnx/onnx#2394 (comment). It worked for me. To summarize: apparently there is a dynamic loader issue, so you should import onnx in the first lines of your code, before any import torch is executed. Just edit detect.py accordingly. Hope this helps.

This works on my end. Thanks BrunoVox, you saved my day.

glenn-jocher commented 11 months ago

@hope-yao glad to hear that the solution worked for you! If you have any more questions or run into any other issues, feel free to ask. The YOLO community and the Ultralytics team are always here to help.

ultralytics / yolov3

Node has input size 4 not in range [min=2, max=2] #789
