ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Use export.py to generate yolov5s.onnx will get a negative number. #343

Closed. cmdbug closed this issue 4 years ago.

cmdbug commented 4 years ago

❔Question

Using export.py to generate yolov5s.onnx produces negative numbers in the output.

[two screenshots]

This is the code that executes the onnx part

import numpy as np
import onnxruntime
from PIL import Image

session = onnxruntime.InferenceSession('./weights/yolov5s.onnx')

batch_size = session.get_inputs()[0].shape[0]
img_size_h = session.get_inputs()[0].shape[2]
img_size_w = session.get_inputs()[0].shape[3]

image_src = Image.open(image_path)
resized = letterbox_image(image_src, (img_size_w, img_size_h))
img_in = np.transpose(resized, (2, 0, 1)).astype(np.float32)  # HWC -> CHW
img_in = np.expand_dims(img_in, axis=0)
img_in /= 255.0

input_name = session.get_inputs()[0].name
# output, output_exist = session.run(['decoder.output_conv', 'lane_exist.linear2'], {"input.1": image_np})
outputs = session.run(None, {input_name: img_in})

The outputs already contain negative numbers. After the result is processed, parts of it are correct, such as the car in the figure. However, the top/bottom of the bicycle is right while its left/right is wrong, and the left/right of the dog is right while its top/bottom is wrong. What might cause this problem? Thanks.
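(For anyone reproducing this: letterbox_image is not shown in the snippet above. Below is a minimal sketch of the usual PIL-based letterboxing; the function body and the gray padding value are my assumptions, not the exact code used here.)

import numpy as np
from PIL import Image

def letterbox_image(image, target_size):
    # Resize keeping aspect ratio, pad the remainder with gray (128).
    tw, th = target_size
    w, h = image.size
    scale = min(tw / w, th / h)
    nw, nh = int(w * scale), int(h * scale)
    resized = image.resize((nw, nh), Image.BILINEAR)
    canvas = Image.new('RGB', (tw, th), (128, 128, 128))
    canvas.paste(resized, ((tw - nw) // 2, (th - nh) // 2))
    return np.asarray(canvas)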

Additional context

torch:1.5.1 torchvision:0.6.1 onnxruntime:1.3.0

github-actions[bot] commented 4 years ago

Hello @WZTENG, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook (Open In Colab), Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.

If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients.

For more information please visit https://www.ultralytics.com.

hxk11111 commented 4 years ago

Have you solved the problem? I have the same question here

cmdbug commented 4 years ago

> Have you solved the problem? I have the same question here

@dlawrences cool

dlawrences commented 4 years ago

Hi both,

It very much seems that the above script generates the results as per the three raw output layers.

These results are not final. In the detect.py script these are also processed during inference:

https://github.com/ultralytics/yolov5/blob/a1c8406af3eac3e20d4dd5d327fd6cbd4fbb9752/models/yolo.py#L29-L36

You have two options:

  1. Change export.py to include the Detect layer:

https://github.com/ultralytics/yolov5/blob/a1c8406af3eac3e20d4dd5d327fd6cbd4fbb9752/models/export.py#L28

The export flag above needs to be changed to False.

I have done this for my own experiments. The ONNX export seems to work, however the CoreML one doesn't.

  2. Create logic to replicate the inference steps in the Detect layer

You could replicate the same logic that's referenced above using numpy (i.e. pass the results through sigmoid and do all the handling).
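For illustration, here is a NumPy-only sketch of that decode step for a single output map. This is a sketch under assumptions, not the repo's code: anchors_for_this_layer and stride must match your export, and the shapes follow the (1, 3, ny, nx, 85) maps discussed later in this thread.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_layer(out, anchors_for_this_layer, stride):
    # out: one raw ONNX output of shape (1, num_anchors, ny, nx, 85)
    _, na, ny, nx, _ = out.shape
    grid_x, grid_y = np.meshgrid(np.arange(nx), np.arange(ny))
    anchors = np.array(anchors_for_this_layer, dtype=np.float32).reshape(1, na, 1, 1, 2)
    out = sigmoid(out)  # the Detect layer sigmoids the whole map first
    cx = (out[..., 0] * 2.0 - 0.5 + grid_x) * stride
    cy = (out[..., 1] * 2.0 - 0.5 + grid_y) * stride
    wh = (out[..., 2:4] * 2.0) ** 2 * anchors
    conf = out[..., 4:5]
    cls = out[..., 5:]
    boxes = np.concatenate([cx[..., None], cy[..., None], wh], axis=-1)
    return np.concatenate([boxes, conf, cls], axis=-1).reshape(1, -1, 85)  # cx, cy, w, h, conf, 80 cls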

In both cases, you still miss the post-processing steps (confidence filtering and NMS).

These are currently done as part of the following function: https://github.com/ultralytics/yolov5/blob/a1c8406af3eac3e20d4dd5d327fd6cbd4fbb9752/utils/utils.py#L549-L554
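As a rough stand-in for that function, here is a minimal class-agnostic NMS in NumPy. This is a simplified sketch, not the official implementation; boxes must already be converted from cx, cy, w, h to x1, y1, x2, y2 corners.

import numpy as np

def nms_xyxy(boxes, scores, iou_thres=0.5):
    # boxes: (N, 4) as x1, y1, x2, y2; returns indices of kept boxes
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter + 1e-9)
        order = order[1:][iou <= iou_thres]  # drop boxes overlapping the kept one
    return keep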

Hope this is useful. Good luck!

cmdbug commented 4 years ago

thanks!

cmdbug commented 4 years ago

@dlawrences After modifying model.model[-1].export = False, there is still a problem parsing the generated .onnx. Could you share reference code for parsing the onnx output? Thank you!

dlawrences commented 4 years ago

@WZTENG what is the error you are encountering?

cmdbug commented 4 years ago

[screenshot] After modifying model.model[-1].export to False, the result is still problematic, and it feels worse than before.

dlawrences commented 4 years ago

I understand why you're saying it feels worse than before, but that is because you are now missing the NMS step, as specified above. Could you please answer/check on the following points?

Also, I would like to have a look at the .onnx file, just to make sure there's nothing wrong with it. Would you please attach it here?

Thanks

cmdbug commented 4 years ago

Use the official yolov5s.pt file to convert (not trained by myself). With model.model[-1].export = False ---> yolov5s.onnx, the result of running detect.py is correct.

[screenshot]

before: anchors = [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]] # 5s

after: anchors = [[116, 90, 156, 198, 373, 326], [30, 61, 62, 45, 59, 119], [10, 13, 16, 30, 33, 23]] # 5s

[screenshots of results]

After modifying the anchors order, I found that some images are normal, but there are still problems with other pictures.

cmdbug commented 4 years ago

Finally succeeded. Some of the internal processing is presumably different from what I thought before; if I have time, I will carefully check the internal processing. The working approach is to set model.model[-1].export = False, take outputs[0], and call the official NMS to display the results correctly. Previously, I used the three raw outputs and processed the relevant content myself. Thank you for providing useful information.
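In code, the working recipe just described might look like this. This is a sketch: it assumes img_in is prepared exactly as in the snippets above and that the repo's official non_max_suppression is importable from utils.utils (i.e. you run from the repo root).

import numpy as np
import torch
import onnxruntime
from utils.utils import non_max_suppression  # the official NMS in this repo

session = onnxruntime.InferenceSession('./weights/yolov5s.onnx')
# img_in: (1, 3, H, W) float32 in [0, 1], as prepared in the snippets above
outputs = session.run(None, {session.get_inputs()[0].name: img_in})
# with model.model[-1].export = False, outputs[0] is the merged (1, num_boxes, 85) tensor
pred = torch.from_numpy(np.array(outputs[0]))
detections = non_max_suppression(pred, conf_thres=0.4, iou_thres=0.5, agnostic=False)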

dlawrences commented 4 years ago

Hi @WZTENG

Great news, happy you have managed to do it! I think it would be really useful for others to create a mini-documentation containing your findings. Would you be willing to put together this info?

CC @glenn-jocher

Tip: It should be possible to process the output of the three feature maps independently from the Detect layer by replicating all those operations in NumPy/any other framework. I have managed to do it myself using CoreML ops.

Cheers

dlawrences commented 4 years ago

Additional info: I am not sure what you mean by "and use the output [0]", but if you are only consuming the results from the highest-resolution feature map (80x80), then you are missing some results.

Please consider that the Detect layer, at least as per my memory, produces outputs scaled to the dimensions of the network input (i.e. the original image may be 1920x1080, but you have trained using 640x640 inputs, so 640x640 is the dimension space). In detect.py there is logic that handles this, namely scaling the boxes back to the original image size.

These operations are not done in the Detect layer, but as part of the post-processing (even after NMS).
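As a sketch, undoing the letterbox to map a box from the 640x640 input space back to the original image looks like the following. This is a simplified version of what detect.py's post-processing does; the function and variable names are mine, not the repo's.

def scale_box_to_original(box, input_size, orig_size):
    # box: (x1, y1, x2, y2) in the letterboxed network-input space
    iw, ih = input_size            # e.g. (640, 640)
    ow, oh = orig_size             # e.g. (1920, 1080)
    gain = min(iw / ow, ih / oh)   # scale factor used by the letterbox
    pad_x = (iw - ow * gain) / 2   # horizontal padding added by the letterbox
    pad_y = (ih - oh * gain) / 2   # vertical padding
    x1, y1, x2, y2 = box
    return ((x1 - pad_x) / gain, (y1 - pad_y) / gain,
            (x2 - pad_x) / gain, (y2 - pad_y) / gain)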

cmdbug commented 4 years ago

It only works now; I am still not sure what caused the original problem. If I can figure it out, I will post it.

cmdbug commented 4 years ago

It feels a bit different from the data I had seen before, but the output is fine now. Below is the method I implemented; it is somewhat long, but at least it shows how to handle the output. I hope those who have encountered this problem can use it as a reference. This is the parsing method I wrote myself. Please note that the value of model.model[-1].export = BOOL makes a big difference during export.

import numpy as np
import onnxruntime
import torch
from PIL import Image

def detect_onnx(official=True, image_path=None):
    num_classes = 80
    anchors = [[116, 90, 156, 198, 373, 326], [30, 61, 62, 45, 59, 119], [10, 13, 16, 30, 33, 23]]  # 5s

    session = onnxruntime.InferenceSession('./weights/yolov5s.onnx')
    # print("The model expects input shape: ", session.get_inputs()[0].shape)
    batch_size = session.get_inputs()[0].shape[0]
    img_size_h = session.get_inputs()[0].shape[2]
    img_size_w = session.get_inputs()[0].shape[3]

    # input
    image_src = Image.open(image_path)
    resized = letterbox_image(image_src, (img_size_w, img_size_h))

    img_in = np.transpose(resized, (2, 0, 1)).astype(np.float32)  # HWC -> CHW
    img_in = np.expand_dims(img_in, axis=0)
    img_in /= 255.0
    # print("Shape of the image input shape: ", img_in.shape)

    # inference
    input_name = session.get_inputs()[0].name
    outputs = session.run(None, {input_name: img_in})

    batch_detections = []
    if official and len(outputs) == 4:   # model.model[-1].export = boolean ---> True:3 False:4
        # model.model[-1].export = False ---> outputs[0] (1, xxxx, 85)
        # official
        batch_detections = torch.from_numpy(np.array(outputs[0]))
        batch_detections = non_max_suppression(batch_detections, conf_thres=0.4, iou_thres=0.5, agnostic=False)
    else:
        # model.model[-1].export = False ---> outputs[1]/outputs[2]/outputs[3]
        # model.model[-1].export = True  ---> outputs
        # (1, 3, 20, 20, 85)
        # (1, 3, 40, 40, 85)
        # (1, 3, 80, 80, 85)
        # myself (from yolo.py Detect)
        boxs = []
        a = torch.tensor(anchors).float().view(3, -1, 2)
        anchor_grid = a.clone().view(3, 1, -1, 1, 1, 2)
        if len(outputs) == 4:
            outputs = [outputs[1], outputs[2], outputs[3]]
        for index, out in enumerate(outputs):
            out = torch.from_numpy(out)
            # out: (batch, num_anchors, ny, nx, 85); dim 2 is height, dim 3 is width
            feature_h = out.shape[2]
            feature_w = out.shape[3]

            # Stride of this feature map relative to the network input
            stride_w = int(img_size_w / feature_w)
            stride_h = int(img_size_h / feature_h)

            conf = out[..., 4]
            pred_cls = out[..., 5:]

            grid_x, grid_y = np.meshgrid(np.arange(feature_w), np.arange(feature_h))
            grid_x = torch.from_numpy(grid_x).float()  # keep everything as tensors
            grid_y = torch.from_numpy(grid_y).float()  # (avoids a numpy/Tensor TypeError)

            # cx, cy, w, h
            pred_boxes = torch.FloatTensor(out[..., :4].shape)
            pred_boxes[..., 0] = (torch.sigmoid(out[..., 0]) * 2.0 - 0.5 + grid_x) * stride_w  # cx
            pred_boxes[..., 1] = (torch.sigmoid(out[..., 1]) * 2.0 - 0.5 + grid_y) * stride_h  # cy
            pred_boxes[..., 2:4] = (torch.sigmoid(out[..., 2:4]) * 2) ** 2 * anchor_grid[index]  # wh

            conf = torch.sigmoid(conf)
            pred_cls = torch.sigmoid(pred_cls)

            output = torch.cat((pred_boxes.view(batch_size, -1, 4),
                                conf.view(batch_size, -1, 1),
                                pred_cls.view(batch_size, -1, num_classes)),
                               -1)
            boxs.append(output)

        outputx = torch.cat(boxs, 1)
        # NMS
        batch_detections = w_non_max_suppression(outputx, num_classes, conf_thres=0.4, nms_thres=0.3)

    return batch_detections

If necessary, you can port all the torch operations to NumPy, which makes this easier to reuse from other frameworks.
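A hedged usage sketch for the function above (the image path is a placeholder; the per-image result layout follows the official NMS: x1, y1, x2, y2, conf, cls):

if __name__ == '__main__':
    detections = detect_onnx(official=True, image_path='./images/demo.jpg')
    # each element of detections is a (num_kept, 6) tensor, or None if nothing was kept
    if detections[0] is not None:
        for *box, conf, cls in detections[0].tolist():
            print([int(v) for v in box], round(float(conf), 3), int(cls))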

mukul1em commented 4 years ago

@WZTENG can you share the whole code for onnx inference? Also, what is w_non_max_suppression()?

imyoungyang commented 4 years ago

@WZTENG thank you very much for your work and tests. Could you share your code for the function w_non_max_suppression?

Thank you again.

cmdbug commented 4 years ago

@mukul1em @imyoungyang here: download demo_onnx.zip and unzip it.

cmdbug commented 4 years ago

yolov3/v4

[screenshot]

yolov5

[screenshot]

china56321 commented 4 years ago

> @mukul1em @imyoungyang here: download demo_onnx.zip and unzip it.

Can it be used to test whether the converted best.onnx is OK?

cmdbug commented 4 years ago

> Can it be used to test whether the converted best.onnx is OK?

Yes, but I only wrote the code for the 5s model; other models need their corresponding anchors added. Note the order of the anchors.

china56321 commented 4 years ago

> Yes, but I only wrote the code for the 5s model; other models need their corresponding anchors added. Note the order of the anchors.

I used your code to test yolov5s.onnx, but an error happened:

File "demo_onnx.py", line 306: detections = detect_onnx(official=False, image_path=image_path)
File "demo_onnx.py", line 234, in detect_onnx: pred_boxes[..., 0] = (torch.sigmoid(out[..., 0]) * 2.0 - 0.5 + grid_x) * stride_w  # cx
TypeError: add(): argument 'other' (position 1) must be Tensor, not numpy.ndarray

cmdbug commented 4 years ago

> I used your code to test yolov5s.onnx, but an error happened: TypeError: add(): argument 'other' (position 1) must be Tensor, not numpy.ndarray

It works normally for me. Did you modify the code? You can also convert explicitly: numpy -> tensor via torch.from_numpy(array), and tensor -> numpy via tensor.numpy(). The code in this zip is for yolov5 v1.x, not yolov5 v2.
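In code, the conversions being suggested, applied to the grid arrays that trigger the reported TypeError (the shapes here are illustrative):

import numpy as np
import torch

grid_x_np, grid_y_np = np.meshgrid(np.arange(80), np.arange(80))
grid_x = torch.from_numpy(grid_x_np).float()  # numpy -> tensor
grid_y = torch.from_numpy(grid_y_np).float()
back = grid_x.numpy()                          # tensor -> numpy, when needed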

china56321 commented 4 years ago

> It works normally for me. Did you modify the code?

No, I modified nothing. Could you upload your yolov5s.onnx (or another onnx) so I can check whether my converted onnx is correct?

EmilioOldenziel commented 4 years ago

@WZTENG For me it worked; I only got a TypeError for the draw-rectangle input. When I cast the box coordinates to int, it all worked using the latest commit.
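For reference, a sketch of that cast using PIL's ImageDraw (the image path and box values are placeholders, not from the thread):

from PIL import Image, ImageDraw

image = Image.open('./images/demo.jpg').convert('RGB')
draw = ImageDraw.Draw(image)
box = [100.7, 50.2, 300.9, 220.4]       # example float box from the detector
x1, y1, x2, y2 = [int(v) for v in box]  # cast to int before drawing
draw.rectangle([x1, y1, x2, y2], outline=(255, 0, 0), width=2)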

china56321 commented 4 years ago

> Could you upload your yolov5s.onnx (or another onnx) so I can check whether my converted onnx is correct?

It's ok now.

Jacobsolawetz commented 4 years ago

@WZTENG thanks for the .zip! It works well for me

@china56321 @EmilioOldenziel should we modify this script so we don't have to import the heavy torch package? I'm working to that end and can update here if there's interest.

Jacobsolawetz commented 4 years ago

I did get that working, so ONNX inference with no torch is possible using @WZTENG's script!

BernardinD commented 4 years ago

@WZTENG Thanks for the script. I'm new to YOLO, so it really cleared up a lot of confusion. But are you able to get results similar to running directly with the .pt file? I'm not able to reproduce the same results from a .pt file with your script.

I tried my own fine-tuned weights as well as the original checkpoint weights for v2.

BernardinD commented 4 years ago

> But are you able to get results similar to running directly with the .pt file?

After looking into it more, I was able to figure out the issues I was having. For some reason, unless I export to onnx with size 640, the box coordinates in your script aren't correct.

But the main issue is the NMS step. You can see in the image attached below that the NMS isn't performing how you'd expect. Any suggestions/advice?

[screenshot]

cmdbug commented 4 years ago

Download demo_onnx.zip and run demo_onnx.py. It works with the v1 models; v2 is not verified. It is important to note the order of the anchors; the order seems to have changed between v1 and v2.
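For reference, the two anchor orderings discussed earlier in this thread; which one is right depends on the yolov5 version you exported from:

# order that worked for the v1 export discussed in this thread (large-object anchors first):
anchors = [[116, 90, 156, 198, 373, 326], [30, 61, 62, 45, 59, 119], [10, 13, 16, 30, 33, 23]]
# reversed order (small-object anchors first), as listed in the model yaml:
anchors = [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]]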

BernardinD commented 4 years ago

> It is important to note the order of the anchors.

Thanks. The anchors were the issue.

Ownmarc commented 4 years ago

@Jacobsolawetz, would you mind sharing your script that does NMS on the onnx output without using torch, please? Did you use the output from the Detect layer that gives a shape (batch_size, XXXXX, number_of_classes+5)? The torch ops in question:

torch.cat
torch.tensor
torchvision.ops.boxes.nms
torch.mm
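For anyone attempting the torch-free route, here are rough NumPy stand-ins for the ops listed above (illustrative shapes; torchvision's NMS has no direct NumPy equivalent, so a hand-rolled version like the sketch earlier in this thread is needed):

import numpy as np

a = np.zeros((1, 3, 85), dtype=np.float32)
b = np.ones((1, 5, 85), dtype=np.float32)

merged = np.concatenate([a, b], axis=1)  # ~ torch.cat((a, b), 1)
t = np.array([[1.0, 2.0], [3.0, 4.0]])   # ~ torch.tensor(...)
prod = np.matmul(t, t.T)                 # ~ torch.mm(t, t.t())
# torchvision.ops.boxes.nms: replace with a hand-rolled NumPy NMS
# such as the nms_xyxy sketch earlier in this thread.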
Digital2Slave commented 4 years ago

> I did get that working, so ONNX inference with no torch is possible using @WZTENG's script!

@Jacobsolawetz Could you share your modified demo_onnx.py?

CangHaiQingYue commented 4 years ago

> @mukul1em @imyoungyang here: download demo_onnx.zip and unzip it.

hello, do you have a cpp version of demo_onnx.py?

cmdbug commented 4 years ago

> hello, do you have a cpp version of demo_onnx.py?

https://github.com/ultralytics/yolov5/issues/343#issuecomment-662317958 In fact, there is no real difference from v3 and v4. You can take an existing cpp version for those and modify the formula and anchors.

ShreshthSaxena commented 4 years ago

> Tip: It should be possible to process the output of the three feature maps independently from the Detect layer by replicating all those operations in NumPy/any other framework. I have managed to do it myself using CoreML ops.

@dlawrences can you please share more info on how to do it using CoreML ops?

zzzz737 commented 4 years ago

> download demo_onnx.zip and run demo_onnx.py.

hello, thanks for your work! I have tested using your code and checked the anchors; they are the same as yolov5's. But my test results with the onnx model are not good. The results are shown in the two images below. I don't know why; maybe the NMS? Can you give me some advice? Thanks very much!

[.pt result screenshot] [.onnx result screenshot]

cmdbug commented 4 years ago

@zzzz737 The demo only applies to v1; v2 needs the anchors order modified, and v3 has not been tried. For the time being, it is not clear what caused this problem. Sorry.

zzzz737 commented 4 years ago

> The demo only applies to v1; v2 needs the anchors order modified.

Yeah, I changed the anchors order, and now the results are the same as the pt model! Thank you!

nobody-cheng commented 4 years ago

> Yeah, I changed the anchors order, and now the results are the same as the pt model!

hi, can you share the whole code for onnx inference? Thank you

zzzz737 commented 4 years ago

> hi, can you share the whole code for onnx inference? Thank you

the demo was given earlier in this thread (demo_onnx.zip) on the ultralytics/yolov5 GitHub project

china56321 commented 3 years ago

> @mukul1em @imyoungyang here: download demo_onnx.zip and unzip it.

Does it only support square sizes (e.g. 640x640)? What should I do if I want to change the onnx input size (e.g. 640x320 or another size)?

zzzz737 commented 3 years ago

> Does it only support square sizes (e.g. 640x640)? What should I do if I want to change the onnx input size (e.g. 640x320 or another size)?

320 is always ok!

china56321 commented 3 years ago

> 320 is always ok!

I mean 640x320 or another size, e.g. 460x860, not 640x640. What should I do to make that work?

zzzz737 commented 3 years ago

> I mean 640x320 or another size, not 640x640.

you can try, I think it's ok

china56321 commented 3 years ago

> you can try, I think it's ok

Yes, I have tried it; the boxes are wrong, all shifted. Can you solve it?
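One thing worth checking for non-square inputs (an assumption on my part, not a confirmed fix for the shift): the raw maps are laid out (batch, anchors, height, width, 85), so each axis's stride must come from the matching dimension. A sketch, with variable names following demo_onnx.py:

import numpy as np

def grids_and_strides(out_shape, img_size_w, img_size_h):
    # out_shape: (1, 3, ny, nx, 85); ny comes from the input height, nx from the width
    ny, nx = out_shape[2], out_shape[3]
    stride_h = img_size_h // ny  # e.g. 320 // 40 = 8 for a 640x320 input
    stride_w = img_size_w // nx  # e.g. 640 // 80 = 8
    grid_x, grid_y = np.meshgrid(np.arange(nx), np.arange(ny))
    return grid_x, grid_y, stride_w, stride_h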

EvgeniiTitov commented 3 years ago

> I did get that working, so ONNX inference with no torch is possible using @WZTENG's script!

Hi @Jacobsolawetz ,

By any chance are you willing to share your script with no torch dependency?

Regards, E

shayanalibhatti commented 3 years ago

Hi all, @WZTENG thanks for providing the code for onnx inference. I adjusted the anchors as you suggested. The code works fine for my single-class model, a yolov5m.pt model converted to onnx with model.model[-1].export = True. The detection results of .pt and .onnx inference match perfectly. The model was based on yolov5 v3.

For a yolov5x-based weights file, I exported onnx with model.model[-1].export = False in export.py. Also, in the demo_onnx.py file, detections should then be obtained with

detections = detect_onnx(official=True, image_path=image_path)

And now my .pt and .onnx results match perfectly for the yolov5x architecture with a yolov5 v3 model. Thanks a lot

BernardinD commented 3 years ago

> @mukul1em @imyoungyang here: download demo_onnx.zip and unzip it.

Does your inference script work with the v3 graph? I'm having issues with the final detection boxes. I'm using the same anchors as with the v2 graph, but changed the input size to 416.

Thanks in advance

UPDATE: My problem actually seems to be with the 416 input size. But I'm not sure why the post-processing would change because of it