microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Unable to quantize `torchvision.detection` models #19544

Open sklum opened 9 months ago

sklum commented 9 months ago

Describe the issue

I'm trying to quantize a model from torchvision that I've exported to onnx as follows:

import torch
from torchvision.models.detection import faster_rcnn

# Random weights are fine here; the issue reproduces without pretrained weights.
model = faster_rcnn.fasterrcnn_resnet50_fpn_v2(
    pretrained=False, pretrained_backbone=False
)
model.eval()

# Fixed-size dummy input used to trace the model for export.
inputs = torch.rand(1, 3, 640, 640)

torch.onnx.export(
    model,
    inputs,
    "torchvision.onnx",
)

This model works fine in onnxruntime as is, but I'd like to try to quantize it.
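For reference, a minimal sketch of how I'm running it (reading the input name from the session rather than hard-coding it, and assuming the same 1x3x640x640 float input used for export):

import numpy as np
import onnxruntime as ort

# Load the exported model on the default CPU EP and run one dummy inference.
sess = ort.InferenceSession("torchvision.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
outputs = sess.run(None, {input_name: np.random.rand(1, 3, 640, 640).astype(np.float32)})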

First, I get `Exception: Incomplete symbolic shape inference` during the preprocessing step:

$ python -m onnxruntime.quantization.preprocess --input torchvision.onnx --output torchvision_preprocessed.onnx

But adding `--skip_symbolic_shape true` generates a model without error.
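If it helps to reproduce, the same preprocessing can be driven from Python; a minimal sketch using `quant_pre_process` (which, as I understand it, backs the CLI entry point), with symbolic shape inference skipped as above:

from onnxruntime.quantization.shape_inference import quant_pre_process

# Python equivalent of the CLI call, mirroring --skip_symbolic_shape true.
quant_pre_process(
    "torchvision.onnx",
    "torchvision_preprocessed.onnx",
    skip_symbolic_shape=True,
)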

Then, I attempt to quantize as in the example:

...
    quantize_static(
        "./torchvision_preprocessed.onnx",
        "./torchvision_preprocessed_quantized.onnx",
        data_reader,
    )
...
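Here `data_reader` is a `CalibrationDataReader`; a minimal sketch of the kind of reader I'm using, feeding random tensors purely for reproduction (a real calibration run would use representative images; the class name and sample count are just placeholders):

import numpy as np
import onnx
from onnxruntime.quantization import CalibrationDataReader

class RandomDataReader(CalibrationDataReader):
    """Feeds a few random 1x3x640x640 tensors for calibration."""

    def __init__(self, model_path, num_samples=8):
        # Read the input name from the model instead of hard-coding it.
        input_name = onnx.load(model_path).graph.input[0].name
        self._samples = iter(
            {input_name: np.random.rand(1, 3, 640, 640).astype(np.float32)}
            for _ in range(num_samples)
        )

    def get_next(self):
        # Return None when exhausted, per the CalibrationDataReader contract.
        return next(self._samples, None)

data_reader = RandomDataReader("./torchvision_preprocessed.onnx")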

And I get the following exception:

onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running ReduceMax node. Name:'/Squeeze_2_output_0_ReduceMax' Status Message:

Notably, the status message is actually empty.

Running it multiple times produces the exception at a different node each time:

onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running ReduceMax node. Name:'/roi_heads/box_roi_pool/Gather_6_output_0_ReduceMax' Status Message:

But it's always a `ReduceMax`. The weird thing is that when I inspect the model with Netron, I can't find these nodes in the graph at all. Is that expected?

Is there something about the R-CNN variants used by torchvision that prevents quantization? I'm able to quantize other models successfully.

To reproduce

See above.

Urgency

No response

Platform

Mac

OS Version

14.2.1

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.17.0

ONNX Runtime API

Python

Architecture

ARM64

Execution Provider

Default CPU

Execution Provider Library Version

No response

skottmckay commented 9 months ago

@yufenglee are you able to take a look?

github-actions[bot] commented 7 months ago

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.