pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

MaskRCNN from ScriptModule to ONNX - Unknown Type BoxCoder #5790

Open Nuno-Mota opened 2 years ago

Nuno-Mota commented 2 years ago

🐛 Describe the bug

While attempting to export Mask R-CNN to ONNX, starting from a ScriptModule, an error occurs indicating that __torch__.torchvision.models.detection._utils.BoxCoder is an unknown type.

MWE:

import torch
from torchvision.models.detection.mask_rcnn import maskrcnn_resnet50_fpn

model = maskrcnn_resnet50_fpn()
model.eval()
script_model = torch.jit.script(model)
example_image = torch.rand((3, 800, 1000))
torch.onnx.export(
    script_model,
    [example_image],
    "test.onnx",
    example_outputs=script_model([example_image])[1],  # index 0 is losses
    opset_version=11,
)

Error traceback:

Traceback (most recent call last):
  File "/home/nmota/test_onnx.py", line 8, in <module>
    torch.onnx.export(
  File "/usr/lib/python3.9/site-packages/torch/onnx/__init__.py", line 275, in export
    return utils.export(model, args, f, export_params, verbose, training,
  File "/usr/lib/python3.9/site-packages/torch/onnx/utils.py", line 88, in export
    _export(model, args, f, export_params, verbose, training, input_names, output_names,
  File "/usr/lib/python3.9/site-packages/torch/onnx/utils.py", line 689, in _export
    _model_to_graph(model, args, verbose, input_names,
  File "/usr/lib/python3.9/site-packages/torch/onnx/utils.py", line 458, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args,
  File "/usr/lib/python3.9/site-packages/torch/onnx/utils.py", line 402, in _create_jit_graph
    module, params = torch._C._jit_onnx_list_model_parameters(freezed_m)
RuntimeError: 
Unknown type __torch__.torchvision.models.detection._utils.BoxCoder (of Python compilation unit at: 0x55bbdb787f00) encountered in handling model params. This class type does not extend __getstate__ method.:

✗ - status code 1

Unfortunately, I cannot test with a more recent version. Is this something that has been fixed recently?

Versions

PyTorch version: 1.9.0
Is debug build: False
CUDA used to build PyTorch: 11.4
ROCM used to build PyTorch: N/A

OS: Arch Linux (x86_64)
GCC version: (GCC) 11.1.0
Clang version: 12.0.1
CMake version: version 3.21.2
Libc version: glibc-2.33

Python version: 3.9.6 (default, Jun 30 2021, 10:22:16)  [GCC 11.1.0] (64-bit runtime)
Python platform: Linux-5.13.13-arch1-1-x86_64-with-glibc2.33
Is CUDA available: True
CUDA runtime version: 11.4.100
GPU models and configuration: 
GPU 0: NVIDIA TITAN X (Pascal)

Nvidia driver version: 470.63.01
cuDNN version: Probably one of the following:
/usr/lib/libcudnn.so.8.2.2
/usr/lib/libcudnn_adv_infer.so.8.2.2
/usr/lib/libcudnn_adv_train.so.8.2.2
/usr/lib/libcudnn_cnn_infer.so.8.2.2
/usr/lib/libcudnn_cnn_train.so.8.2.2
/usr/lib/libcudnn_ops_infer.so.8.2.2
/usr/lib/libcudnn_ops_train.so.8.2.2
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.20.3
[pip3] torch==1.9.0
[pip3] torchvision==0.10.0a0
[conda] Could not collect

cc @neginraoof

datumbox commented 2 years ago

@Nuno-Mota I'm not too familiar with ONNX but is there a reason you are JIT-scripting the model prior to exporting it?

The intended way is to do something like:

import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

model = maskrcnn_resnet50_fpn(weights_backbone=None)
model.eval()
example_image = torch.rand((3, 800, 1000))
torch.onnx.export(
    model,
    [example_image],
    "test.onnx",
    opset_version=11,
)

This works fine in the latest version.

Nuno-Mota commented 2 years ago

@datumbox, the idea is to try to preserve dynamic control flow, as mentioned in the docs.

datumbox commented 2 years ago

@Nuno-Mota thanks for clarifying. As I said, I'm not too familiar with ONNX and I'm trying to understand the status of the support from the existing tests. Upon investigating, I saw that we don't test against the JIT-scripted versions, which means that, according to the quoted doc, we actually trace the model.

@fmassa Do you have any context concerning this choice? Is this deliberate? As far as I understand the detection models are not traceable due to their loops.
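For readers following the tracing-versus-scripting point, here is a minimal sketch on a toy function (not the torchvision detection models). The function name `sum_rows` and the tensor shapes are assumptions chosen for illustration. Tracing records the loop as it ran on the example input, while scripting keeps the loop itself:

```python
import warnings

import torch


def sum_rows(x: torch.Tensor) -> torch.Tensor:
    # Python loop whose trip count depends on the input's first dimension.
    total = torch.zeros_like(x[0])
    for i in range(x.shape[0]):
        total = total + x[i]
    return total


with warnings.catch_warnings():
    # Tracing may warn about the shape-dependent loop; silence it for the demo.
    warnings.simplefilter("ignore")
    traced = torch.jit.trace(sum_rows, torch.ones(2, 3))

scripted = torch.jit.script(sum_rows)

longer = torch.ones(4, 3)
traced_out = traced(longer)      # replays only the two iterations seen at trace time
scripted_out = scripted(longer)  # runs the loop over all four rows
print(traced_out, scripted_out)
```

On an input with more rows than the one used for tracing, the two results differ, which is the behaviour that scripting before export is meant to avoid.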

medric49 commented 2 years ago

@Nuno-Mota I have the same issue, but this time with FasterRCNN. Have you found a solution, please?

RunnerZhong commented 1 year ago

Same issue with FasterRCNN conversion. Any update?

adamzenith commented 1 year ago

@medric49 @RunnerZhong I found a solution if you are using the pretrained FasterRCNN available from PyTorch. It involves loading the scripted model, extracting the weights, applying them to the pretrained model, and then converting to ONNX.

Assuming your model is built from a template similar to this:

from torchvision.models.detection import fasterrcnn_resnet50_fpn_v2
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

def get_model(num_classes):
    frcnn_model = fasterrcnn_resnet50_fpn_v2(weights='COCO_V1')

    in_features = frcnn_model.roi_heads.box_predictor.cls_score.in_features
    # replace the pre-trained head with a new one
    frcnn_model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

    return frcnn_model

Then you can create an ONNX-exportable model by doing the following:

import torch

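# The file is a TorchScript archive, so load it with torch.jit.load.
state_dict = torch.jit.load("jit_model.pt").state_dict()
model = get_model(n)  # n being the number of output classes
model.load_state_dict(state_dict)
model.eval()

# Placeholder input and output path; add whichever other export params you want.
example_image = torch.rand((3, 800, 1000))
torch.onnx.export(model, [example_image], "model.onnx", opset_version=11)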
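
The key step in this workaround is that a scripted module's state_dict can be loaded into a freshly built eager model. A minimal sketch of that round trip on a toy module follows; `TinyNet` is a hypothetical stand-in rather than a torchvision model, and an in-memory buffer replaces the .pt file:

```python
import io

import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the detection model."""

    def __init__(self) -> None:
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(x)


# Script the model and save it, as one would with torch.jit.save to a file.
scripted = torch.jit.script(TinyNet().eval())
buffer = io.BytesIO()
torch.jit.save(scripted, buffer)
buffer.seek(0)

# Load the TorchScript archive and copy its weights into a new eager model.
loaded = torch.jit.load(buffer)
eager = TinyNet().eval()
result = eager.load_state_dict(loaded.state_dict())

x = torch.rand(3, 4)
same = torch.allclose(eager(x), loaded(x))
print(result, same)
```

If the key names line up, the eager model reproduces the scripted model's outputs and can then go through the trace-based export path.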

stes commented 4 months ago

Just came across this, I can repro the same issue (same error message @Nuno-Mota reported) using torchvision 0.18.1. Any updates/further solution attempts to run torch.jit.script on the models?