pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

FasterRCNN to ONNX model #1706

Open · Finniu opened this issue 4 years ago

Finniu commented 4 years ago

Hi there, I tried to convert a Faster R-CNN model to ONNX format, following the instructions in test/test_onnx.py https://github.com/pytorch/vision/blob/master/test/test_onnx.py.

Here is my code:

model = models.detection.faster_rcnn.fasterrcnn_resnet50_fpn(pretrained=True, min_size=800, max_size=1333)
image = cv2.imread("test.jpg")
image = cv2.resize(image, (1333, 800))
image1 = Image.fromarray(cv2.cvtColor(image.copy(), cv2.COLOR_BGR2RGB))
image_tensor = to_tensor(image1)
model.eval()
onnx_io = io.BytesIO()
torch.onnx.export(model, [image_tensor], "faster_rcnn.onnx", do_constant_folding=True, opset_version=_onnx_opset_version)

I succeeded in converting the model with the above code. However, when I moved the tensor and model to CUDA with .to(device), I got an error at line 359, in _get_top_n_idx, r.append(top_n_idx + offset): RuntimeError: expected device cuda:0 but got device cpu. I don't know how to solve it.

Please help me with that.

Cheers!

fmassa commented 4 years ago

Hi,

Sorry for the delay in replying.

My advice would be to make sure you convert your inputs and model to CUDA before exporting to ONNX; this is the safest way.

So it would look something like:

model = models.detection.faster_rcnn.fasterrcnn_resnet50_fpn(pretrained=True, min_size=800, max_size=1333)
image = cv2.imread("test.jpg")
image = cv2.resize(image, (1333, 800))
image1 = Image.fromarray(cv2.cvtColor(image.copy(), cv2.COLOR_BGR2RGB))
image_tensor = to_tensor(image1)
model.eval()
model.cuda()
image_tensor = image_tensor.cuda()
# just to be safe, run it once to initialize all buffers
out = model([image_tensor])
# now export it
onnx_io = io.BytesIO()
torch.onnx.export(model, [image_tensor], "faster_rcnn.onnx", do_constant_folding=True, opset_version=_onnx_opset_version)

Let me know if you still have issues.

janstrohbeck commented 4 years ago

I also get this error with both FasterRCNN and MaskRCNN, and I'm sure that the model and input tensor are on the GPU. I also run the model once before exporting. Exporting with device = 'cpu' works. It's not specific to ONNX export; the error also appears when simply trying to torch.jit.trace the model.

import sys

import numpy as np
import torch
import torchvision
from PIL import Image
from torchvision import transforms

img = Image.open(sys.argv[1]).convert('RGB')
img = np.array(img)

device = 'cuda'
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True, min_size=800, max_size=800)
model.eval()
model.to(device)

img_ = transforms.ToTensor()(img)
img_ = img_.to(device)

out = model([img_])

torch.onnx.export(model, ([img_],), "/tmp/mask_rcnn.onnx", verbose=True, do_constant_folding=True, opset_version=11)
  File "segment_image.py", line 119, in <module>
    torch.onnx.export(model, ([img_],), "/tmp/mask_rcnn.onnx", verbose=True, do_constant_folding=True, opset_version=11)
  File "/opt/env/lib/python3.7/site-packages/torch/onnx/__init__.py", line 156, in export
    custom_opsets)
  File "/opt/env/lib/python3.7/site-packages/torch/onnx/utils.py", line 67, in export
    custom_opsets=custom_opsets)
  File "/opt/env/lib/python3.7/site-packages/torch/onnx/utils.py", line 466, in _export
    fixed_batch_size=fixed_batch_size)
  File "/opt/env/lib/python3.7/site-packages/torch/onnx/utils.py", line 319, in _model_to_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args, training)
  File "/opt/env/lib/python3.7/site-packages/torch/onnx/utils.py", line 276, in _trace_and_get_graph_from_model
    torch.jit._get_trace_graph(model, args, _force_outplace=False, _return_inputs_states=True)
  File "/opt/env/lib/python3.7/site-packages/torch/jit/__init__.py", line 282, in _get_trace_graph
    outs = ONNXTracedModule(f, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
  File "/opt/env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 539, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/env/lib/python3.7/site-packages/torch/jit/__init__.py", line 365, in forward
    self._force_outplace,
  File "/opt/env/lib/python3.7/site-packages/torch/jit/__init__.py", line 352, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "/opt/env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 537, in __call__
    result = self._slow_forward(*input, **kwargs)
  File "/opt/env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 523, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/opt/env/lib/python3.7/site-packages/torchvision/models/detection/generalized_rcnn.py", line 70, in forward
    proposals, proposal_losses = self.rpn(images, features, targets)
  File "/opt/env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 537, in __call__
    result = self._slow_forward(*input, **kwargs)
  File "/opt/env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 523, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/opt/env/lib/python3.7/site-packages/torchvision/models/detection/rpn.py", line 472, in forward
    boxes, scores = self.filter_proposals(proposals, objectness, images.image_sizes, num_anchors_per_level)
  File "/opt/env/lib/python3.7/site-packages/torchvision/models/detection/rpn.py", line 379, in filter_proposals
    top_n_idx = self._get_top_n_idx(objectness, num_anchors_per_level)
  File "/opt/env/lib/python3.7/site-packages/torchvision/models/detection/rpn.py", line 359, in _get_top_n_idx
    r.append(top_n_idx + offset)
RuntimeError: expected device cuda:0 but got device cpu
>>> import torchvision; torchvision.__version__
'0.5.0.dev20200108+cu100'
>>> import torch; torch.__version__
'1.5.0.dev20200109+cu100'

fmassa commented 4 years ago

@janstrohbeck Thanks for the detailed report!

After digging a bit further, there seem to be a couple of issues. The first is that torch.onnx.operators.shape_as_tensor doesn't take the device of the original tensor into account, so https://github.com/pytorch/vision/blob/61763fa955ef74077a1d3e1aa5da36f7c606943a/torchvision/models/detection/rpn.py#L21 is always a CPU tensor; the second is that once we fix the above, we also need to fix https://github.com/pytorch/vision/blob/61763fa955ef74077a1d3e1aa5da36f7c606943a/torchvision/models/detection/rpn.py#L24-L26 to use the device of the original tensor.

@lara-hdr do you think we should change shape_as_tensor in PyTorch ONNX to take the device of the original tensor into account as well? Otherwise we can just add casts in the model right away as a workaround.
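
For illustration, a minimal sketch of what such a cast could look like (a hypothetical helper, not the actual torchvision patch):

import torch
from torch.onnx import operators

def shape_as_tensor_on_device(t):
    # shape_as_tensor returns a CPU tensor regardless of t's device;
    # casting it back avoids mixed-device arithmetic downstream
    return operators.shape_as_tensor(t).to(t.device)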

@janstrohbeck @Finniu in the meantime, please convert the model to CPU before exporting to ONNX.

nikhilshinday commented 4 years ago

Unsure whether this is coincidental, but I successfully exported the model to ONNX while the model was on the CPU. When serving the ONNX model on a TensorRT server, the model mostly evaluates on the CPU even though the server supposedly loads it onto the GPU. I know this because, while evaluating, my CPU goes to almost 100% while my GPU utilization remains below 10%.

Could this be related? Without understanding too much about how the torch.onnx.export method works, it's unclear to me whether evaluating the model on the CPU during tracing leads to the ONNX model executing on the CPU.

fmassa commented 4 years ago

@nikhilshinday I don't know the answer to your question, maybe @lara-hdr knows it?

lara-hdr commented 4 years ago

@nikhilshinday, torch.onnx.export() does not track whether the model was on CPU or GPU when exported, and the exported ONNX model should run on the device you specify, regardless of which device it was on when exported. I am not sure why CPU utilization goes up when you load the ONNX model on GPU; do you know if the engine you are using to run the ONNX model fully supports running it on GPU?
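
If the serving stack is onnxruntime-based (which may not apply to a TensorRT server), one quick sanity check with recent onnxruntime versions is to ask the session which execution providers it actually selected, since it silently falls back to CPU when the CUDA provider cannot be loaded ("model.onnx" is a placeholder path):

import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
print(sess.get_providers())  # lists the providers actually in use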

Finniu commented 4 years ago

Hi there, any updates?

raviv commented 4 years ago

Although exporting the model in GPU mode fails, exporting it in CPU mode and then loading it into a GPU-enabled ONNX runtime (using the onnxruntime-gpu PyPI package) works just fine. I'm using torch==1.5.0 and torchvision==0.6.0.
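
A minimal sketch of that CPU-export / GPU-inference flow, assuming an already-exported faster_rcnn.onnx file and a 3xHxW float32 input in [0, 1) (both assumptions, not from this thread):

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("faster_rcnn.onnx", providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
dummy = np.random.rand(3, 800, 800).astype(np.float32)  # stand-in image
outputs = sess.run(None, {input_name: dummy})  # all detection outputs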

Finniu commented 4 years ago

Although exporting the model in GPU mode fails, exporting it in CPU mode and then loading it into a GPU-enabled ONNX runtime (using the onnxruntime-gpu PyPI package) works just fine. I'm using torch==1.5.0 and torchvision==0.6.0.

@raviv Thanks, I will try. Another question: have you tried converting the ONNX model to TensorRT?

raviv commented 4 years ago

@Finniu No, I don't use tensorrt.

BTW, if you want to export maskrcnn_resnet50_fpn so that it accepts any input size, do:

dynamic_axes = {'input': [0, 2, 3], 'output': [0, 2, 3]}
torch.onnx.export(net,  ..., dynamic_axes=dynamic_axes) 
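
Spelled out a bit more, a sketch with illustrative names (the call above elides the remaining arguments, and the dynamic_axes keys must match the input_names/output_names passed to export):

import torch
import torchvision

net = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True).eval()
x = torch.randn(3, 480, 640)
dynamic_axes = {'input': [0, 2, 3], 'output': [0, 2, 3]}
torch.onnx.export(net, ([x],), "mask_rcnn.onnx", opset_version=11,
                  input_names=['input'], output_names=['output'],
                  dynamic_axes=dynamic_axes)
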
danilopeixoto commented 4 years ago

Exporting Faster R-CNN model:

...
device = torch.device('cuda')

model.to(device)
input = torch.randn((1, 3, 600, 600), device = device)

torch.onnx.export(model, input, 'model.onnx', opset_version = 11)
...

Error:

/usr/local/lib/python3.6/dist-packages/torchvision/models/detection/rpn.py in _get_top_n_idx(self, objectness, num_anchors_per_level)
    372                 pre_nms_top_n = min(self.pre_nms_top_n(), num_anchors)
    373             _, top_n_idx = ob.topk(pre_nms_top_n, dim=1)
--> 374             r.append(top_n_idx + offset)
    375             offset += num_anchors
    376         return torch.cat(r, dim=1)

RuntimeError: expected device cuda:0 but got device cpu

I exported successfully in CPU mode. Is GPU not supported?

raviv commented 4 years ago

@danilopeixoto This error has been reported above. My solution was to export using the CPU. For inference you can use CPU, GPU, TensorRT, etc., depending on the ONNX runtime you use. I'm using this one https://microsoft.github.io/onnxruntime/ and am very happy with it.

veer5551 commented 4 years ago

Hello Team,

I am trying to convert FasterRCNN to ONNX. I was able to successfully export the model, but I am not able to run inference on any image. I tried exporting the model with a dynamic input size for the image as well, still with no luck.

I can't find clear instructions on what the image data input to the model should be. I think I am messing up somewhere in the input to the model.

Below is the code I am trying to implement to export and infer.

# This piece of code is adapted from the test_onnx.py file

def get_image_from_url(url, size=None):
    import requests
    from PIL import Image
    from io import BytesIO
    from torchvision import transforms as T

    data = requests.get(url)
    image = Image.open(BytesIO(data.content)).convert("RGB")

    if size is None:
        size = (300, 200)
    image = image.resize(size, Image.BILINEAR)

    transform = T.Compose([T.ToTensor()])  # define the PyTorch transform
    image = transform(image)

    return image

def get_test_images():
    image_url = "http://farm3.staticflickr.com/2469/3915380994_2e611b1779_z.jpg"
    image = get_image_from_url(url=image_url, size=(100, 320))

    image_url2 = "https://pytorch.org/tutorials/_static/img/tv_tutorial/tv_image05.png"
    image2 = get_image_from_url(url=image_url2, size=(250, 380))

    images = image
    test_images = [image2]
    return images, test_images

images, test_images = get_test_images()

dummy_input = torch.randn(1, 3, 224, 224)

model_name = r"fasterrcnn_resnet50_fpn_dynamic_try4_with_image_input"
final_path = model_name + ".onnx"
dynamic_axes = {'input': [0, 2, 3], 'output': [0, 2, 3]}
torch.onnx.export(model, images.unsqueeze(0), final_path,
                  do_constant_folding=True, opset_version=11,
                  dynamic_axes=dynamic_axes, input_names=['input'], output_names=['output'])

Below is the code I am using to run inference with the ONNX model.

folder = "<my path>"

model_name = r"fasterrcnn_resnet50_fpn_dynamic_try4_with_image_input.onnx"

final_path = os.path.join(folder,model_name)

# Load the ONNX model
model_onnx = onnx.load(final_path)

# Check that the IR is well formed
onnx.checker.check_model(model_onnx)

# Print a human readable representation of the graph
print(onnx.helper.printable_graph(model_onnx.graph))
import onnxruntime as rt

sess = rt.InferenceSession(final_path)

input_name = sess.get_inputs()[0].name
output_name = sess.get_outputs()[0].name

pred = sess.run([output_name], {input_name: images})

This crashes my kernel and restarts it, and I don't know why. I think I am messing up the input type and dimensions.

Could you please help me get this ONNX model up and running!

Attached is the graph log for the converted model. Let me know if I am missing something in the conversion procedure as well. torchvision_frcnn_try4_dynamic_onnx_log.docx

Thanks a lot!!

danilopeixoto commented 4 years ago

The torchvision Faster R-CNN model does not support dynamic input shapes, according to the documentation:

Faster R-CNN is exportable to ONNX for a fixed batch size with inputs images of fixed size.

fmassa commented 4 years ago

@danilopeixoto Dynamic shape support should now work for ONNX if you use a very recent torchvision nightly.

veer5551 commented 4 years ago

Hey @danilopeixoto, @fmassa, thank you for the suggestions. But I am still not able to get any output, either for a fixed or a dynamic input image.

Could you please have a look at the code and let me know where I am going wrong?

Also, I tried to run test_faster_rcnn from the latest test_onnx.py (here) file and got the following error. As I am a newbie here, I don't know exactly what this error means:

log:

>>> test_object = test_onnx.ONNXExporterTester()
>>> test_object.test_faster_rcnn()
C:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\Lib\site-packages\torch\nn\functional.py:2854: UserWarning: The default behavior for interpolate/upsample with float scale_factor will change in 1.6.0 to align with other frameworks/libraries, and use scale_factor directly, instead of relying on the computed output size. If you wish to keep the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details.
  warnings.warn("The default behavior for interpolate/upsample with float scale_factor will change "
..\torch\csrc\utils\python_arg_parser.cpp:756: UserWarning: This overload of nonzero is deprecated:
        nonzero(Tensor input, *, Tensor out)
Consider using one of the following signatures instead:
        nonzero(Tensor input, *, bool as_tuple)
..\aten\src\ATen\native\BinaryOps.cpp:81: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.
C:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\Lib\site-packages\torchvision\models\detection\rpn.py:164: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  torch.tensor(image_size[1] / g[1], dtype=torch.int64, device=device)] for g in grid_sizes]
C:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\Lib\site-packages\torch\tensor.py:467: RuntimeWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
  'incorrect results).', category=RuntimeWarning)
C:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\Lib\site-packages\torchvision\ops\boxes.py:117: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  boxes_x = torch.min(boxes_x, torch.tensor(width, dtype=boxes.dtype, device=boxes.device))
C:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\Lib\site-packages\torchvision\ops\boxes.py:119: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  boxes_y = torch.min(boxes_y, torch.tensor(height, dtype=boxes.dtype, device=boxes.device))
C:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\Lib\site-packages\torchvision\models\detection\transform.py:217: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  for s, s_orig in zip(new_size, original_size)
C:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\Lib\site-packages\torch\onnx\symbolic_opset9.py:2115: UserWarning: Exporting aten::index operator of advanced indexing in opset 11 is achieved by combination of multiple ONNX operators, including Reshape, Transpose, Concat, and Gather. If indices include negative values, the exported graph will produce incorrect results.
  "If indices include negative values, the exported graph will produce incorrect results.")
C:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\Lib\site-packages\torch\onnx\utils.py:915: UserWarning: No names were found for specified dynamic axes of provided input.Automatically generated names will be applied to each dynamic axes of input images_tensors
  'Automatically generated names will be applied to each dynamic axes of input {}'.format(key))
C:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\Lib\site-packages\torch\onnx\utils.py:915: UserWarning: No names were found for specified dynamic axes of provided input.Automatically generated names will be applied to each dynamic axes of input outputs
  'Automatically generated names will be applied to each dynamic axes of input {}'.format(key))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\Lib\site-packages\test_onnx.py", line 357, in test_faster_rcnn
    tolerate_small_mismatch=True)
  File "C:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\Lib\site-packages\test_onnx.py", line 49, in run_model
    self.ort_validate(onnx_io, test_inputs, test_ouputs, tolerate_small_mismatch)
  File "C:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\Lib\site-packages\test_onnx.py", line 71, in ort_validate
    torch.testing.assert_allclose(outputs[i], ort_outs[i], rtol=1e-03, atol=1e-05)
  File "C:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\Lib\site-packages\torch\testing\__init__.py", line 24, in assert_allclose
    expected = expected.expand_as(actual)
RuntimeError: The expanded size of the tensor (52) must match the existing size (54) at non-singleton dimension 0.  Target sizes: [52, 4].  Tensor sizes: [54, 4]

Thanks a lot!

fmassa commented 4 years ago

cc @neginraoof if you could have a look

finnickniu commented 4 years ago

cc @neginraoof if you could have a look

Hi @fmassa, is there any tool I can use to convert the Faster R-CNN ONNX model to TensorRT? I have tried onnx-tensorrt, but it failed when converting NMS. Thanks

benschreiber commented 4 years ago

Hi. I am also experiencing the original RuntimeError: expected device cuda:0 but got device cpu message when JIT tracing any RCNN model (on the same line as the OP). Strangely, the first iteration through for ob in objectness.split(num_anchors_per_level, 1) succeeds, but the second one fails.

I am working on a JIT-related project that requires the model to be traced on the GPU, so the export-on-CPU workaround does not apply to me. I am not concerned with ONNX right now, only TorchScript. Is there a timeline for a fix? Even guidance on how to fix this myself would be appreciated. @fmassa

danilopeixoto commented 4 years ago

Hi,

I was able to export the model to ONNX and it was working fine, but now only empty detections are returned. I've tried downgrading the package versions, changing the opset, and checking the code for changes.

Model inference using PyTorch still works.

Has anyone experienced this issue when exporting the Faster R-CNN model to ONNX?

danilopeixoto commented 4 years ago

I replaced the dummy input:

input_data = [torch.rand((3, 600, 600), device = cpu_device)]

with:

input_data = [torch.randn((3, 600, 600), device = cpu_device)]

and it worked.

This issue may be related to Export object detection model to ONNX: empty output by ONNX inference.
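
As background (an aside, not from the thread): torch.rand samples uniformly from [0, 1), the range torchvision detection models expect for image pixels, while torch.randn samples from a standard normal distribution, so the two dummy inputs have very different statistics and can lead the trace down different paths (for example, in how many proposals survive scoring and NMS):

import torch

u = torch.rand((3, 600, 600))   # values in [0, 1)
n = torch.randn((3, 600, 600))  # mean 0, std 1; values well outside [0, 1]
print(u.min().item(), u.max().item())
print(n.min().item(), n.max().item())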

FraPochetti commented 3 years ago

I replaced the dummy input:

input_data = [torch.rand((3, 600, 600), device = cpu_device)]

with:

input_data = [torch.randn((3, 600, 600), device = cpu_device)]

and it worked.

This issue may be related to Export object detection model to ONNX: empty output by ONNX inference.

Hi @danilopeixoto, do I get this right that your only change was from rand to randn? I am experiencing the same issue.

danilopeixoto commented 3 years ago

@FraPochetti Yes, that was the only change in the code.

FraPochetti commented 3 years ago

@danilopeixoto Thanks! Do you happen to have the code snippet you used, by any chance? I tried a ton of things, yours included, and nothing is really working, unfortunately. Maybe I am doing something really silly somewhere else.

Ratansairohith commented 3 years ago

I replaced the dummy input:

input_data = [torch.rand((3, 600, 600), device = cpu_device)]

with:

input_data = [torch.randn((3, 600, 600), device = cpu_device)]

and it worked.

This issue may be related to Export object detection model to ONNX: empty output by ONNX inference.

Thanks @danilopeixoto, it worked! Moving the model and dummy input to the CPU and converting to ONNX solved my problem. Later I was able to run inference on the GPU with onnxruntime's 'CUDAExecutionProvider'. 👍

FraPochetti commented 3 years ago

Hi @Ratansairohith, great that you found a way! Would you be so kind as to share the entire code snippet you used to get mask_rcnn to work? Reading your comment, I just retried on my own, but I don't seem to get it right. This is the Colab notebook I have put together, for reference.

Ratansairohith commented 3 years ago

Hey @FraPochetti! Yeah, my code was pretty straightforward. I trained my own Faster R-CNN on a custom dataset, just like the official PyTorch tutorial https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html. Then I used the following code snippet to convert my torch model to ONNX.

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
import torch
num_classes = 10 
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
model.load_state_dict(torch.load('/content/drive/MyDrive/images/fasterrcnn_resnet50_fpn_9.pth'))
model.eval()

# set device to cpu 
cpu_device = torch.device('cpu') 
x = [torch.randn((3, 384, 384), device = cpu_device)]
model.to(cpu_device)

# finally convert pytorch model to onnx 
torch.onnx.export(model, x , "faster_rcnn_9.onnx", verbose=True, do_constant_folding=True, opset_version=11)

FraPochetti commented 3 years ago

Thanks a lot for the reply. I see you are using faster_rcnn; that worked for me too. mask_rcnn is the one that is causing me trouble :)

cyberpedestrian commented 3 years ago

Hello @fmassa, @lara-hdr.

Thank you for your work. I was wondering if there is a way to export the Faster R-CNN model without the transformation layers, producing two static output tensors (boxes and scores)? Any direction for digging would be appreciated.

Thank you kindly.
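
One possible direction, sketched under assumptions (the wrapper below is hypothetical, not an official API; the internal GeneralizedRCNNTransform still runs inside forward, and the number of detections is data-dependent, so truly static output shapes would additionally require padding to a fixed count):

import torch
import torchvision

class BoxesAndScores(torch.nn.Module):
    # hypothetical wrapper that exposes only boxes and scores as graph outputs
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x):
        detections = self.model([x])[0]
        return detections["boxes"], detections["scores"]

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True).eval()
wrapper = BoxesAndScores(model).eval()
torch.onnx.export(wrapper, torch.randn(3, 800, 800), "frcnn_boxes_scores.onnx",
                  opset_version=11, input_names=['image'],
                  output_names=['boxes', 'scores'])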