pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

[JIT] Not supported for maskrcnn_resnet50_fpn #1002

Closed: rbrigden closed this issue 4 years ago

rbrigden commented 5 years ago

I am trying to accelerate the maskrcnn_resnet50_fpn pretrained model using the JIT tracing provided by PyTorch. It appears that some operations present in this model are not supported by the PyTorch JIT.

Are these models supposed to have JIT support officially? If not, would you be able to provide advice for a workaround?

To replicate, running:

import torch
import torchvision
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()
traced_net = torch.jit.trace(model, torch.rand(1, 3, 800, 800))

produces

RuntimeError: log2_vml_cpu not implemented for 'Long'

Thank you.
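As later comments in this thread indicate, the supported TorchScript path for the detection models ended up being torch.jit.script rather than torch.jit.trace, since scripting can capture their data-dependent control flow. A minimal sketch, assuming a torchvision release new enough to support scripting these models (roughly 0.8 and later):

import torch
import torchvision

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

# Scripting, unlike tracing, preserves the data-dependent control flow
# inside the detection models.
script_module = torch.jit.script(model)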

WaterKnight1998 commented 4 years ago

> @WaterKnight1998 to complement @ptrblck's comment, it seems that your input is a TensorImage (which is not something that we provide in torchvision, I believe). If you pass instead a list of 3D tensors, it should work.

TensorImage is just a normal Tensor obtained from fastai that adds a show method.

The problem we are finding is that after tracing, the output changes!

You can find the concrete output here.

fmassa commented 4 years ago

@WaterKnight1998 I would recommend converting the TensorImage into a Tensor before feeding the image, and making it a list of 3-dimensional tensors.
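A minimal sketch of that conversion, assuming tensor_image is a hypothetical fastai TensorImage and model is the Mask R-CNN in eval mode:

import torch

# TensorImage subclasses torch.Tensor, so as_subclass() returns a plain
# Tensor view without copying data.
img = tensor_image.as_subclass(torch.Tensor)

# The detection models expect a list of 3D (C, H, W) tensors, so drop any
# leading batch dimension before wrapping the image in a list.
if img.dim() == 4:
    img = img.squeeze(0)

outputs = model([img])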

WaterKnight1998 commented 4 years ago

> @WaterKnight1998 I would recommend converting the TensorImage into a Tensor before feeding the image, and making it a list of 3-dimensional tensors.

I tried using a list of 3D tensors and I am still getting the strange empty dict.

({}, [{'scores': tensor([0.0570], grad_fn=<IndexBackward>), 'labels': tensor([1]), 'boxes': tensor([[165.8691, 434.1203, 527.4108, 714.6182]], grad_fn=<StackBackward>), 'masks': tensor([[[[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          ...,
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],

fmassa commented 4 years ago

@WaterKnight1998 your output seems OK to me: Mask R-CNN detected only a single object, with low confidence.

I would make sure the inputs are being fed in the right format (the images should be in the range 0-1).
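For example, if the image arrives as uint8 pixel values, the rescaling could look like the sketch below (assuming img is a 3D uint8 tensor; the per-channel mean/std normalization is applied inside the model itself, so only the 0-1 scaling is needed):

# Scale 0-255 uint8 pixels to the 0-1 float range the model expects.
img = img.float() / 255.0
detections = model([img])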

WaterKnight1998 commented 4 years ago

> Your output seems OK to me

@fmassa Mask R-CNN without scripting outputs just the second element of the tuple. Is it normal that after tracing, it returns a tuple whose first element is an empty dict?

fmassa commented 4 years ago

@WaterKnight1998 yes, it is. We raise a warning in https://github.com/pytorch/vision/blob/11a39aaab5b55a3c116c2e8d8001bad94a96f99d/torchvision/models/detection/generalized_rcnn.py#L108 explaining the differences. It's a limitation of TorchScript that we can't have different return types depending on self.training, so we always return both the losses and the detections, although only one of them will be populated.
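Concretely, the eager and scripted models are unpacked differently. A minimal sketch, assuming model is the eager Mask R-CNN in eval mode and script_module is its scripted counterpart:

# Eager eval mode returns just the list of per-image detection dicts...
detections = model([img])

# ...while the scripted/traced model always returns (losses, detections);
# in eval mode the losses dict is simply empty.
losses, detections = script_module([img])
boxes = detections[0]['boxes']
masks = detections[0]['masks']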

WaterKnight1998 commented 4 years ago

> It's a limitation of TorchScript that we can't have different return types depending on self.training, so we always return both the losses and the detections, although only one of them will be populated.

@fmassa Thank you very much for your explanation. It gave me the intuition that I needed!

bulatnv commented 3 years ago

Hello @fmassa,

Are there any updates on this issue?

torch==1.7.1 torchaudio==0.7.2 torchvision==0.8.2

Traceback (most recent call last):
  File "D:/Projects/tester/main.py", line 62, in <module>
    torch_out = script_module(x)
  File "D:\Projects\tester\venv\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
RuntimeError: forward() Expected a value of type 'List[Tensor]' for argument 'images' but instead found type 'Tensor'.
Position: 1
Value: tensor([[[[-1.3924, -0.3426,  0.1565,  ..., -1.0010, -0.1127,  0.2637],
          [ 0.1392, -1.3978,  0.4600,  ..., -1.7351, -1.3514, -0.4097],
          [ 1.1242, -0.2859,  0.0956,  ..., -0.9409,  0.6421, -0.0713],
          ...,
          [ 0.4488,  0.1756,  1.9472,  ...,  1.3395,  0.0882,  0.2821],
          [ 1.2623,  0.0925, -2.4398,  ..., -0.9513, -2.2078,  1.7615],
          [-0.0645, -0.4522,  1.2193,  ..., -0.3644,  0.0360, -0.1954]],

         [[ 1.1202, -1.4459, -1.7245,  ..., -1.2972, -0.0717,  0.4818],
          [ 0.8732, -0.1661, -0.1113,  ...,  1.9476, -0.4579,  1.1956],
          [-2.1614,  0.3758, -0.7581,  ..., -1.0231, -0.8411, -0.1101],
          ...,
          [ 0.5501,  0.3279, -0.8761,  ..., -0.8433, -0.2146, -1.6229],
          [ 0.6187, -1.9583, -3.2449,  ...,  1.4666, -0.0826,  1.5495],
          [-1.4143,  0.3092, -0.3439,  ...,  0.8020, -0.5509,  0.0355]],

         [[ 0.7972,  0.5274, -1.5208,  ..., -0.6306,  0.5713, -1.0178],
          [ 0.4690,  0.6849,  0.0668,  ..., -0.5453, -1.1445,  0.2774],
          [-0.0832,  1.3775, -0.8812,  ..., -2.3852,  0.5324,  1.5018],
          ...,
          [ 0.6334,  0.4894,  0.3861,  ...,  0.9698,  1.0560, -0.8113],
          [-0.8962,  1.7035, -0.8178,  ..., -0.1556,  1.7010, -0.4338],
          [ 0.0149, -0.4869, -1.8882,  ..., -1.3715,  0.9658, -0.3530]]]])
Declaration: forward(__torch__.torchvision.models.detection.faster_rcnn.FasterRCNN self, Tensor[] images, Dict(str, Tensor)[]? targets=None) -> ((Dict(str, Tensor), Dict(str, Tensor)[]))
Cast error details: Unable to cast Python instance to C++ type (compile in debug mode for details)

fmassa commented 3 years ago

@bulatnv TorchScript should be supported for the Mask R-CNN models, but they only support the List[Tensor] interface, not a plain Tensor.

So instead of doing

model(torch.rand(1, 3, 300, 300))

do instead

model([torch.rand(3, 300, 300)])
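Combined with the tuple convention discussed above, a minimal usage sketch (assuming script_module = torch.jit.script(model) as in the earlier sketch):

x = torch.rand(3, 300, 300)              # one 3D (C, H, W) image in the 0-1 range
losses, detections = script_module([x])  # List[Tensor] in, (losses, detections) out
print(detections[0]['boxes'].shape)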