pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

maskrcnn_resnet50_fpn use _save_for_lite_interpreter error #4386

Open vitansoz opened 3 years ago

vitansoz commented 3 years ago

🐛 Bug

To Reproduce

Steps to reproduce the behavior:

1. model = torchvision.models.detection.maskrcnn_resnet50_fpn(num_classes=24,pretrained=False).to(device)
   model.load_state_dict(checkpoint['model'])
   model.eval()
2. script_model = torch.jit.script(model)
   opt_model = optimize_for_mobile(script_model)
   opt_model._save_for_lite_interpreter("mask_rcnn_1.pt")
3. RuntimeError: __torch__ types other than torchbind (__torch__.torch.classes)are not supported in lite interpreter. Workaround: instead of using arbitrary class type (class Foo()), define a pytorch class (class Foo(torch.nn.Module)).

Expected behavior

Environment

PyTorch version: 1.9.0+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Pro
GCC version: Could not collect
Clang version: Could not collect
CMake version: version 3.18.1
Libc version: N/A

Python version: 3.6.8 (tags/v3.6.8:3c6b436a57, Dec 24 2018, 00:16:47) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19041-SP0
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.18.5
[pip3] torch==1.9.0
[pip3] torchaudio==0.9.0
[pip3] torchvision==0.10.0
[conda] Could not collect

Additional context

cc @datumbox @fmassa @vfdev-5 @pmeier

pmeier commented 3 years ago

@vitansoz The reproduction is not complete:

import torch
import torchvision
from torch.utils.mobile_optimizer import optimize_for_mobile

# What device are you working on? Guessing from the environment it is "cpu"
model = torchvision.models.detection.maskrcnn_resnet50_fpn(num_classes=24,pretrained=False)# .to(device)
# Is this checkpoint important to reproduce the error? If yes, please provide it
# model.load_state_dict(checkpoint['model'])
model.eval()
script_model = torch.jit.script(model)
opt_model = optimize_for_mobile(script_model)
# Is this file necessary to reproduce the error? If yes, please provide it 
# opt_model._save_for_lite_interpreter("mask_rcnn_1.pt")

In this state the snippet passes fine, but since we commented out the offending line, that is no surprise. So either provide all the values / files we need to reproduce the error or, better yet, try to put together a minimal example that is self-contained and works without other dependencies.

vitansoz commented 3 years ago

import torch
import torchvision
from torch.utils.mobile_optimizer import optimize_for_mobile

def test():
    device = torch.device("cpu")  # torch.device("cuda" if torch.cuda.is_available() else "cpu")
    checkpoint = torch.load("./checkpoints/model_25.pth", map_location=device)
    # model = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=10, pretrained=False).to(device)
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(num_classes=24, pretrained=False).to(device)
    model.load_state_dict(checkpoint['model'])
    model.eval()
    script_model = torch.jit.script(model)
    opt_model = optimize_for_mobile(script_model)
    # Save the optimized model to a file for the lite interpreter
    opt_model._save_for_lite_interpreter("mask_rcnn_1.pt")

if __name__ == '__main__':
    test()

555

vitansoz commented 3 years ago

In addition, in test() I also tried:

device = torch.device("cpu")  # torch.device("cuda" if torch.cuda.is_available() else "cpu")
checkpoint = torch.load("./checkpoints/model_25.pth", map_location=device)
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=10, pretrained=False).to(device)
model = torchvision.models.detection.maskrcnn_resnet50_fpn(num_classes=91, pretrained=True).to(device)
model.load_state_dict(checkpoint['model'])
model.eval()
script_model = torch.jit.script(model)
opt_model = optimize_for_mobile(script_model)
opt_model._save_for_lite_interpreter("mask_rcnn_1.pt")

It has the same problem!!

pmeier commented 3 years ago

The new reproduction has exactly the same issue as the one before:

import torch
import torchvision
from torch.utils.mobile_optimizer import optimize_for_mobile

model = torchvision.models.detection.maskrcnn_resnet50_fpn(num_classes=91,pretrained=True)
model.eval()
script_model = torch.jit.script(model)
opt_model = optimize_for_mobile(script_model)
opt_model._save_for_lite_interpreter("mask_rcnn_1.pt")

We don't have access to the file mask_rcnn_1.pt so we cannot execute opt_model._save_for_lite_interpreter("mask_rcnn_1.pt") and in turn can't reproduce your error. Either provide that file or better yet try to come up with an example that shows the same error, but does not need to load an external file.

vitansoz commented 3 years ago

opt_model._save_for_lite_interpreter("mask_rcnn_1.pt") means saving opt_model to a file named "mask_rcnn_1.pt". It does not load the file "mask_rcnn_1.pt".

Please refer to this link: https://pytorch.org/tutorials/recipes/mobile_interpreter.html#android

pmeier commented 3 years ago

:facepalm: My bad, sorry. Next time please post the link to the tutorial right away as an additional pointer. I was led astray by the fact that you are using a "private" method (_save_for_lite_interpreter has a leading underscore), but since it is used in the tutorial I'm guessing it is ok for public consumption.

datumbox commented 3 years ago

I spent a bit of time looking at it.

The issue appears not only in mask_rcnn but also in all *rcnn and ssd models. Those models include a few classes that inherit from object instead of nn.Module (BalancedPositiveNegativeSampler, BoxCoder, Matcher, LevelMapper, etc). I tried converting them to see if the issue is resolved (see #4389, it's quick and dirty, perhaps I missed something?) but the issue persists.

The error message definitely doesn't help to find which class is responsible. Perhaps we should involve someone from JIT to help out.

Edit: During debugging I strongly advise using ssd300_vgg16 instead of mask_rcnn. It has fewer moving parts and can still reproduce the problem, so it might be easier to find out what's wrong. I use @pmeier's snippet to trigger the error.
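For anyone wanting to enumerate candidate classes rather than find them by hand, a rough sketch using only the standard library is shown here. It lists the classes defined in a module that do not inherit from a given base. Note that find_non_subclasses is a hypothetical helper, and Module, Matcher and Head are stand-in classes so the sketch runs without torch; in practice one would presumably pass a torchvision detection submodule and torch.nn.Module instead. It only lists candidates, and as noted above, converting such classes did not make the error go away.

```python
import inspect
import types


def find_non_subclasses(module, base):
    """Return sorted names of classes defined in `module` that do not inherit from `base`."""
    names = []
    for name, obj in inspect.getmembers(module, inspect.isclass):
        # Skip classes that were only imported into the module.
        if obj.__module__ != module.__name__:
            continue
        if not issubclass(obj, base):
            names.append(name)
    return sorted(names)


# Stand-in for torch.nn.Module (assumption: used only so this runs without torch).
class Module:
    pass


# Build a fake module containing one plain class and one Module subclass.
fake = types.ModuleType("fake_detection")
fake.Matcher = type("Matcher", (object,), {"__module__": "fake_detection"})
fake.Head = type("Head", (Module,), {"__module__": "fake_detection"})

print(find_non_subclasses(fake, Module))
```

Here only the stand-in Matcher is reported, because Head inherits from the stand-in base.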

sanealytics commented 2 years ago

+1 Getting the same issue en route to converting to onnx/coreml.

RobinGRAPIN commented 2 years ago

Hello, any news about how to save a jitted RCNN model using _save_for_lite_interpreter()? I would like to use it on mobile (iOS), so this format is required, but as mentioned, many classes such as ImageList aren't supported in the lite interpreter. I tried to make them inherit from torch.nn.Module, but it still leads to problems.

Jokestv2 commented 1 year ago

I got the same problem as described by @RobinGRAPIN. I got the following error message when calling the _save_for_lite_interpreter() method of the scripted keypointrcnn_resnet50_fpn model:

  File "/Users/k.zhao/anaconda3/envs/mypetsearch/lib/python3.8/site-packages/torch/jit/_script.py", line 727, in _save_for_lite_interpreter
    return self._c._save_for_mobile(*args, **kwargs)
RuntimeError: __torch__ types other than custom c++ classes (__torch__.torch.classes)are not supported in lite interpreter. Workaround: instead of using arbitrary class type (class Foo()), define a pytorch class (class Foo(torch.nn.Module)). The problematic type is: __torch__.torchvision.models.detection.image_list.ImageList
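
Since this version of the error names the offending type, the name can be pulled out of the message when trying several models in a row. The following is a minimal sketch using only the standard library; problematic_type is a hypothetical helper, and the message string is a shortened copy of the error above. It assumes the wording "The problematic type is:" stays the same, which may not hold across PyTorch versions.

```python
import re

# Assumption: the message ends with "The problematic type is: <type name>".
_TYPE_RE = re.compile(r"The problematic type is:\s*(\S+)")


def problematic_type(message):
    """Return the type name reported in the error message, or None if absent."""
    match = _TYPE_RE.search(message)
    return match.group(1) if match else None


# Shortened copy of the RuntimeError text quoted in this thread.
message = (
    "__torch__ types other than custom c++ classes (__torch__.torch.classes)"
    "are not supported in lite interpreter. The problematic type is: "
    "__torch__.torchvision.models.detection.image_list.ImageList"
)

print(problematic_type(message))
# prints: __torch__.torchvision.models.detection.image_list.ImageList
```

In a real run one could wrap the _save_for_lite_interpreter call in a try/except RuntimeError block and pass str(err) to the helper.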