Yasin40 commented 3 years ago

🐛 C++ inference error with a TorchScript-exported torchvision model

I'm building my model (MobileNetV3-Small) from torchvision models. Training and validation in Python worked without any problem, but after saving the model as TorchScript for C++ inference, I got this error:

terminate called after throwing an instance of 'torch::jit::ErrorReport'
  what():  
Unknown type name 'NoneType':
Serialized   File "code/__torch__/torch/nn/modules/linear.py", line 6
  training : bool
  _is_full_backward_hook : Optional[bool]
  def forward(self: __torch__.torch.nn.modules.linear.Identity) -> NoneType:
                                                                   ~~~~~~~~ <--- HERE
    return None
class Linear(Module):

Aborted (core dumped)

My simplified TorchScript export code:

import sys
import time
from pathlib import Path

import torch
import torch.nn as nn
from torchvision import models  # missing from the original snippet
from model import initialize_model, BSConv_init  # project-local helpers

num_classes = 14
use_pretrained = False  # assumption: undefined in the original snippet; weights come from the checkpoint below
device = torch.device('cpu')
model = models.mobilenet_v3_small(pretrained=use_pretrained)
# Replace the final classifier layer to match the number of classes
num_ftrs = model.classifier[3].in_features
model.classifier[3] = nn.Linear(num_ftrs, num_classes)
model = model.to(device)
checkpoint = torch.load('checkpoint/best_model_MobBsconv_ckpt.t7', map_location=device)
model.load_state_dict(checkpoint['model'])

# Input
img = torch.rand(1, 3, 224, 224).to(device)
model.eval()
ts = torch.jit.trace(model, img, strict=False)
ts.save("traced_mob_bsconv_model.pt")

The export script runs successfully, but loading the model in C++ produces the error. This is my simplified C++ code, which works for other models:

try {
    this->module = torch::jit::load(ModelAddress);
} catch (const c10::Error& e) {
    std::cerr << "error loading the model: " << e.what() << std::endl;
    std::exit(EXIT_FAILURE);
}
half_ = (device_ != torch::kCPU);
this->module.to(device_);
if (half_) {
    module.to(torch::kHalf);
}
torch::NoGradGuard no_grad;
module.eval();

The error already occurs during this initialization, while my other exported models load and run forward without problems. I'm confused and need help.

Environment

env 1: system used to train the model and export the TorchScript file (with the code above):

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
CMake version: version 3.10.2
Libc version: glibc-2.25

Python version: 3.6.9 (default, Jul 17 2020, 12:50:27) [GCC 8.4.0] (64-bit runtime)
Python platform: Linux-5.4.0-48-generic-x86_64-with-Ubuntu-18.04-bionic
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: GeForce GTX 1060 6GB
Nvidia driver version: 450.66
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.4
[pip3] torch==1.9.0
[pip3] torchvision==0.10.0

env 2: system that runs the C++ code and produces the error:

OS: Ubuntu 18.04.4 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.10.2
Libc version: glibc-2.15

Python version: 2.7.17 (default, Jul 20 2020, 15:37:01) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.4.0-42-generic-x86_64-with-Ubuntu-18.04-bionic
Is CUDA available: N/A
CUDA runtime version: Could not collect
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.18.1


fmassa commented 3 years ago

Hi,

Was the version of PyTorch and torchvision used to run on the second environment the same as in the machine used to export the model?

Yasin40 commented 3 years ago

> Hi,
>
> Was the version of PyTorch and torchvision used to run on the second environment the same as in the machine used to export the model?

Thanks for your attention. In the second environment I use the model for C++ inference with libtorch. I tested libtorch 1.9.0 and the latest (preview nightly) CPU build. The PyTorch and torchvision versions are: PyTorch Version: 1.9.0+cu102, Torchvision Version: 0.10.0+cu102.

fmassa commented 3 years ago

@eellison who would be a good POC from the team to have a look?

fmassa commented 3 years ago

Also, @Yasin40 have you tried re-exporting the model in Python and running it again in C++?

Yasin40 commented 3 years ago

> Also, @Yasin40 have you tried re-exporting the model in Python and running it again in C++?

Yes. I also tested it this way:

import sys
import time
from pathlib import Path

import torch
import torch.nn as nn
from torchvision import models  # missing from the original snippet
from model import initialize_model, BSConv_init  # project-local helpers

num_classes = 14
device = torch.device('cpu')
model = models.mobilenet_v3_small(pretrained=False)
model = model.to(device)
checkpoint = torch.load('mobilenet_v3_small-047dcff4.pth', map_location=device)  # load FP32 model
model.load_state_dict(checkpoint)
img = torch.rand(1, 3, 224, 224).to(device)
model.eval()
ts = torch.jit.trace(model, img, strict=False)
ts.save("traced_mobnet__pt_model.pt")

But I got the same error.

eellison commented 3 years ago

frontend related error cc @gmagogsfm

Yasin40 commented 3 years ago

Can anyone help me? I tested resnet18 from torchvision instead of mobilenetv3, and it works. What is the problem with mobilenetv3 and Linear?

datumbox commented 3 years ago

@Yasin40 This looks like an error in PyTorch's nn module. Only MobileNetV3 is affected because it uses the Identity module, which seems to have an issue.

It might be worth filing this on PyTorch core.

Yasin40 commented 3 years ago

Can anyone help me?

fmassa commented 3 years ago

@gmagogsfm can you have a look? Seems like an issue in the interpreter

gmagogsfm commented 3 years ago

I tried writing an example with the Identity module on top-of-trunk PyTorch, and the issue is not reproducible.

Could you try loading a toy module generated by this script in your environment?

import torch

class JitModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.id = torch.nn.Identity()

    def forward(self, x: torch.Tensor):
        return self.id(x)

m = torch.jit.script(JitModule())
torch.jit.save(m, "identity_saved_module.pt")

gmagogsfm commented 3 years ago

This is the serialized code when compiled in my environment:

  def forward(self: __torch__.torch.nn.modules.linear.Identity, input: Tensor) -> Tensor:

As you can see it is different from the one shown in your example:

def forward(self: __torch__.torch.nn.modules.linear.Identity) -> NoneType:

The forward signature from your example seems wrong as it doesn't take any tensor input, but Identity requires a tensor input: https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/linear.py#L35
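To make the comparison above mechanical, here is a minimal sketch that parses a serialized `def forward(...) -> T:` line and reports whether it takes a `Tensor` argument besides `self`. The helper names `parse_signature` and `has_tensor_input` are hypothetical, and the comma-based splitting is an assumption that only holds for flat annotations like the two lines quoted in this thread (it would not handle nested generics such as `Dict[str, Tensor]`).

```python
import re

# Matches a single-line serialized signature: def name(params) -> return_type:
SIG_RE = re.compile(r"def\s+(\w+)\((.*)\)\s*->\s*(.+?):\s*$")


def parse_signature(line):
    """Return (function_name, [(arg_name, arg_type), ...], return_type)."""
    match = SIG_RE.search(line.strip())
    if match is None:
        raise ValueError("not a single-line function signature: %r" % line)
    name, params, ret = match.groups()
    args = []
    # Naive split on commas: fine for flat annotations only (see lead-in).
    for param in params.split(","):
        param = param.strip()
        if not param:
            continue
        arg_name, _, arg_type = param.partition(":")
        args.append((arg_name.strip(), arg_type.strip()))
    return name, args, ret.strip()


def has_tensor_input(line):
    """True if any argument after the first (self) is annotated as Tensor."""
    _, args, _ = parse_signature(line)
    return any(arg_type == "Tensor" for _, arg_type in args[1:])
```

Applied to the two signatures quoted above, the first reports a `Tensor` input and a `Tensor` return type, while the second reports no tensor input and a `NoneType` return type.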

Yasin40 commented 3 years ago

Thanks. Yes, it loaded successfully, but my problem is with torchvision.models and its mobilenetv3. How can I fix it?

fmassa commented 3 years ago

@Yasin40 instead of using torch.jit.trace, can you try using instead torch.jit.script on your model?

gmagogsfm commented 3 years ago

I will try using mobilenetv3 directly to see if I can reproduce.

Yasin40 commented 3 years ago

> @Yasin40 instead of using torch.jit.trace, can you try using instead torch.jit.script on your model?

Yes, I tested torch.jit.script, and it made no difference.

fmassa commented 3 years ago

@Yasin40 so you mean that torch.jit.script works successfully? If that's the case, then I believe we can close this issue, as torch.jit.script effectively replaces torch.jit.trace whenever the model can be scripted

Yasin40 commented 3 years ago

> @Yasin40 so you mean that torch.jit.script works successfully? If that's the case, then I believe we can close this issue, as torch.jit.script effectively replaces torch.jit.trace whenever the model can be scripted

No, it doesn't work.

lizhi1215 commented 2 years ago

Maybe the cause is a torch version incompatibility. I also ran into this problem because I trained the model with torch 1.9.1 and ran the deployment service with torch 1.8.0. I solved the problem by changing torch to 1.10.0. (2021.11.11)
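The version mismatch described here can be caught before deployment with a small check. The sketch below is illustrative only: `major_minor` and `loader_is_older` are hypothetical helper names, and it rests on the assumption (suggested by this report, not confirmed by the maintainers in this thread) that loading a TorchScript file with a runtime older than the exporting PyTorch is the risky direction.

```python
import re


def major_minor(version):
    """Extract (major, minor) from strings such as '1.9.0+cu102' or '1.10.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


def loader_is_older(export_version, load_version):
    """True if the loading runtime has a lower major.minor than the exporter."""
    # Tuples compare numerically, so (1, 10) is correctly newer than (1, 9).
    return major_minor(load_version) < major_minor(export_version)
```

For the case reported here, `loader_is_older("1.9.1", "1.8.0")` flags the mismatch, while `loader_is_older("1.9.1", "1.10.0")` does not.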

DuyHuynhLe commented 2 years ago

> @Yasin40 so you mean that torch.jit.script works successfully? If that's the case, then I believe we can close this issue, as torch.jit.script effectively replaces torch.jit.trace whenever the model can be scripted
>
> No, Doesn't work.

Hi, has there been any progress on this issue? I faced the same problem, but with the detection models in torchvision. torch.jit.trace produces an error when exporting because of the output format. torch.jit.script exports the model successfully, but the C++ library fails to load it.

zhaowenyi7 commented 2 years ago

Have you solved this problem? I also exported successfully with torch.jit.script but the c++ lib fails to load it.

terminate called after throwing an instance of 'torch::jit::ErrorReport'
  what():  
Unknown type name 'NoneType':
Serialized   File "code/__torch__/torch/nn/modules/container.py", line 5
  __buffers__ = []
  training : bool
  _is_full_backward_hook : NoneType
                           ~~~~~~~~ <--- HERE
  __annotations__["VEHICLE/node_history_encoder"] = __torch__.torch.nn.modules.rnn.LSTM
  __annotations__["VEHICLE/node_future_encoder"] = __torch__.torch.nn.modules.rnn.___torch_mangle_0.LSTM
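Both error reports in this thread point at `NoneType` inside the serialized sources of the saved file. A saved TorchScript file is a zip archive, so one way to check a file before shipping it to the C++ side is to scan it with the standard library. This is a minimal sketch: `find_nonetype_annotations` is a hypothetical helper, and it assumes the serialized sources are `.py` members under a `code/` directory, as the `code/__torch__/...` paths in the error messages suggest.

```python
import zipfile


def find_nonetype_annotations(archive):
    """Return (member_name, line_number, line) for serialized source lines mentioning NoneType.

    `archive` may be a path to a saved TorchScript file or a binary file-like object.
    """
    hits = []
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            # Assumption: serialized sources are .py members under a "code/" directory.
            if "/code/" not in "/" + name or not name.endswith(".py"):
                continue
            text = zf.read(name).decode("utf-8", errors="replace")
            for lineno, line in enumerate(text.splitlines(), start=1):
                if "NoneType" in line:
                    hits.append((name, lineno, line.strip()))
    return hits
```

Calling `find_nonetype_annotations("traced_mob_bsconv_model.pt")` on the exporting machine would list the offending lines, if any, without needing libtorch. An empty result does not guarantee that the file loads in C++.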

Eliza-and-black commented 1 year ago

> Maybe the reason is incompatible on torch version.I also encountered this problem because I train the model with torch 1.9.1,and deployment service with torch 1.8.0. But I solved the problem when I changed the torch to 1.10.0.---2021.11.11

@lizhi1215 Could you please tell me whether 1.10.0 is the version used by the deployment service or the one used to train the model?
