Yasin40 opened this issue 3 years ago
Hi,
Was the version of PyTorch and torchvision used to run on the second environment the same as in the machine used to export the model?
Thanks for your attention. In the second environment I use the model for C++ inference with LibTorch. I tested LibTorch 1.9.0 and the latest (Preview/nightly) CPU build. The PyTorch and torchvision versions are: PyTorch 1.9.0+cu102, torchvision 0.10.0+cu102.
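A small sketch, not taken from the thread, of one way to make this version question answerable later: store the export-time versions inside the TorchScript file using the `_extra_files` argument of `torch.jit.save` / `torch.jit.load` (an underscore-prefixed, semi-private parameter). The entry name `export_versions.txt` and the `TinyNet` model are hypothetical placeholders.

```python
import io

import torch

MANIFEST = "export_versions.txt"  # hypothetical entry name inside the archive


class TinyNet(torch.nn.Module):
    """Hypothetical stand-in for the real model (e.g. MobileNetV3)."""

    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(4, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(x)


def version_manifest() -> str:
    """Describe the library versions present at export time."""
    try:
        import torchvision
        tv_version = torchvision.__version__
    except Exception:  # torchvision missing or broken in this environment
        tv_version = "not installed"
    return f"torch={torch.__version__};torchvision={tv_version}"


def save_with_versions(module: torch.nn.Module, f) -> None:
    """Script the module and store the version manifest alongside it."""
    scripted = torch.jit.script(module)
    torch.jit.save(scripted, f, _extra_files={MANIFEST: version_manifest().encode()})


def read_versions(f) -> str:
    """Load the archive and return the stored manifest as text."""
    extra = {MANIFEST: ""}  # torch.jit.load fills in the value for this key
    torch.jit.load(f, map_location="cpu", _extra_files=extra)
    value = extra[MANIFEST]
    # Depending on the PyTorch version the value may come back as bytes or str
    return value.decode() if isinstance(value, bytes) else value


buf = io.BytesIO()  # in-memory file; a path such as "model.pt" also works
save_with_versions(TinyNet(), buf)
buf.seek(0)
print(read_versions(buf))
```

Because `torch.jit.save` writes a zip archive, the stored entry can also be inspected with an ordinary zip tool when a runtime refuses to load the model at all.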
@eellison who would be a good POC from the team to have a look?
Also, @Yasin40 have you tried re-exporting the model in Python and running it again in C++?
Yes. I also tested it this way:
```python
import torch
from torchvision import models  # `models` is used below, so it must be imported

# (unused imports and the project-specific `from model import ...` line omitted)
device = torch.device('cpu')
model = models.mobilenet_v3_small(pretrained=False)
model = model.to(device)
# Load the FP32 checkpoint
checkpoint = torch.load('mobilenet_v3_small-047dcff4.pth', map_location=device)
model.load_state_dict(checkpoint)
img = torch.rand(1, 3, 224, 224).to(device)
model.eval()  # disable dropout and batch-norm statistic updates before tracing
ts = torch.jit.trace(model, img, strict=False)
ts.save("traced_mobnet__pt_model.pt")
```
But I got the same error.
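Before going back to C++, one way to separate "the file is broken" from "this LibTorch cannot read it" is to reload the saved file in Python and compare it with the eager model. A minimal CPU-only sketch; `roundtrip_check` is a hypothetical helper, and the small `toy` network only borrows a few MobileNetV3 building blocks (`Hardswish`, `Identity`) so the snippet runs without any checkpoint.

```python
import io

import torch
import torch.nn as nn


def roundtrip_check(model: nn.Module, example: torch.Tensor, atol: float = 1e-5) -> bool:
    """Trace, save, reload and compare against the eager outputs (CPU only)."""
    model.eval()
    with torch.no_grad():
        traced = torch.jit.trace(model, example)
        buf = io.BytesIO()
        torch.jit.save(traced, buf)
        buf.seek(0)
        reloaded = torch.jit.load(buf, map_location="cpu")
        expected = model(example)
        actual = reloaded(example)
    return torch.allclose(expected, actual, atol=atol)


# Hypothetical toy network using some MobileNetV3 building blocks
toy = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    nn.Hardswish(),
    nn.Identity(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 14),
)
print(roundtrip_check(toy, torch.rand(1, 3, 32, 32)))
```

For the model in this thread the call would be `roundtrip_check(models.mobilenet_v3_small(), torch.rand(1, 3, 224, 224))`. If it passes in Python while LibTorch still fails to load the file, that points away from the model itself and toward a difference between the exporting PyTorch and the loading LibTorch.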
Frontend-related error, cc @gmagogsfm
Can anyone help me?
I tested resnet18 from torchvision models instead of mobilenetv3, and it works.
What is the problem with mobilenetv3 and Linear?
@Yasin40 This looks like an error in PyTorch's nn module. MobileNetV3 is the only one affected because it uses the Identity module, which seems to have an issue.
It might be worth filing this on PyTorch core.
Can anyone help me?
@gmagogsfm can you have a look? It seems like an issue in the interpreter.
I tried to write an example with the Identity module on top-of-trunk PyTorch, and the issue is not reproducible.
Could you try loading a toy module generated by this script in your environment?
```python
import torch


class JitModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.id = torch.nn.Identity()

    def forward(self, x: torch.Tensor):
        return self.id(x)


m = torch.jit.script(JitModule())
torch.jit.save(m, "identity_saved_module.pt")
```
This is the serialized code when compiled in my environment:
```
def forward(self: __torch__.torch.nn.modules.linear.Identity, input: Tensor) -> Tensor:
```
As you can see, it is different from the one shown in your example:
```
def forward(self: __torch__.torch.nn.modules.linear.Identity) -> NoneType:
```
The forward signature from your example seems wrong, as it doesn't take any tensor input, but Identity requires a tensor input:
https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/linear.py#L35
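The serialized `forward` signature can also be checked from Python before involving C++: every submodule of a loaded TorchScript module exposes its compiled source through the `.code` property. A minimal sketch that repeats the `JitModule` toy from the comment above so it runs on its own; `identity_forward_code` is a hypothetical helper name.

```python
import io

import torch


class JitModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.id = torch.nn.Identity()

    def forward(self, x: torch.Tensor):
        return self.id(x)


def identity_forward_code(f) -> str:
    """Load a saved archive and return the TorchScript source of its `id` submodule."""
    loaded = torch.jit.load(f, map_location="cpu")
    return loaded.id.code


buf = io.BytesIO()
torch.jit.save(torch.jit.script(JitModule()), buf)
buf.seek(0)
code = identity_forward_code(buf)
print(code)  # a healthy export should take a Tensor input and return a Tensor
```

On the real file, loading `traced_mobnet__pt_model.pt` and iterating `named_modules()` would let you print `.code` for each submodule and look for a `forward` that takes no tensor and returns `NoneType`.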
Thanks. Yes, it loaded successfully, but my problem is with torchvision.models and its mobilenetv3. How can I fix it?
@Yasin40 instead of using torch.jit.trace, can you try using torch.jit.script on your model?
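The practical difference behind this suggestion is that `torch.jit.script` compiles the Python source, so data-dependent control flow is kept, while `torch.jit.trace` records only the path taken for the example input. A small sketch; the `Branchy` module and the `export_torchscript` helper (prefer scripting, fall back to tracing) are hypothetical.

```python
import torch


class Branchy(torch.nn.Module):
    """Hypothetical module whose output depends on the input's values."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if bool(x.sum() > 0):
            return x * 2
        return x + 10


def export_torchscript(model: torch.nn.Module, example: torch.Tensor):
    """Prefer scripting (keeps control flow); fall back to tracing."""
    model.eval()
    try:
        return torch.jit.script(model), "script"
    except Exception as err:  # e.g. a Python construct TorchScript cannot compile
        print(f"scripting failed, falling back to tracing: {err}")
        with torch.no_grad():
            return torch.jit.trace(model, example), "trace"


scripted, how = export_torchscript(Branchy(), torch.ones(3))
traced = torch.jit.trace(Branchy(), torch.ones(3))  # bakes in the x * 2 branch
neg = -torch.ones(3)
print(how, scripted(neg), traced(neg))
```

Tracing `Branchy` emits a TracerWarning and silently returns `x * 2` for every input, which is the kind of mismatch scripting avoids.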
I will try using mobilenetv3 directly to see if I can reproduce.
Yes, I tested torch.jit.script, and nothing changed.
@Yasin40 so you mean that torch.jit.script works successfully? If that's the case, then I believe we can close this issue, as torch.jit.script effectively replaces torch.jit.trace whenever the model can be scripted.
No, it doesn't work.
Maybe the reason is an incompatible torch version. I also encountered this problem: I trained the model with torch 1.9.1 and deployed the service with torch 1.8.0. I solved the problem by changing torch to 1.10.0. (2021.11.11)
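The fix described above (moving the runtime from 1.8.0 to 1.10.0 for a model exported with 1.9.1) matches a common rule of thumb: a runtime can generally load TorchScript produced by an older or equal PyTorch release, but not necessarily one produced by a newer release. A minimal pre-deployment check under that assumption; it compares only major and minor versions, and `parse_version` / `runtime_may_load` are hypothetical helpers, not a guarantee of compatibility.

```python
import re


def parse_version(version: str) -> tuple:
    """'1.9.0+cu102' -> (1, 9, 0); local suffixes such as '+cu102' are ignored."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def runtime_may_load(export_version: str, runtime_version: str) -> bool:
    """Heuristic: the runtime should be at least as new (major.minor) as the exporter."""
    return parse_version(runtime_version)[:2] >= parse_version(export_version)[:2]


print(runtime_may_load("1.9.1", "1.8.0"))   # the failing setup: False
print(runtime_may_load("1.9.1", "1.10.0"))  # the setup that worked: True
```

Comparing parsed tuples matters here: as plain strings, `"1.10.0" < "1.9.1"`, which gets exactly this case backwards. For a LibTorch deployment, the runtime version has to come from the LibTorch package you actually shipped.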
Hi, is there any progress on this issue?
I faced the same problem, but with the detection models in torchvision. torch.jit.trace produces an error when exporting because of the output format. torch.jit.script exports the model successfully, but the C++ lib fails to load it.
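The tracing failure "due to output format" is consistent with how `torch.jit.trace` treats container outputs: in its default `strict=True` mode it rejects mutable containers such as dicts, and `strict=False` (also used in the export script earlier in this thread) allows them when the structure never changes with the input. A minimal sketch with a hypothetical `DictOut` module standing in for a dict-returning model; real torchvision detection models also have input-dependent behavior, so scripting remains the more reliable route for them.

```python
import torch


class DictOut(torch.nn.Module):
    """Hypothetical model returning a dict, loosely like detection outputs."""

    def forward(self, x: torch.Tensor):
        return {"boxes": x * 2, "scores": x.sum(dim=-1)}


def trace_with_dict_output(model: torch.nn.Module, example: torch.Tensor):
    """Try the default strict trace first; retry with strict=False if it is rejected."""
    try:
        return torch.jit.trace(model, example), True
    except RuntimeError as err:
        print(f"strict tracing rejected the output: {str(err)[:80]}")
        return torch.jit.trace(model, example, strict=False), False


traced, strict_ok = trace_with_dict_output(DictOut().eval(), torch.ones(2, 3))
print(strict_ok, sorted(traced(torch.ones(2, 3)).keys()))
```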
Have you solved this problem? I also exported successfully with torch.jit.script, but the C++ lib fails to load it.
```
terminate called after throwing an instance of 'torch::jit::ErrorReport'
  what():
Unknown type name 'NoneType':
Serialized File "code/__torch__/torch/nn/modules/container.py", line 5
  __buffers__ = []
  training : bool
  _is_full_backward_hook : NoneType
                           ~~~~~~~~ <--- HERE
  __annotations__["VEHICLE/node_history_encoder"] = __torch__.torch.nn.modules.rnn.LSTM
  __annotations__["VEHICLE/node_future_encoder"] = __torch__.torch.nn.modules.rnn.___torch_mangle_0.LSTM
```
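The file written by `torch.jit.save` is a zip archive, and the path in this error (`code/__torch__/torch/nn/modules/container.py`) is generated TorchScript source inside it. A minimal, stdlib-only sketch for listing every serialized line that mentions `NoneType` before shipping a model to a given LibTorch build; `find_type_references` is a hypothetical helper, and the assumption that sources live under a `code/` directory as `.py` files is based on the path in the error above rather than on a documented format.

```python
import io
import re
import zipfile


def find_type_references(archive, type_name: str = "NoneType"):
    """List (file, line number, text) for serialized code lines mentioning type_name.

    `archive` may be a path or a binary file-like object. Assumes generated
    source lives under '<archive name>/code/...py'.
    """
    pattern = re.compile(r"\b" + re.escape(type_name) + r"\b")
    hits = []
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if "/code/" not in name or not name.endswith(".py"):
                continue  # skip tensor data, pickles and other entries
            text = zf.read(name).decode("utf-8", errors="replace")
            for lineno, line in enumerate(text.splitlines(), start=1):
                if pattern.search(line):
                    hits.append((name, lineno, line.strip()))
    return hits
```

For example, running `find_type_references("traced_mobnet__pt_model.pt")` on the exporting machine would show whether attributes such as `_is_full_backward_hook : NoneType` ended up in the archive; whether a given LibTorch can parse them is the version question discussed above.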
@lizhi1215 Could you please tell me whether 1.10.0 is the version of the deployment service or the one used to train the model?
🐛 C++ inference error with a TorchScript-exported torchvision model
I'm trying to use this approach to build my model (MobileNetV3 small) with torchvision models. In the training and validation phases (Python) it worked without any problem, but after saving it as TorchScript for C++ inference, I got this error:
My simplified TorchScript export code:
This export script runs successfully, but loading the model in C++ produces the error. This is my simplified C++ code, which works for other models:
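```cpp
try {
    this->module = torch::jit::load(ModelAddress);
} catch (const c10::Error& e) {
    std::cerr << "error loading the model: " << e.what() << std::endl;
    std::exit(EXIT_FAILURE);
} catch (const std::exception& e) {
    // TorchScript parse errors (torch::jit::ErrorReport, e.g. "Unknown type
    // name 'NoneType'") derive from std::exception rather than c10::Error
    std::cerr << "error parsing the model: " << e.what() << std::endl;
    std::exit(EXIT_FAILURE);
}
half_ = (device_ != torch::kCPU);  // use FP16 only on non-CPU devices
this->module.to(device_);
if (half_) {
    module.to(torch::kHalf);
}
torch::NoGradGuard no_grad;
module.eval();
```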
I already get the error during this initialization, while my other exported models work fine, including at forward. I'm confused and need help.

Environment
env 1: System that trained the model and exported TorchScript (using the export code above):
OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
CMake version: version 3.10.2
Libc version: glibc-2.25

Python version: 3.6.9 (default, Jul 17 2020, 12:50:27) [GCC 8.4.0] (64-bit runtime)
Python platform: Linux-5.4.0-48-generic-x86_64-with-Ubuntu-18.04-bionic
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: GeForce GTX 1060 6GB
Nvidia driver version: 450.66
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.4
[pip3] torch==1.9.0
[pip3] torchvision==0.10.0
env 2: System that runs the C++ code and gets the error:
OS: Ubuntu 18.04.4 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.10.2
Libc version: glibc-2.15

Python version: 2.7.17 (default, Jul 20 2020, 15:37:01) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.4.0-42-generic-x86_64-with-Ubuntu-18.04-bionic
Is CUDA available: N/A
CUDA runtime version: Could not collect
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.18.1
Additional context