pytorch / TensorRT

PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
https://pytorch.org/TensorRT
BSD 3-Clause "New" or "Revised" License

❓ [Question] How to load a TRT_Module in a Python environment on Windows which has been compiled on C++ Windows? #1253

Closed noman-anjum-retro closed 1 year ago

noman-anjum-retro commented 2 years ago

❓ Question

I have compiled a torch_trt module using libtorch on the C++ Windows platform. The module works perfectly in C++ for inference; however, I want to use it in a Python program on Windows. How do I load this module in Python?

When I tried to load it with torch.load() or torch.jit.load(), it throws the following error:

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\serialization.py:711, in load(f, map_location, pickle_module, **pickle_load_args)
    707 warnings.warn("'torch.load' received a zip file that looks like a TorchScript archive"
    708               " dispatching to 'torch.jit.load' (call 'torch.jit.load' directly to"
    709               " silence this warning)", UserWarning)
    710 opened_file.seek(orig_position)
--> 711 return torch.jit.load(opened_file)
    712 return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
    713 return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\jit\_serialization.py:164, in load(f, map_location, _extra_files)
    162     cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files)
    163 else:
--> 164     cpp_module = torch._C.import_ir_module_from_buffer(
    165         cu, f.read(), map_location, _extra_files
    166     )
    168 # TODO: Pretty sure this approach loses ConstSequential status and such
    169 return wrap_cpp_module(cpp_module)

RuntimeError: Unknown type name 'torch.torch.classes.tensorrt.Engine':
  File "code/torch/movinets/models.py", line 4
    parameters = []
    buffers = []
    torch___movinets_models_MoViNet_trtengine : torch__.torch.classes.tensorrt.Engine

  def forward(self_1: __torch__.movinets.models.MoViNet_trt,
      input_0: Tensor) -> Tensor:

## What you have already tried

Since torch_tensorrt is not supported for Python on Windows, I picked `libtorchtrt_runtime.so` from the Linux path `python3.8/site-packages/torch_tensorrt/lib/libtorchtrt_runtime.so` and loaded it in Python on Windows through torch.ops.load_library(). However, it throws another error:

File "\video_play.py", line 189, in get_torch_tensorrt_converted_model
    torch.ops.load_library("libtorchtrt_runtime.so")
  File "C:\Users\NomanAnjum\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\_ops.py", line 255, in load_library
    ctypes.CDLL(path)
  File "C:\Users\NomanAnjum\AppData\Local\Programs\Python\Python310\lib\ctypes\__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: [WinError 193] %1 is not a valid Win32 application
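
[WinError 193] means the file is not a Windows binary at all: a Linux `.so` is an ELF library and cannot be loaded by Windows, so a Windows build of the runtime (torchtrt_runtime.dll) is needed instead. A rough sketch of the intended per-platform usage (both paths below are placeholders, not files from an actual install):

import platform
import torch

# Sketch only: the runtime library has to match the host OS/ABI.
# A Linux .so (ELF) cannot be loaded on Windows; that is what
# "[WinError 193] %1 is not a valid Win32 application" indicates.
if platform.system() == "Windows":
    runtime_lib = r"C:\path\to\torchtrt_runtime.dll"   # placeholder Windows build
else:
    runtime_lib = "/path/to/libtorchtrt_runtime.so"    # placeholder Linux build

torch.ops.load_library(runtime_lib)
model = torch.jit.load("compiled_module.ts")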

## Environment

Windows 11

CPU: i9-11980HK x86-64

GPU: RTX 3080 Mobile

CUDA: 11.5.2

cuDNN: 8.3.1

LibTorch: 1.11

TensorRT: 8.4.1.5

Visual Studio 2019

Python: 3.10, 3.8

#### Is there a way to load it in Python?

narendasan commented 2 years ago

@andi4191 can you take a look at this as well when you look at the other windows issue?

andi4191 commented 2 years ago

@noman-anjum-retro: Don't you need to load torchtrt_runtime.dll using Python on Windows?

noman-anjum-retro commented 2 years ago

@andi4191 I'm not sure how to load it. When I tried to load it via this code:

import torch
import ctypes

hllDll = ctypes.WinDLL("D:/Codes/RetroActivity/experiments/src/exploration/action_recognition/torchtrt_runtime.dll")
print("DLL Loaded")

torch.ops.load_library("D:/Codes/RetroActivity/experiments/src/exploration/action_recognition/torchtrt_runtime.dll")
model = torch.jit.load(r"D:\NewTRTModel.ts")

it threw an error:

hllDll = ctypes.WinDLL("D:/Codes/RetroActivity/experiments/src/exploration/action_recognition/torchtrt_runtime.dll")
  File "C:\Users\NomanAnjum\AppData\Local\Programs\Python\Python310\lib\ctypes\__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'D:\Codes\RetroActivity\experiments\src\exploration\action_recognition\torchtrt_runtime.dll' (or one of its dependencies). Try using the full path with constructor syntax.

Since the path is correct, it's definitely some other issue.
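
The "or one of its dependencies" part is usually the real problem: since Python 3.8, dependent DLLs are no longer resolved from PATH or the working directory, so the folders containing the TensorRT and libtorch DLLs have to be registered explicitly. A minimal sketch of that (every directory path below is a placeholder):

import os
import ctypes

# Python 3.8+ only searches "safe" locations for a DLL's dependencies,
# so register the folders holding the TensorRT / libtorch DLLs first.
# All paths here are placeholders.
os.add_dll_directory(r"C:\path\to\TensorRT\lib")
os.add_dll_directory(r"C:\path\to\libtorch\lib")

hllDll = ctypes.WinDLL(r"C:\path\to\torchtrt_runtime.dll")
print(hllDll)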

noman-anjum-retro commented 2 years ago

Update: I checked all the dependencies of torchtrt_runtime.dll and found some more libraries it needs. I got some of them from TensorRT and some from libtorch. Now the error while loading the DLL has changed to:

File "D:\Codes\RetroActivity\experiments\src\exploration\action_recognition\video_play.py", line 198, in get_torch_tensorrt_converted_model
    hllDll = ctypes.WinDLL(r"D:\Codes\RetroActivity\experiments\src\exploration\action_recognition\torchtrt_runtime.dll")
  File "C:\Users\NomanAnjum\AppData\Local\Programs\Python\Python310\lib\ctypes\__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: [WinError 1114] A dynamic link library (DLL) initialization routine failed

hassan11196 commented 2 years ago

@noman-anjum-retro did you get it working? I'm facing the same issue here.

noman-anjum-retro commented 2 years ago

No, still waiting for help from @narendasan or @andi4191.

andi4191 commented 2 years ago

@noman-anjum-retro: Can you try this? I added an option in torchtrtc to support custom torch ops or torch_tensorrt converters. I used the windows.h header file and LoadLibrary to load the symbol tables from a DLL file.

https://github.com/pytorch/TensorRT/blob/72457277e80247d71a3bf9737d98f08b65b217d7/cpp/bin/torchtrtc/main.cpp#L27

noman-anjum-retro commented 2 years ago

@andi4191 What's the solution you are suggesting? Should I recompile my TRT module in C++ with this change and try to load it in Python?

narendasan commented 2 years ago

Yes, the fix was merged into master, so go ahead and try it and see if it fixes your issue.

noman-anjum-retro commented 2 years ago

Hello @andi4191, no, it didn't help; it produced the same series of errors mentioned above.

noman-anjum-retro commented 2 years ago

Progress:

Hello @andi4191 @narendasan, I tried loading the compiled TRT module in C++ and Python. In C++ it worked fine in Debug mode, but in C++ Release mode and in Python it threw the following error:

RuntimeError:
Unknown type name 'torch.torch.classes.tensorrt.Engine':
  File "code/torch/movinets/models.py", line 4
    parameters = []
    buffers = []
    _torch___movinets_models_MoViNet_trt_engine : torch.torch.classes.tensorrt.Engine
                                                  <--- HERE
  def forward(self_1: torch.movinets.models.MoViNet_trt,
      input_0: Tensor) -> Tensor:

However, when I loaded torchtrt_runtime.dll in the C++ Release build with the following code, it worked:

HMODULE hLib = LoadLibrary(TEXT("torchtrt_runtime"));
    if (hLib == NULL) {
        std::cerr << "Library torchtrt_runtime.dll not found" << std::endl;
        exit(1);
    }

This makes it clear that running on Windows requires torchtrt_runtime.dll to be loaded. However, when I try to load it in Python with the following code:

torch.ops.load_library("/src/exploration/action_recognition/torchtrt_runtime.dll")

or

import ctypes
hllDll = ctypes.WinDLL("/src/exploration/action_recognition/torchtrt_runtime.dll")

the library fails to load with the following error:

hllDll = ctypes.WinDLL("/src/exploration/action_recognition/torchtrt_runtime.dll")
  File "C:\Users\NomanAnjum\AppData\Local\Programs\Python\Python310\lib\ctypes\__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module '\src\exploration\action_recognition\torchtrt_runtime.dll' (or one of its dependencies). Try using the full path with constructor syntax.

I then used a dependency checker on torchtrt_runtime.dll and added all the dependent DLLs to the same folder, and the error has changed to:

File "\src\exploration\action_recognition\video_play.py", line 198, in get_torch_tensorrt_converted_model
    hllDll = ctypes.WinDLL(r"\src\exploration\action_recognition\torchtrt_runtime.dll")
  File "C:\Users\NomanAnjum\AppData\Local\Programs\Python\Python310\lib\ctypes\__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: [WinError 1114] A dynamic link library (DLL) initialization routine failed

I have a strong feeling that if I can load torchtrt_runtime.dll, it will run.

Please help me with this.

andi4191 commented 2 years ago

It seems to be complaining about FileNotFoundError.

IIRC, on Windows the paths are written as:

\\src\\exploration\\action_recognition\\torchtrt_runtime.dll

Can you try the above and share your observations?

noman-anjum-retro commented 2 years ago

No, it didn't work. Can you please try to load a TRT module on your side? Maybe you'll catch something that I am missing. It's weird that it loads in C++ but not in Python.

noman-anjum-retro commented 2 years ago

Update:

I tried loading torchtrt_runtime.dll without importing torch in Python, and it loaded gracefully. Then I tried importing torch, but it failed with the same error: [WinError 1114] A dynamic link library (DLL) initialization routine failed. It seems that Python is unable to load both torchtrt_runtime.dll and torch at the same time; the first operation completes and the second one fails.

When torchtrt_runtime is loaded prior to torch:

Traceback (most recent call last):
  File "loadDLL.py", line 5, in <module>
    import torch
  File "C:\Users\NomanAnjum\anaconda3\envs\py37\lib\site-packages\torch\__init__.py", line 129, in <module>
    raise err
OSError: [WinError 1114] A dynamic link library (DLL) initialization routine failed. Error loading "C:\Users\NomanAnjum\anaconda3\envs\py37\lib\site-packages\torch\lib\torch_cpu.dll" or one of its dependencies.

When torch is imported prior to loading torchtrt_runtime.dll

Traceback (most recent call last):
  File "loadDLL.py", line 7, in <module>
    hllDll = ctypes.CDLL(r"torchtrt_runtime.dll")
  File "C:\Users\NomanAnjum\anaconda3\envs\py37\lib\ctypes\__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: [WinError 1114] A dynamic link library (DLL) initialization routine failed

andi4191 commented 2 years ago

Hi @noman-anjum-retro,

I think the problem is with the mode you are using while loading the symbol tables.

I tried the following and it works:

import ctypes
import torch

handle = ctypes.WinDLL("<Path to torchtrt artifacts>\\torchtrt_runtime.dll", winmode=1)
print(handle)
...

For quick reference:
https://docs.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibraryexa
https://github.com/python/cpython/blob/3.9/Lib/ctypes/__init__.py#L358

Additionally, can you also check whether the torch installed on your machine is compatible with torch-tensorrt?
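
Putting the pieces above together, a minimal end-to-end sketch of the runtime-only workflow would be (all paths and filenames are placeholders; winmode is passed through to LoadLibraryEx, per the links above):

import ctypes
import torch

# Load the Torch-TensorRT runtime so its custom TorchScript classes
# (e.g. torch.classes.tensorrt.Engine) get registered with libtorch.
# The path and the winmode value here are assumptions, not verified settings.
handle = ctypes.WinDLL(r"C:\path\to\torchtrt_runtime.dll", winmode=1)
print(handle)

# A module compiled ahead of time with torchtrtc / the C++ API should
# then deserialize; the filename below is a placeholder.
model = torch.jit.load(r"C:\path\to\compiled_module.ts")
model.eval()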

noman-anjum-retro commented 2 years ago

Thanks for the help. I tried it and the .dll got loaded along with torch; however, torch.jit.load() threw the same error as at the start:

RuntimeError: Unknown type name 'torch.torch.classes.tensorrt.Engine':
  File "code/torch/movinets/models.py", line 4
    parameters = []
    buffers = []
    _torch___movinets_models_MoViNet_trt_engine : torch.torch.classes.tensorrt.Engine
                                                  <--- HERE
  def forward(self_1: torch.movinets.models.MoViNet_trt, input_0: Tensor) -> Tensor:

Then I imported tensorrt in Python, and this error vanished, but now Python quits during loading without any error or exception:

try:
    torch.jit.load("NewTRTModel843_80.ts")
    print("Success")
except Exception as e:
    print(e)

The print statement is not executed, and no exception or error is raised. It happens even without loading the .dll.
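
A silent exit like this usually points to a crash in native code rather than a Python-level exception. One way to get more information is the standard faulthandler module; a small sketch using the same filename as above:

import faulthandler
import torch

# Dump a low-level traceback if the interpreter crashes inside a
# C/C++ extension, which bypasses normal Python exception handling.
faulthandler.enable()

model = torch.jit.load("NewTRTModel843_80.ts")
print("Success")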

noman-anjum-retro commented 2 years ago

@narendasan @andi4191 any update on this?

LukeRoss00 commented 2 years ago

Hi all, I'm also trying to make Torch-TensorRT work with Python on Windows. I was able to compile Torch-TensorRT from source using CMake and to reproduce the following, as posted by @andi4191 (I had to copy over the .dll dependencies, as suggested by @noman-anjum-retro):

import ctypes
import torch

handle = ctypes.WinDLL("<Path to torchtrt artifacts>\\torchtrt_runtime.dll", winmode=1)
print(handle)

However, I'm not sure what I'm supposed to do with the handle returned by WinDLL. My next objective is to call any function from the torch_tensorrt module, like torch_tensorrt.compile, but import torch_tensorrt reports ModuleNotFoundError: No module named 'torch_tensorrt' (I guess that's because the module was compiled from source and not installed via pip), and handle.compile reports AttributeError: function 'compile' not found.

Is there any obvious step that I'm missing?

EDIT: I'm not an expert on this, but I guess there should be a custom version of setup.py to wrap the .dll into a package that Python can recognize when I run python setup.py install. The existing setup.py in the source tree is obviously written to work only with Linux; is there a special version of it for the Windows build that I'm not finding? Or am I supposed to write my own?

narendasan commented 2 years ago

> @narendasan @andi4191 any update on this?

torch.torch.classes.tensorrt.Engine stands out as kinda weird; I'd expect this to just be torch.classes.tensorrt.Engine. Is it possible to upload the compiled ts module so we can take a look?

narendasan commented 2 years ago

> Hi all, I'm also trying to make Torch-TensorRT work with Python on Windows. I was able to compile Torch-TensorRT from source using CMake and to reproduce the following, as posted by @andi4191 (I had to copy over the .dll dependencies, as suggested by @noman-anjum-retro):
>
> import ctypes
> import torch
>
> handle = ctypes.WinDLL("<Path to torchtrt artifacts>\\torchtrt_runtime.dll", winmode=1)
> print(handle)
>
> However, I'm not sure what I'm supposed to do with the handle returned by WinDLL. My next objective is to call any function from the torch_tensorrt module, like torch_tensorrt.compile, but import torch_tensorrt reports ModuleNotFoundError: No module named 'torch_tensorrt' (I guess that's because the module was compiled from source and not installed via pip), and handle.compile reports AttributeError: function 'compile' not found.
>
> Is there any obvious step that I'm missing?
>
> EDIT: I'm not an expert on this, but I guess there should be a custom version of setup.py to wrap the .dll into a package that Python can recognize when I run python setup.py install. The existing setup.py in the source tree is obviously written to work only with Linux; is there a special version of it for the Windows build that I'm not finding? Or am I supposed to write my own?

So the steps laid out above are for runtime execution of compiled modules in Python, not for compiling modules in Python. They assume you compiled the module using the C++ API or torchtrtc, since we haven't had time to get the bindings working for Python on Windows.

I think the best way to do this is not post-compilation monkey patching but overhauling setup.py to use CMake in addition to Bazel.

You can see the build process for setup.py + bazel here: https://github.com/pytorch/TensorRT/blob/b1db33a06fe6e49004405431678946e9e8248ba8/py/setup.py#L118

Basically the steps are: build the core library, copy it into the Python package tree, and use that to build the Python bindings.

I would assume that the steps for CMake (and therefore Windows) would be similar.
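
As a very rough, hypothetical sketch of how those three steps could be wired up for a CMake/Windows build (none of these paths, targets, or names come from the project's actual setup.py; they are placeholders only):

import shutil
import subprocess
from pathlib import Path

from setuptools import setup

def build_core_with_cmake() -> None:
    # Step 1: build the core runtime library with CMake.
    build_dir = Path("build/cmake")
    build_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(["cmake", "-S", ".", "-B", str(build_dir)], check=True)
    subprocess.run(["cmake", "--build", str(build_dir), "--config", "Release"], check=True)

    # Step 2: copy the built runtime into the Python package tree so it
    # ships with the wheel (mirrors what the Bazel path does on Linux).
    pkg_lib_dir = Path("py/torch_tensorrt/lib")
    pkg_lib_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy(build_dir / "bin" / "Release" / "torchtrt_runtime.dll", pkg_lib_dir)

build_core_with_cmake()

# Step 3: build the Python package/bindings, bundling the runtime DLL.
setup(
    name="torch_tensorrt",
    package_dir={"": "py"},
    packages=["torch_tensorrt"],
    package_data={"torch_tensorrt": ["lib/*.dll"]},
)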

LukeRoss00 commented 2 years ago

Thank you for the hint about using torchtrtc to compile the model offline, which got me a step further. Now I seem to be stuck at the same point as @noman-anjum-retro:

trt_model = torch.jit.load("efficientnet_b0_traced_trt.ts")

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "...\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\jit\_serialization.py", line 162, in load
    cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files)
RuntimeError:
Unknown type name '__torch__.torch.classes.tensorrt.Engine':
  File "code/__torch__/timm/models/efficientnet.py", line 4
  __parameters__ = []
  __buffers__ = []
  __torch___timm_models_efficientnet_EfficientNet_trt_engine_ : __torch__.torch.classes.tensorrt.Engine
                                                                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
  def forward(self_1: __torch__.timm.models.efficientnet.EfficientNet_trt,
    input_0: Tensor) -> Tensor:

The compiled module is attached: I'm trying to repro this example from NVIDIA. In my case, loading the .dll doesn't seem to have any effect: the error message is exactly the same whether I don't use WinDLL or whether I do the following

>>> import ctypes
>>> handle = ctypes.WinDLL("C:\\...\\NVIDIA\\TensorRT\\out\\install\\x64-Release\\bin\\torchtrt_runtime.dll", winmode=1)
>>> print (handle)
<WinDLL 'C:\...\NVIDIA\TensorRT\out\install\x64-Release\bin\torchtrt_runtime.dll', handle 7ffb2fae0000 at 0x1936fca0af0>

before trying to load the compiled model in Python.

noman-anjum-retro commented 2 years ago

@narendasan please find the compiled module here. It was compiled with TensorRT 8.4.3.

LukeRoss00 commented 2 years ago

Opening the compiled modules (they're just zip archives), I can see that the "___torch_mangle_[0-9]+" information is being swallowed by the torchtrtc compiler when running under Windows. Could this be what makes the compiled file unparseable by the loader?

noman-anjum-retro commented 2 years ago

Any success with it, @LukeRoss00?

LukeRoss00 commented 2 years ago

No, honestly I've given up trying to run this on Windows.

github-actions[bot] commented 1 year ago

This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.