kentaroy47 / benchmark-FP32-FP16-INT8-with-TensorRT

Benchmark inference speed of CNNs with various quantization methods in PyTorch + TensorRT on Jetson Nano/Xavier
MIT License

Error in torch2trt of inference segmentation.ipynb #1

Open · flow-dev opened this issue 4 years ago

flow-dev commented 4 years ago

Thanks for sharing great code!

However, I am running into an error when converting the deeplabv3 models with torch2trt ("inference segmentation.ipynb").

A backbone alone, such as resnet18, runs without problems (python3 inference_tensorrt.py).

Which torch2trt installation method and which JetPack version are you using?

My environment is JetPack 4.2 on a Jetson Nano.

I installed with "Option 2 - With plugins (experimental)" referring to this site.(https://github.com/NVIDIA-AI-IOT/torch2trt)

https://github.com/NVIDIA-AI-IOT/torch2trt
Option 2 - With plugins (experimental)
To install with plugins to support some operations in PyTorch that are not natviely supported with TensorRT, call the following

sudo apt-get install libprotobuf* protobuf-compiler ninja-build
git clone https://github.com/NVIDIA-AI-IOT/torch2trt
cd torch2trt
sudo python setup.py install --plugins
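
For reference, the failing step is roughly the following (my sketch, reconstructed from the traceback below; `ModelWrapper` stands in for the notebook's `model_w`, and the model choice is illustrative):

import torch
from torch2trt import torch2trt
from torchvision.models.segmentation import deeplabv3_resnet101

# The torchvision segmentation models return a dict, so the notebook wraps
# the model to extract the 'out' tensor before conversion (see the traceback).
class ModelWrapper(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x):
        return self.model(x)['out']

# Pretrained weights are irrelevant for a speed benchmark.
model = deeplabv3_resnet101(pretrained=False).eval().cuda()
x = torch.ones((1, 3, 224, 224)).cuda()
model_trt = torch2trt(ModelWrapper(model), [x])  # fails with the IndexError below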

Error log of "inference_segmentation.ipynb":

$ jupyter nbconvert inference_segmentation.ipynb --to python
$ python3 inference_segmentation.py 
model: fcn_resnet50
Avg execution time (ms): 0.039
Traceback (most recent call last):
  File "inference_segmentation.py", line 109, in <module>
    model_trt = torch2trt(model_w, [x])
  File "/usr/local/lib/python3.6/dist-packages/torch2trt/torch2trt.py", line 377, in torch2trt
    outputs = module(*inputs)
  File "/home/hogehoge/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "inference_segmentation.py", line 47, in forward
    return self.model(x)['out']
  File "/home/hogehoge/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hogehoge/.local/lib/python3.6/site-packages/torchvision/models/segmentation/_utils.py", line 25, in forward
    x = F.interpolate(x, size=input_shape, mode='bilinear', align_corners=False)
  File "/usr/local/lib/python3.6/dist-packages/torch2trt/torch2trt.py", line 202, in wrapper
    converter['converter'](ctx)
  File "/usr/local/lib/python3.6/dist-packages/torch2trt/converters/interpolate/interpolate.py", line 35, in convert_interpolate
    plugin = get_interpolate_plugin(size=size, mode=mode, align_corners=align_corners)
  File "/usr/local/lib/python3.6/dist-packages/torch2trt/converters/interpolate/interpolate.py", line 11, in get_interpolate_plugin
    creator = [c for c in registry.plugin_creator_list if c.name == PLUGIN_NAME and c.plugin_namespace == 'torch2trt'][0]
IndexError: list index out of range
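
If it helps debugging: the IndexError seems to mean that no plugin creator in the 'torch2trt' namespace is registered with TensorRT, i.e. the plugins from the --plugins build never loaded. A quick way to check (a sketch, assuming a --plugins build, and mirroring the lookup in interpolate.py above):

import tensorrt as trt
import torch2trt  # importing this should load libtorch2trt.so and register the plugins

registry = trt.get_plugin_registry()
creators = [c.name for c in registry.plugin_creator_list
            if c.plugin_namespace == 'torch2trt']
print(creators)  # an empty list reproduces the IndexError above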

"inference_tensorrt.py" is no problem.

$ python3 inference_tensorrt.py 

Avg execution time (ms): 0.002
model: resnet18
Avg execution time (ms): 0.001
running fp16 models..
Avg execution time (ms): 0.001
running int8 models..
Avg execution time (ms): 0.001
Avg execution time (ms): 0.004
model: resnet34
Avg execution time (ms): 0.002
running fp16 models..
Avg execution time (ms): 0.002
running int8 models..
Avg execution time (ms): 0.001
Avg execution time (ms): 0.005

I hope I can get some good advice.

kentaroy47 commented 4 years ago

Thanks for trying it out! Your error comes from the interpolation op not being included in torch2trt. Did you build torch2trt with the latest commit? I see that FCN does not run, but does deeplab run?

flow-dev commented 4 years ago

Thank you for your reply.

I installed the latest commit, and deeplab could not run either.

> Your error comes from the interpolation op not being included in torch2trt. Did you build torch2trt with the latest commit? I see that FCN does not run, but does deeplab run?

The interpolate support seems to have environment-dependent problems, and there is no clear solution. I am trying various things based on the issues below, but it is a difficult problem (a rough workaround sketch follows the links).

https://github.com/NVIDIA-AI-IOT/torch2trt/issues/274 https://github.com/NVIDIA-AI-IOT/torch2trt/issues/119
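
One workaround I am considering (my own sketch, untested; it assumes the final bilinear upsample can simply be left out of the TensorRT graph and done in plain PyTorch on the output):

import torch
import torch.nn.functional as F
from torch2trt import torch2trt

# Keep the final F.interpolate out of the converted graph so the
# interpolate plugin is never needed; upsample the TRT output in PyTorch.
class NoUpsample(torch.nn.Module):
    def __init__(self, seg_model):
        super().__init__()
        self.backbone = seg_model.backbone
        self.classifier = seg_model.classifier

    def forward(self, x):
        features = self.backbone(x)['out']
        return self.classifier(features)

# model_trt = torch2trt(NoUpsample(model).eval().cuda(), [x])
# out = F.interpolate(model_trt(x), size=x.shape[-2:],
#                     mode='bilinear', align_corners=False)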

There is no problem in your code, but I would like to know how you installed torch2trt on your Jetson Nano and Xavier.

luhang-HPU commented 4 years ago

I'm facing the same problem, with the latest TRT 7 and the torch2trt plugin installation. Any idea how to solve this? Thanks for this wonderful project!

kentaroy47 commented 4 years ago

So far I have only tried segmentation on Xavier: JetPack 4.3, TRT 7, and the latest torch2trt with plugins.

I just tried running the segmentations on a Jetson Nano as well, but got stuck running the native PyTorch segmentation model (JetPack 4.1, TRT 5). Will report if I get this working.

The steps I followed to set up Xavier are as below:

1) Install torchvision. I followed this guide and installed torchvision==0.3.0: https://medium.com/hackers-terminal/installing-pytorch-torchvision-on-nvidias-jetson-tx2-81591d03ce32

sudo apt-get install libjpeg-dev zlib1g-dev
git clone -b v0.3.0 https://github.com/pytorch/vision torchvision
cd torchvision
sudo python3 setup.py install

2) Install torch2trt. I followed the readme: https://github.com/NVIDIA-AI-IOT/torch2trt

sudo apt-get install libprotobuf* protobuf-compiler ninja-build
git clone https://github.com/NVIDIA-AI-IOT/torch2trt
cd torch2trt
sudo python3 setup.py install --plugins

kentaroy47 commented 4 years ago

Actually, by following this setup, I was able to run the torch2trt conversion on a Jetson Nano as well. Can you try building torchvision as above? I think that was the issue.

flow-dev commented 4 years ago

Thank you for your reply.

The fact that it works on both Jetson Nano and Xavier is very valuable information!

I'm now trying it on a 2080 Ti / Ubuntu 18.04 environment, but what I really need is torch2trt working on Jetson. I will check both in parallel.

This problem seems to come from how libtorch2trt.so is generated. When I ran the command below, libtorch2trt.so had some undefined symbols.

ldd -r /usr/local/lib/python3.6/dist-packages/torch2trt/libtorch2trt.so

linux-vdso.so.1 (0x00007ffcbe5c6000)
    libc10.so => not found
    libc10_cuda.so => not found
    libtorch.so => not found
    libcudart.so.10.0 => /usr/local/cuda-10.0/targets/x86_64-linux/lib/libcudart.so.10.0 (0x00007f0e800d2000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f0e7feb3000)
    libnvinfer.so.7 => /usr/lib/x86_64-linux-gnu/libnvinfer.so.7 (0x00007f0e72238000)
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f0e71eaf000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f0e71c97000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0e718a6000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f0e806e1000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f0e716a2000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f0e7149a000)
    libcudnn.so.7 => /usr/lib/x86_64-linux-gnu/libcudnn.so.7 (0x00007f0e59bcc000)
    libcublas.so.10.0 => /usr/local/cuda-10.0/targets/x86_64-linux/lib/libcublas.so.10.0 (0x00007f0e54789000)
    libmyelin.so.1 => /usr/lib/x86_64-linux-gnu/libmyelin.so.1 (0x00007f0e53f78000)
    libnvrtc.so.10.0 => /usr/local/cuda-10.0/targets/x86_64-linux/lib/libnvrtc.so.10.0 (0x00007f0e5295c000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f0e525be000)
undefined symbol: _ZN3c1019UndefinedTensorImpl10_singletonE (/usr/local/lib/python3.6/dist-packages/torch2trt/libtorch2trt.so)
....

The problem is that the torch-related libraries (libc10.so, libc10_cuda.so, libtorch.so) cannot be resolved.

This problem seems to depend on the torch version and the g++ version.

The following issue is likely to be helpful, but I haven't solved it yet:

https://github.com/NVIDIA-AI-IOT/torch2trt/issues/53
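
Note that `ldd -r` alone can be misleading here: at runtime torch2trt imports torch first, which loads libc10.so / libtorch.so into the process before libtorch2trt.so is opened. A quick runtime check (my sketch, using the path from the ldd output above):

import ctypes
import torch  # loads libc10.so, libc10_cuda.so and libtorch.so into the process

# If this succeeds, the symbols reported by `ldd -r` resolve at runtime;
# if it raises OSError, the .so was built against an incompatible torch ABI.
ctypes.CDLL("/usr/local/lib/python3.6/dist-packages/torch2trt/libtorch2trt.so")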

flow-dev commented 4 years ago

I read the article about installing torch that you mentioned. You may have installed torch 1.1.0; I'm using torch 1.4.0, so the difference seems to be important.

I plan to use torch 1.1.0 until torch2trt supports torch 1.4.0.

https://medium.com/hackers-terminal/installing-pytorch-torchvision-on-nvidias-jetson-tx2-81591d03ce32
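
A trivial check of which versions are actually picked up (just for the record):

import torch
import torchvision

# torch2trt reportedly works with torch 1.1.0 / torchvision 0.3.0;
# my environment prints 1.4.0, which seems to be the difference.
print(torch.__version__, torchvision.__version__)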

kentaroy47 commented 4 years ago

Yes, I use torch 1.1.0 and torchvision 0.3.0 for the Jetson Nano. I think torch 1.4.0 is not fully supported by torch2trt yet.

For Xavier, I used torch 1.3.0 with Nvidia-built binaries: https://forums.developer.nvidia.com/t/pytorch-for-jetson-nano-version-1-4-0-now-available/72048

wget https://nvidia.box.com/shared/static/phqe92v26cbhqjohwtvxorrwnmrnfx1o.whl -O torch-1.3.0-cp36-cp36m-linux_aarch64.whl
pip3 install numpy torch-1.3.0-cp36-cp36m-linux_aarch64.whl

flow-dev commented 4 years ago

So Xavier worked with torch 1.3.0; that is great information. I will recreate my environment.

Thanks for all the useful information, and thanks for your contribution!

kentaroy47 commented 4 years ago

@flow-dev Please tell us whether your Jetson Nano/Xavier works with the pinned torch version!

It may also be informative to open an issue asking which torch versions work with torch2trt; that would help others.

flow-dev commented 4 years ago

@kentaroy47 That's a good suggestion. I will open that issue once I can confirm it.

luhang-HPU commented 4 years ago

I've read all your feedback. A little catch-up: I am also using PyTorch 1.4.0, with a Titan V on AMD64, not on an ARM platform. I think that may be the cause of this problem.

kentaroy47 commented 4 years ago

Thanks for the comments. @flow-dev @hive-cas, did the model run for you after changing the torch version?

flow-dev commented 4 years ago

> Thanks for the comments. @flow-dev @hive-cas, did the model run for you after changing the torch version?

Not working in the following environments in my case. There are likely other dependencies on AMD64.

Ubuntu 18.04, JetPack 4.3, 2080 Ti:

pytorch 1.4.0 -> cannot build
pytorch 1.3.0 -> cannot build
pytorch 1.1.0 -> cannot build

I am working on the Jetson Nano build now. It will take a little longer...

kentaroy47 commented 4 years ago

@flow-dev Thanks for the updates. You can simply pip install the Nvidia-built torch, right? (There are builds for pytorch 1.0-1.4 in the link.) https://forums.developer.nvidia.com/t/pytorch-for-jetson-nano-version-1-4-0-now-available/72048

flow-dev commented 4 years ago

> @flow-dev Thanks for the updates. You can simply pip install the Nvidia-built torch, right? (There are builds for pytorch 1.0-1.4 in the link.)

Yes, I could simply pip install it.

luhang-HPU commented 4 years ago

@flow-dev @kentaroy47 I also changed the PyTorch version from 1.4 to 1.2 and then 1.1, and none of them worked. I use a Titan V on Ubuntu 18.04 LTS with CUDA 10.2, TensorRT 7, and the latest commit of torch2trt. My final goal is not to run it on the Nano, but on my amd64 server.

kentaroy47 commented 4 years ago

@hive-cas Hmm, have you discussed this with the maintainers of the torch2trt repo? Reporting the error message there would help greatly.

flow-dev commented 4 years ago

> @kentaroy47 That's a good suggestion. I will open that issue once I can confirm it.

@kentaroy47 I recreated the same environment as yours and was able to run it on a Jetson Nano. Thank you very much! (But you have to install everything in exactly the same way; for details, please wait for the official response.)

kentaroy47 commented 4 years ago

It is really strange that linking fails on amd64 Ubuntu 18.04 servers, since they should behave the same as the Xavier hardware. Can you post the link to the torch2trt issue so that others who hit the same error can help themselves? Thanks!