pytorch / TensorRT

PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
https://pytorch.org/TensorRT
BSD 3-Clause "New" or "Revised" License

❓ [Question] Is it possible to use a model optimized through Torch-TensorRT in LibTorch under Windows? #856

Closed andreabonvini closed 2 years ago

andreabonvini commented 2 years ago

❓ Question

I need to optimize an already trained segmentation model with Torch-TensorRT. The idea is to optimize the model by running the newest PyTorch NGC Docker image under WSL2, export the model, and then load it in a C++ application that uses LibTorch, e.g.

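#include <torch/script.h>
#include <iostream>
// ...
torch::jit::script::Module module;
try {
  // Deserialize the ScriptModule from a file using torch::jit::load().
  module = torch::jit::load(argv[1]);
} catch (const c10::Error& e) {
  // torch::jit::load() throws c10::Error if the file cannot be read or parsed.
  std::cerr << "Error loading the model: " << e.what() << '\n';
  return -1;
}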

Would this be the right approach?

What you have already tried

So far I have only tried to optimize the model with Torch-TensorRT, and something odd happens. Below are the results of the Python script, obtained on two different machines (an Ubuntu desktop and a Windows PC running WSL):

As you can see, the optimization process under WSL gives me a lot of GPU errors, while on Ubuntu it seems to work fine. Why does this happen?
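One way to narrow down a difference like this is to diff the software stack seen inside each container. The following is a minimal sketch, not part of the original script: `collect_env` is a hypothetical helper, and the WSL check is only a heuristic that assumes the WSL kernel reports "microsoft" in its release string.

```python
import importlib
import platform


def collect_env():
    """Gather version info worth comparing between the two machines."""
    report = {
        "platform": platform.platform(),
        # Heuristic (assumption): WSL kernels usually contain "microsoft"
        # in the kernel release string.
        "looks_like_wsl": "microsoft" in platform.release().lower(),
    }
    # Record the version of each package, or why it could not be imported.
    for name in ("torch", "torch_tensorrt", "tensorrt"):
        try:
            module = importlib.import_module(name)
            report[name] = getattr(module, "__version__", "unknown")
        except Exception as exc:  # missing package or missing shared library
            report[name] = f"unavailable ({type(exc).__name__})"
    # CUDA details are only meaningful when torch imports correctly.
    try:
        import torch

        report["cuda_runtime"] = torch.version.cuda
        report["cudnn"] = torch.backends.cudnn.version()
        if torch.cuda.is_available():
            report["gpu"] = torch.cuda.get_device_name(0)
            report["compute_capability"] = torch.cuda.get_device_capability(0)
    except Exception:
        pass
    return report


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Running this in both containers and comparing the output line by line shows whether the two setups differ in GPU model, driver-visible CUDA version, or library versions.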

My script:

import torch_tensorrt
import yaml
import torch
import os
import time
import numpy as np
import torch.backends.cudnn as cudnn
import argparse
import segmentation_models_pytorch as smp
import pytorch_lightning as pl
cudnn.benchmark = True

def benchmark(model, input_shape=(1, 3, 512, 512), dtype=torch.float, nwarmup=50, nruns=1000):
    input_data = torch.randn(input_shape)
    input_data = input_data.to("cuda")
    if dtype==torch.half:
        input_data = input_data.half()

    print("Warm up ...")
    with torch.no_grad():
        for _ in range(nwarmup):
            features = model(input_data)
    torch.cuda.synchronize()
    print("Start timing ...")
    timings = []
    with torch.no_grad():
        for i in range(1, nruns+1):
            start_time = time.time()
            features = model(input_data)
            torch.cuda.synchronize()
            end_time = time.time()
            timings.append(end_time - start_time)
            if i%100==0:
                print('Iteration %d/%d, ave batch time %.2f ms'%(i, nruns, np.mean(timings)*1000))

    print("Input shape:", input_data.size())
    print("Output features size:", features.size())

    print('Average batch time: %.2f ms'%(np.mean(timings)*1000))

def load_config(config_path: str):
    with open(config_path) as f:
        config = yaml.load(f, Loader=yaml.FullLoader)
    return config

def main():
    # Load target model
    parser = argparse.ArgumentParser()
    parser.add_argument("weights_path")
    parser.add_argument("config_path")
    args = parser.parse_args()
    config = load_config(args.config_path)
    model_dict = config["model"]
    model_dict["activation"] = "softmax2d"
    model = smp.create_model(**model_dict)
    state_dict = torch.load(args.weights_path)["state_dict"]
    model.load_state_dict(state_dict)
    model.to("cuda")
    model.eval()
    # Create dummy data for tracing and benchmarking purposes.
    dtype = torch.float32
    shape = (1, 3, 512, 512)
    input_data = torch.randn(shape).to("cuda")

    # Convert model to script module
    print("Tracing PyTorch model...")
    traced_script_module = torch.jit.trace(model, input_data)
    # torch_script_module = torch.jit.load(model_path).cuda()
    print("Script Module generated.")
    print("\nBenchmarking Script Module...")
    # First benchmark <===================================
    benchmark(traced_script_module, shape, dtype)

    # Convert to TRT Module...
    output_path = args.config_path.split(os.path.sep)[-1] + "_trt_.pt"
    print("Creating TRT module...")
    trt_ts_module = torch_tensorrt.compile(
        traced_script_module,
        inputs=[
            # Specify the input shape and dtype.
            # Allowed dtypes: torch.(float|half|int8|int32|bool)
            torch_tensorrt.Input(shape=shape, dtype=dtype)
        ],
        enabled_precisions={dtype},
    )
    print("TRT Module created")
    print("\nBenchmarking TRT Module...")
    benchmark(trt_ts_module, shape, dtype)
    os.makedirs("models", exist_ok=True)  # torch.jit.save does not create missing directories
    torch.jit.save(trt_ts_module, os.path.join("models", output_path))  # save the TRT-embedded TorchScript

if __name__ == "__main__":
    main()
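The `benchmark` function above times each call with `time.time()`. As a hedged variant, the sketch below uses `time.perf_counter()`, which is a monotonic, high-resolution clock intended for measuring short intervals. `time_callable` and its `sync` parameter are hypothetical names introduced for illustration; they are not part of the original script.

```python
import statistics
import time


def time_callable(fn, nwarmup=5, nruns=20, sync=None):
    """Return per-call latencies in milliseconds for fn().

    sync is an optional zero-argument callable invoked after each call,
    e.g. torch.cuda.synchronize when fn launches asynchronous GPU work.
    """
    # Warm-up runs are executed but not timed.
    for _ in range(nwarmup):
        fn()
    if sync is not None:
        sync()

    timings_ms = []
    for _ in range(nruns):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for pending work before stopping the clock
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return timings_ms


if __name__ == "__main__":
    timings = time_callable(lambda: sum(range(10_000)))
    print(f"mean {statistics.mean(timings):.3f} ms, "
          f"median {statistics.median(timings):.3f} ms")
```

With the script above this could be called as `time_callable(lambda: model(input_data), sync=torch.cuda.synchronize)` inside a `torch.no_grad()` block. Reporting the median next to the mean makes occasional slow iterations easier to spot.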

Ubuntu desktop

root@ca10ddc496a3:/DockerStuff# python script.py path/to/checkout.tar path/to/config.yaml
No pretrained weights exist for this model. Using random initialization.
Tracing PyTorch model...
/opt/conda/lib/python3.8/site-packages/segmentation_models_pytorch/base/model.py:16: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if h % output_stride != 0 or w % output_stride != 0:
Script Module generated.

Benchmarking Script Module...
Warm up ...
Start timing ...
Iteration 100/1000, ave batch time 7.00 ms
Iteration 200/1000, ave batch time 6.88 ms
Iteration 300/1000, ave batch time 6.76 ms
Iteration 400/1000, ave batch time 6.91 ms
Iteration 500/1000, ave batch time 6.93 ms
Iteration 600/1000, ave batch time 6.98 ms
Iteration 700/1000, ave batch time 6.99 ms
Iteration 800/1000, ave batch time 6.91 ms
Iteration 900/1000, ave batch time 6.89 ms
Iteration 1000/1000, ave batch time 6.87 ms
Input shape: torch.Size([1, 3, 512, 512])
Output features size: torch.Size([1, 3, 512, 512])
Average batch time: 6.87 ms
Creating TRT module...
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size
WARNING: [Torch-TensorRT] - Interpolation layer will be run through ATen, not TensorRT. Performance may be lower than expected
[1, 256, 128, 128]
[1, 256, 128, 128]
WARNING: [Torch-TensorRT] - Interpolation layer will be run through ATen, not TensorRT. Performance may be lower than expected
[1, 3, 512, 512]
[1, 3, 512, 512]
[1, 256, 128, 128]
[1, 3, 512, 512]
[1, 256, 128, 128]
[1, 3, 512, 512]
[1, 256, 128, 128]
[1, 3, 512, 512]
[1, 256, 128, 128]
[1, 3, 512, 512]
TRT Module created

Benchmarking TRT Module...
Warm up ...
Start timing ...
Iteration 100/1000, ave batch time 3.29 ms
Iteration 200/1000, ave batch time 3.30 ms
Iteration 300/1000, ave batch time 3.30 ms
Iteration 400/1000, ave batch time 3.30 ms
Iteration 500/1000, ave batch time 3.31 ms
Iteration 600/1000, ave batch time 3.30 ms
Iteration 700/1000, ave batch time 3.30 ms
Iteration 800/1000, ave batch time 3.30 ms
Iteration 900/1000, ave batch time 3.30 ms
Iteration 1000/1000, ave batch time 3.30 ms
Input shape: torch.Size([1, 3, 512, 512])
Output features size: torch.Size([1, 3, 512, 512])
Average batch time: 3.30 ms
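For reference, the speedup implied by the two "Average batch time" lines in the Ubuntu log can be computed directly from the log text. This is a small sketch; `average_batch_times` and `speedup` are hypothetical helpers, and `ubuntu_log` contains only the relevant lines copied from the output above.

```python
import re

AVG_PATTERN = re.compile(r"Average batch time: ([0-9.]+) ms")


def average_batch_times(log_text):
    """Return every 'Average batch time' value (in ms) in order of appearance."""
    return [float(value) for value in AVG_PATTERN.findall(log_text)]


def speedup(log_text):
    """Ratio of the first (TorchScript) to the second (TRT) average time."""
    times = average_batch_times(log_text)
    if len(times) != 2:
        raise ValueError(f"expected 2 average times, found {len(times)}")
    baseline, optimized = times
    return baseline / optimized


# Relevant lines copied from the Ubuntu run above.
ubuntu_log = """
Average batch time: 6.87 ms
Creating TRT module...
Average batch time: 3.30 ms
"""

if __name__ == "__main__":
    print(f"Speedup: {speedup(ubuntu_log):.2f}x")  # prints "Speedup: 2.08x"
```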

Windows PC

root@3130ab7d9ff8:/DockerStuff# python script.py path/to/checkout.tar path/to/config.yaml
No pretrained weights exist for this model. Using random initialization.
Tracing PyTorch model...
/opt/conda/lib/python3.8/site-packages/segmentation_models_pytorch/base/model.py:16: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if h % output_stride != 0 or w % output_stride != 0:
Script Module generated.

Benchmarking Script Module...
Warm up ...
Start timing ...
Iteration 100/1000, ave batch time 3.21 ms
Iteration 200/1000, ave batch time 3.18 ms
Iteration 300/1000, ave batch time 3.17 ms
Iteration 400/1000, ave batch time 3.17 ms
Iteration 500/1000, ave batch time 3.16 ms
Iteration 600/1000, ave batch time 3.16 ms
Iteration 700/1000, ave batch time 3.16 ms
Iteration 800/1000, ave batch time 3.16 ms
Iteration 900/1000, ave batch time 3.16 ms
Iteration 1000/1000, ave batch time 3.15 ms
Input shape: torch.Size([1, 3, 512, 512])
Output features size: torch.Size([1, 3, 512, 512])
Average batch time: 3.15 ms
Creating TRT module...
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size
WARNING: [Torch-TensorRT] - Interpolation layer will be run through ATen, not TensorRT. Performance may be lower than expected
[1, 256, 128, 128]
[1, 256, 128, 128]
WARNING: [Torch-TensorRT] - Interpolation layer will be run through ATen, not TensorRT. Performance may be lower than expected
[1, 3, 512, 512]
[1, 3, 512, 512]
[1, 256, 128, 128]
[1, 3, 512, 512]
[1, 256, 128, 128]
[1, 3, 512, 512]
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.17 : Tensor = aten::_convolution(%1217, %self.encoder.model.blocks.1.0.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.1/__module.encoder.model.blocks.1.0/__module.encoder.model.blocks.1.0.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.19 : Tensor = aten::batch_norm(%input.17, %self.encoder.model.blocks.1.0.bn1.weight, %self.encoder.model.blocks.1.0.bn1.bias, %self.encoder.model.blocks.1.0.bn1.running_mean, %self.encoder.model.blocks.1.0.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.1/__module.encoder.model.blocks.1.0/__module.encoder.model.blocks.1.0.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1220 : Tensor = aten::relu(%input.19), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.1/__module.encoder.model.blocks.1.0/__module.encoder.model.blocks.1.0.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.29 : Tensor = aten::_convolution(%1223, %self.encoder.model.blocks.1.0.conv_pwl.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.1/__module.encoder.model.blocks.1.0/__module.encoder.model.blocks.1.0.conv_pwl # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.31 : Tensor = aten::batch_norm(%input.29, %self.encoder.model.blocks.1.0.bn3.weight, %self.encoder.model.blocks.1.0.bn3.bias, %self.encoder.model.blocks.1.0.bn3.running_mean, %self.encoder.model.blocks.1.0.bn3.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.1/__module.encoder.model.blocks.1.0/__module.encoder.model.blocks.1.0.bn3 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.33 : Tensor = aten::_convolution(%input.31, %self.encoder.model.blocks.2.0.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.2/__module.encoder.model.blocks.2.0/__module.encoder.model.blocks.2.0.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.35 : Tensor = aten::batch_norm(%input.33, %self.encoder.model.blocks.2.0.bn1.weight, %self.encoder.model.blocks.2.0.bn1.bias, %self.encoder.model.blocks.2.0.bn1.running_mean, %self.encoder.model.blocks.2.0.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.2/__module.encoder.model.blocks.2.0/__module.encoder.model.blocks.2.0.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1228 : Tensor = aten::relu(%input.35), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.2/__module.encoder.model.blocks.2.0/__module.encoder.model.blocks.2.0.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 || %input.369 : Tensor = aten::_convolution(%input.31, %self.decoder.block1.0.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.decoder/__module.decoder.block1/__module.decoder.block1.0 # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.371 : Tensor = aten::batch_norm(%input.369, %self.decoder.block1.1.weight, %self.decoder.block1.1.bias, %self.decoder.block1.1.running_mean, %self.decoder.block1.1.running_var, %870, %878, %879, %873), scope: __module.decoder/__module.decoder.block1/__module.decoder.block1.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %high_res_features : Tensor = aten::relu(%input.371), scope: __module.decoder/__module.decoder.block1/__module.decoder.block1.2 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1395:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.45 : Tensor = aten::_convolution(%1231, %self.encoder.model.blocks.2.0.conv_pwl.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.2/__module.encoder.model.blocks.2.0/__module.encoder.model.blocks.2.0.conv_pwl # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.47 : Tensor = aten::batch_norm(%input.45, %self.encoder.model.blocks.2.0.bn3.weight, %self.encoder.model.blocks.2.0.bn3.bias, %self.encoder.model.blocks.2.0.bn3.running_mean, %self.encoder.model.blocks.2.0.bn3.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.2/__module.encoder.model.blocks.2.0/__module.encoder.model.blocks.2.0.bn3 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.49 : Tensor = aten::_convolution(%input.47, %self.encoder.model.blocks.2.1.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.2/__module.encoder.model.blocks.2.1/__module.encoder.model.blocks.2.1.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.51 : Tensor = aten::batch_norm(%input.49, %self.encoder.model.blocks.2.1.bn1.weight, %self.encoder.model.blocks.2.1.bn1.bias, %self.encoder.model.blocks.2.1.bn1.running_mean, %self.encoder.model.blocks.2.1.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.2/__module.encoder.model.blocks.2.1/__module.encoder.model.blocks.2.1.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1236 : Tensor = aten::relu(%input.51), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.2/__module.encoder.model.blocks.2.1/__module.encoder.model.blocks.2.1.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.65 : Tensor = aten::_convolution(%1242, %self.encoder.model.blocks.3.0.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.0/__module.encoder.model.blocks.3.0.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.67 : Tensor = aten::batch_norm(%input.65, %self.encoder.model.blocks.3.0.bn1.weight, %self.encoder.model.blocks.3.0.bn1.bias, %self.encoder.model.blocks.3.0.bn1.running_mean, %self.encoder.model.blocks.3.0.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.0/__module.encoder.model.blocks.3.0.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1245 : Tensor = aten::relu(%input.67), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.0/__module.encoder.model.blocks.3.0.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.85 : Tensor = aten::_convolution(%input.83, %self.encoder.model.blocks.3.0.conv_pwl.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.0/__module.encoder.model.blocks.3.0.conv_pwl # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.87 : Tensor = aten::batch_norm(%input.85, %self.encoder.model.blocks.3.0.bn3.weight, %self.encoder.model.blocks.3.0.bn3.bias, %self.encoder.model.blocks.3.0.bn3.running_mean, %self.encoder.model.blocks.3.0.bn3.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.0/__module.encoder.model.blocks.3.0.bn3 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.89 : Tensor = aten::_convolution(%input.87, %self.encoder.model.blocks.3.1.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.1/__module.encoder.model.blocks.3.1.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.91 : Tensor = aten::batch_norm(%input.89, %self.encoder.model.blocks.3.1.bn1.weight, %self.encoder.model.blocks.3.1.bn1.bias, %self.encoder.model.blocks.3.1.bn1.running_mean, %self.encoder.model.blocks.3.1.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.1/__module.encoder.model.blocks.3.1.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1259 : Tensor = aten::relu(%input.91), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.1/__module.encoder.model.blocks.3.1.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.113 : Tensor = aten::_convolution(%1271, %self.encoder.model.blocks.3.2.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.2/__module.encoder.model.blocks.3.2.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.115 : Tensor = aten::batch_norm(%input.113, %self.encoder.model.blocks.3.2.bn1.weight, %self.encoder.model.blocks.3.2.bn1.bias, %self.encoder.model.blocks.3.2.bn1.running_mean, %self.encoder.model.blocks.3.2.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.2/__module.encoder.model.blocks.3.2.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1274 : Tensor = aten::relu(%input.115), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.2/__module.encoder.model.blocks.3.2.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.137 : Tensor = aten::_convolution(%1286, %self.encoder.model.blocks.3.3.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.3/__module.encoder.model.blocks.3.3.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.139 : Tensor = aten::batch_norm(%input.137, %self.encoder.model.blocks.3.3.bn1.weight, %self.encoder.model.blocks.3.3.bn1.bias, %self.encoder.model.blocks.3.3.bn1.running_mean, %self.encoder.model.blocks.3.3.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.3/__module.encoder.model.blocks.3.3.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1289 : Tensor = aten::relu(%input.139), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.3/__module.encoder.model.blocks.3.3.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.161 : Tensor = aten::_convolution(%1301, %self.encoder.model.blocks.4.0.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.4/__module.encoder.model.blocks.4.0/__module.encoder.model.blocks.4.0.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.163 : Tensor = aten::batch_norm(%input.161, %self.encoder.model.blocks.4.0.bn1.weight, %self.encoder.model.blocks.4.0.bn1.bias, %self.encoder.model.blocks.4.0.bn1.running_mean, %self.encoder.model.blocks.4.0.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.4/__module.encoder.model.blocks.4.0/__module.encoder.model.blocks.4.0.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1304 : Tensor = aten::relu(%input.163), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.4/__module.encoder.model.blocks.4.0/__module.encoder.model.blocks.4.0.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.185 : Tensor = aten::_convolution(%1316, %self.encoder.model.blocks.4.1.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.4/__module.encoder.model.blocks.4.1/__module.encoder.model.blocks.4.1.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.187 : Tensor = aten::batch_norm(%input.185, %self.encoder.model.blocks.4.1.bn1.weight, %self.encoder.model.blocks.4.1.bn1.bias, %self.encoder.model.blocks.4.1.bn1.running_mean, %self.encoder.model.blocks.4.1.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.4/__module.encoder.model.blocks.4.1/__module.encoder.model.blocks.4.1.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1319 : Tensor = aten::relu(%input.187), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.4/__module.encoder.model.blocks.4.1/__module.encoder.model.blocks.4.1.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.209 : Tensor = aten::_convolution(%1331, %self.encoder.model.blocks.4.2.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.4/__module.encoder.model.blocks.4.2/__module.encoder.model.blocks.4.2.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.211 : Tensor = aten::batch_norm(%input.209, %self.encoder.model.blocks.4.2.bn1.weight, %self.encoder.model.blocks.4.2.bn1.bias, %self.encoder.model.blocks.4.2.bn1.running_mean, %self.encoder.model.blocks.4.2.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.4/__module.encoder.model.blocks.4.2/__module.encoder.model.blocks.4.2.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1334 : Tensor = aten::relu(%input.211), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.4/__module.encoder.model.blocks.4.2/__module.encoder.model.blocks.4.2.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.233 : Tensor = aten::_convolution(%1346, %self.encoder.model.blocks.5.0.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.5/__module.encoder.model.blocks.5.0/__module.encoder.model.blocks.5.0.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.235 : Tensor = aten::batch_norm(%input.233, %self.encoder.model.blocks.5.0.bn1.weight, %self.encoder.model.blocks.5.0.bn1.bias, %self.encoder.model.blocks.5.0.bn1.running_mean, %self.encoder.model.blocks.5.0.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.5/__module.encoder.model.blocks.5.0/__module.encoder.model.blocks.5.0.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1349 : Tensor = aten::relu(%input.235), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.5/__module.encoder.model.blocks.5.0/__module.encoder.model.blocks.5.0.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.253 : Tensor = aten::_convolution(%input.251, %self.encoder.model.blocks.5.0.conv_pwl.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.5/__module.encoder.model.blocks.5.0/__module.encoder.model.blocks.5.0.conv_pwl # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.255 : Tensor = aten::batch_norm(%input.253, %self.encoder.model.blocks.5.0.bn3.weight, %self.encoder.model.blocks.5.0.bn3.bias, %self.encoder.model.blocks.5.0.bn3.running_mean, %self.encoder.model.blocks.5.0.bn3.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.5/__module.encoder.model.blocks.5.0/__module.encoder.model.blocks.5.0.bn3 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.257 : Tensor = aten::_convolution(%input.255, %self.encoder.model.blocks.5.1.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.5/__module.encoder.model.blocks.5.1/__module.encoder.model.blocks.5.1.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.259 : Tensor = aten::batch_norm(%input.257, %self.encoder.model.blocks.5.1.bn1.weight, %self.encoder.model.blocks.5.1.bn1.bias, %self.encoder.model.blocks.5.1.bn1.running_mean, %self.encoder.model.blocks.5.1.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.5/__module.encoder.model.blocks.5.1/__module.encoder.model.blocks.5.1.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1363 : Tensor = aten::relu(%input.259), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.5/__module.encoder.model.blocks.5.1/__module.encoder.model.blocks.5.1.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.281 : Tensor = aten::_convolution(%1375, %self.encoder.model.blocks.5.2.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.5/__module.encoder.model.blocks.5.2/__module.encoder.model.blocks.5.2.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.283 : Tensor = aten::batch_norm(%input.281, %self.encoder.model.blocks.5.2.bn1.weight, %self.encoder.model.blocks.5.2.bn1.bias, %self.encoder.model.blocks.5.2.bn1.running_mean, %self.encoder.model.blocks.5.2.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.5/__module.encoder.model.blocks.5.2/__module.encoder.model.blocks.5.2.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1378 : Tensor = aten::relu(%input.283), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.5/__module.encoder.model.blocks.5.2/__module.encoder.model.blocks.5.2.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.305 : Tensor = aten::_convolution(%1390, %self.encoder.model.blocks.6.0.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.6/__module.encoder.model.blocks.6.0/__module.encoder.model.blocks.6.0.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.307 : Tensor = aten::batch_norm(%input.305, %self.encoder.model.blocks.6.0.bn1.weight, %self.encoder.model.blocks.6.0.bn1.bias, %self.encoder.model.blocks.6.0.bn1.running_mean, %self.encoder.model.blocks.6.0.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.6/__module.encoder.model.blocks.6.0/__module.encoder.model.blocks.6.0.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1393 : Tensor = aten::relu(%input.307), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.6/__module.encoder.model.blocks.6.0/__module.encoder.model.blocks.6.0.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.317 : Tensor = aten::_convolution(%1396, %self.encoder.model.blocks.6.0.conv_pwl.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.6/__module.encoder.model.blocks.6.0/__module.encoder.model.blocks.6.0.conv_pwl # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.319 : Tensor = aten::batch_norm(%input.317, %self.encoder.model.blocks.6.0.bn3.weight, %self.encoder.model.blocks.6.0.bn3.bias, %self.encoder.model.blocks.6.0.bn3.running_mean, %self.encoder.model.blocks.6.0.bn3.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.6/__module.encoder.model.blocks.6.0/__module.encoder.model.blocks.6.0.bn3 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.321 : Tensor = aten::_convolution(%input.319, %self.decoder.aspp.0.convs.0.0.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.0/__module.decoder.aspp.0.convs.0.0 # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.323 : Tensor = aten::batch_norm(%input.321, %self.decoder.aspp.0.convs.0.1.weight, %self.decoder.aspp.0.convs.0.1.bias, %self.decoder.aspp.0.convs.0.1.running_mean, %self.decoder.aspp.0.convs.0.1.running_var, %870, %878, %879, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.0/__module.decoder.aspp.0.convs.0.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1401 : Tensor = aten::relu(%input.323), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.0/__module.decoder.aspp.0.convs.0.2 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1395:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.327 : Tensor = aten::_convolution(%input.325, %self.decoder.aspp.0.convs.1.0.1.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.1/__module.decoder.aspp.0.convs.1.0/__module.decoder.aspp.0.convs.1.0.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.329 : Tensor = aten::batch_norm(%input.327, %self.decoder.aspp.0.convs.1.1.weight, %self.decoder.aspp.0.convs.1.1.bias, %self.decoder.aspp.0.convs.1.1.running_mean, %self.decoder.aspp.0.convs.1.1.running_var, %870, %878, %879, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.1/__module.decoder.aspp.0.convs.1.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1405 : Tensor = aten::relu(%input.329), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.1/__module.decoder.aspp.0.convs.1.2 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1395:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.333 : Tensor = aten::_convolution(%input.331, %self.decoder.aspp.0.convs.2.0.1.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.2/__module.decoder.aspp.0.convs.2.0/__module.decoder.aspp.0.convs.2.0.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.335 : Tensor = aten::batch_norm(%input.333, %self.decoder.aspp.0.convs.2.1.weight, %self.decoder.aspp.0.convs.2.1.bias, %self.decoder.aspp.0.convs.2.1.running_mean, %self.decoder.aspp.0.convs.2.1.running_var, %870, %878, %879, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.2/__module.decoder.aspp.0.convs.2.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1409 : Tensor = aten::relu(%input.335), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.2/__module.decoder.aspp.0.convs.2.2 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1395:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.339 : Tensor = aten::_convolution(%input.337, %self.decoder.aspp.0.convs.3.0.1.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.3/__module.decoder.aspp.0.convs.3.0/__module.decoder.aspp.0.convs.3.0.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.341 : Tensor = aten::batch_norm(%input.339, %self.decoder.aspp.0.convs.3.1.weight, %self.decoder.aspp.0.convs.3.1.bias, %self.decoder.aspp.0.convs.3.1.running_mean, %self.decoder.aspp.0.convs.3.1.running_var, %870, %878, %879, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.3/__module.decoder.aspp.0.convs.3.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1413 : Tensor = aten::relu(%input.341), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.3/__module.decoder.aspp.0.convs.3.2 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1395:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.353 : Tensor = aten::_convolution(%input.351, %self.decoder.aspp.0.project.0.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.project/__module.decoder.aspp.0.project.0 # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.355 : Tensor = aten::batch_norm(%input.353, %self.decoder.aspp.0.project.1.weight, %self.decoder.aspp.0.project.1.bias, %self.decoder.aspp.0.project.1.running_mean, %self.decoder.aspp.0.project.1.running_var, %870, %878, %879, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.project/__module.decoder.aspp.0.project.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %input.357 : Tensor = aten::relu(%input.355), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.project/__module.decoder.aspp.0.project.2 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1395:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.363 : Tensor = aten::_convolution(%input.361, %self.decoder.aspp.1.1.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.1/__module.decoder.aspp.1.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.365 : Tensor = aten::batch_norm(%input.363, %self.decoder.aspp.2.weight, %self.decoder.aspp.2.bias, %self.decoder.aspp.2.running_mean, %self.decoder.aspp.2.running_var, %870, %878, %879, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.2 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %input.367 : Tensor = aten::relu(%input.365), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.3 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1395:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.377 : Tensor = aten::_convolution(%input.375, %self.decoder.block2.0.1.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.decoder/__module.decoder.block2/__module.decoder.block2.0/__module.decoder.block2.0.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.379 : Tensor = aten::batch_norm(%input.377, %self.decoder.block2.1.weight, %self.decoder.block2.1.bias, %self.decoder.block2.1.running_mean, %self.decoder.block2.1.running_var, %870, %878, %879, %873), scope: __module.decoder/__module.decoder.block2/__module.decoder.block2.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %input.381 : Tensor = aten::relu(%input.379), scope: __module.decoder/__module.decoder.block2/__module.decoder.block2.2 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1395:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.383 : Tensor = aten::_convolution(%input.381, %self.segmentation_head.0.weight, %self.segmentation_head.0.bias, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.segmentation_head/__module.segmentation_head.0 # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 : invalid argument
[1, 256, 128, 128]
[1, 3, 512, 512]
[1, 256, 128, 128]
[1, 3, 512, 512]
TRT Module created

Benchmarking TRT Module...
Warm up ...
Start timing ...
Iteration 100/1000, ave batch time 2.74 ms
Iteration 200/1000, ave batch time 2.75 ms
Iteration 300/1000, ave batch time 2.74 ms
Iteration 400/1000, ave batch time 2.75 ms
Iteration 500/1000, ave batch time 2.74 ms
Iteration 600/1000, ave batch time 2.74 ms
Iteration 700/1000, ave batch time 2.75 ms
Iteration 800/1000, ave batch time 2.75 ms
Iteration 900/1000, ave batch time 2.75 ms
Iteration 1000/1000, ave batch time 2.75 ms
Input shape: torch.Size([1, 3, 512, 512])
Output features size: torch.Size([1, 3, 512, 512])

Environment

newest PyTorch NGC docker image

My Windows PC has an RTX 3080. My Ubuntu desktop has a GTX 1080 Ti.

Additional context

narendasan commented 2 years ago

Did you verify that your GPU is accessible in WSL as well as in a container inside WSL?
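One quick way to check this from Python, both in the WSL shell and inside the container, is sketched below. It relies only on `torch.cuda.is_available()`, `torch.cuda.device_count()` and `torch.cuda.get_device_name()`; the helper name `gpu_report` is made up for illustration, and the function degrades gracefully if PyTorch is not installed.

```python
def gpu_report():
    """Return a small dict describing which CUDA devices PyTorch can see."""
    try:
        import torch  # imported lazily so the helper also runs without PyTorch
    except ImportError:
        return {"torch_installed": False, "cuda_available": False, "devices": []}

    available = torch.cuda.is_available()
    devices = []
    if available:
        # One entry per visible GPU, e.g. the card's marketing name.
        devices = [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]
    return {"torch_installed": True, "cuda_available": available, "devices": devices}


if __name__ == "__main__":
    print(gpu_report())
```

If `cuda_available` is false in either environment, the driver or container setup is the first thing to look at.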

narendasan commented 2 years ago

I tried out one of our example notebooks in WSL2 in the 22.01 container. Seems like things work properly. I would make sure that your GPU is accessible from WSL

narendasan commented 2 years ago

Also, are you planning to run this model in deployment inside WSL or in Windows? IIRC, there isn't necessarily compatibility across operating systems (WSL would fall under Linux). @ncomly-nvidia do you know? I think, however, that running in WSL should be fine as long as it fits your use case.

narendasan commented 2 years ago

I tried out one of our example notebooks in WSL2 in the 22.01 container. Seems like things work properly. I would make sure that your GPU is accessible from WSL

This is on Windows 10: 21H2, with CUDA 11.6 installed on the system and following these instructions https://docs.nvidia.com/cuda/wsl-user-guide/index.html

andreabonvini commented 2 years ago

Hi @narendasan, thanks for your answer. I solved the first problem (now I have the same behaviour in both WSL and Ubuntu, which is great!) by downloading and installing the latest driver from here. But now I have another problem: I really NEED to use the optimized model in a Windows environment (and not WSL) with LibTorch. This is the C++ program I'm using to test whether the model is functioning correctly:

#include <algorithm>  // std::min_element
#include <chrono>
#include <iostream>
#include <numeric>    // std::accumulate
#include <string>
#include <vector>
#include <ATen/Context.h>
#include <torch/torch.h>
#include <torch/script.h>

// =============================== SET PARAMETERS ==================================================
std::string MODEL_PATH = "path/to/trt/model.pt";
int nWarmUp = 50;
int nForwardPass = 1000;

int main() {

    const torch::Device device = torch::Device(torch::kCUDA, 0);
    torch::jit::script::Module model;

    std::cout << "Trying to load the model" << std::endl;
    try {
        model = torch::jit::load(MODEL_PATH, device);
        model.eval();
        std::cout << "AI model loaded successfully." << std::endl;
    }
    catch (const c10::Error& e) {
        std::cerr << e.what() << std::endl;
        return 1;  // nothing to benchmark if the model failed to load
    }

    std::cout << "Warming up model..." << std::endl;
    auto dummy = torch::zeros({ 1, 3, 512, 512 }).to(device);
    torch::Tensor output;
    std::vector<torch::jit::IValue> inputs;
    inputs.clear();
    inputs.emplace_back(dummy);
    std::cout << "Warming up...";

    for (int i = 0; i < nWarmUp; i++) {
        output = model.forward(inputs).toTensor();
        torch::cuda::synchronize();
    }

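    // Floating-point milliseconds, so sub-millisecond timings are not truncated to integers.
    using milli = std::chrono::duration<double, std::milli>;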
    std::vector<double> times;

    for (int i = 0; i < nForwardPass; i++) {
        auto start = std::chrono::high_resolution_clock::now();
        output = model.forward(inputs).toTensor();
        torch::cuda::synchronize();
        auto finish = std::chrono::high_resolution_clock::now();
        auto t = std::chrono::duration_cast<milli>(finish - start).count();
        times.push_back(static_cast<double>(t));
    }

    std::cout << "\nProfiling concluded. Printing report...\n" << std::endl;
    std::cout << "==>  MIN inference time: " << *std::min_element(times.begin(), times.end()) << std::endl;
    std::cout << "==> MEAN inference time: " << std::accumulate(std::begin(times), std::end(times), 0.0) / static_cast<double>(times.size()) << std::endl;

}

If I try to run this C++ script with the optimized model, the program fails on loading.

image

Is there any way to make this work?

Thanks

narendasan commented 2 years ago

You can try turning on debug logging to see if it is torch-trt's runtime failing. Also, it's worth trying with a non-compiled TorchScript module beforehand as well.

andreabonvini commented 2 years ago

Hi @narendasan, what do you mean by "turning on debug logging"? The error, as shown in the stack trace, happens in an external .dll (torch_cpu); the source code should be somewhere around [this](https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/frontend/script_type_parser.cpp#:~:text=if%20(resolver_)%20%7B-,if%20(auto%20typePtr%20%3D,-resolver_%2D%3EresolveType(expr) line of code. Moreover, I already tried to run the code with the same traced script module (not optimized with TorchTensorRT) and it works well.

narendasan commented 2 years ago

You can enable torchtrt debug logging with torch_tensorrt.logging.set_reportable_log_level(torch_tensorrt.logging.Level.Debug) before you run anything related to torch_tensorrt
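In context, that call would sit at the top of the conversion script, before compilation. A minimal sketch, assuming a TorchScript module and the 1x3x512x512 input used in this thread; the wrapper name `compile_with_debug_logging` is hypothetical, and the `torch_tensorrt.compile` arguments shown are one plausible configuration rather than the one from the original script.

```python
def compile_with_debug_logging(scripted_module, input_shape=(1, 3, 512, 512)):
    """Compile a TorchScript module with Torch-TensorRT debug logging enabled."""
    # Imported lazily so this file can be imported on machines without the packages.
    import torch
    import torch_tensorrt

    # Raise verbosity before any other torch_tensorrt call so the conversion is logged.
    torch_tensorrt.logging.set_reportable_log_level(torch_tensorrt.logging.Level.Debug)

    return torch_tensorrt.compile(
        scripted_module,
        inputs=[torch_tensorrt.Input(shape=input_shape)],
        enabled_precisions={torch.float},
    )
```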

narendasan commented 2 years ago

Also how did you build Torch-TensorRT for windows?

andreabonvini commented 2 years ago

Ok thanks, I will include here just the output of the tracing and optimization process through TorchTensorRT. This is the output I get when I run the script without debug logging enabled:

Tracing PyTorch model...
Script Module generated.
Creating TRT module...
[1, 256, 128, 128]
[1, 256, 128, 128]
[1, 3, 512, 512]
[1, 3, 512, 512]
[1, 256, 128, 128]
[1, 3, 512, 512]
[1, 256, 128, 128]
[1, 3, 512, 512]
[1, 256, 128, 128]
[1, 3, 512, 512]
[1, 256, 128, 128]
[1, 3, 512, 512]
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size
WARNING: [Torch-TensorRT] - Interpolation layer will be run through ATen, not TensorRT. Performance may be lower than expected
WARNING: [Torch-TensorRT] - Interpolation layer will be run through ATen, not TensorRT. Performance may be lower than expected

This is the output I have when I run the script with debug logging enabled.

I didn't build TorchTensorRT for Windows; I'm using the latest PyTorch Docker container (22.01) on WSL2, following the official instructions here. However, I need to run the model through LibTorch on Windows.

narendasan commented 2 years ago

To run the model on Windows with libtorch, you need at minimum to compile the libtorchtrt_runtime library, which is the runtime extension for running compiled torchtrt programs. We used to have Windows support for a little while, but it quickly degraded. Perhaps working on just the runtime library is easier to get working (just building //core/runtime).
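For comparison, the same requirement exists on the Python side: a process that has not imported `torch_tensorrt` has to load the runtime library before `torch.jit.load` can deserialize a compiled module. A rough sketch, assuming a Linux build where the library is called `libtorchtrt_runtime.so`; both paths are placeholders and the function name is made up for this example.

```python
def load_trt_module(module_path, runtime_lib_path="libtorchtrt_runtime.so"):
    """Load a Torch-TensorRT compiled TorchScript module without importing torch_tensorrt."""
    import torch  # lazy import keeps the sketch importable without PyTorch

    # Registers the custom classes and ops that the serialized module refers to.
    torch.ops.load_library(runtime_lib_path)

    # With the runtime registered, deserialization can resolve the Torch-TensorRT types.
    return torch.jit.load(module_path)
```

A LibTorch application needs the equivalent step, which is why the runtime library has to exist for the target platform.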

andreabonvini commented 2 years ago

Ok thanks @narendasan, following your advice I'm trying to compile the whole project on Windows 10; the idea is to build the Python package and optimize the model locally. First, I was able to successfully run the command bazel build //:libtorchtrt --compilation_mode opt by modifying a series of files, as I will show below. This is what my WORKSPACE file looks like:

workspace(name = "Torch-TensorRT")

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")

http_archive(
    name = "rules_python",
    sha256 = "778197e26c5fbeb07ac2a2c5ae405b30f6cb7ad1f5510ea6fdac03bded96cc6f",
    url = "https://github.com/bazelbuild/rules_python/releases/download/0.2.0/rules_python-0.2.0.tar.gz",
)

load("@rules_python//python:pip.bzl", "pip_install")

http_archive(
    name = "rules_pkg",
    sha256 = "038f1caa773a7e35b3663865ffb003169c6a71dc995e39bf4815792f385d837d",
    urls = [
        "https://mirror.bazel.build/github.com/bazelbuild/rules_pkg/releases/download/0.4.0/rules_pkg-0.4.0.tar.gz",
        "https://github.com/bazelbuild/rules_pkg/releases/download/0.4.0/rules_pkg-0.4.0.tar.gz",
    ],
)

load("@rules_pkg//:deps.bzl", "rules_pkg_dependencies")

rules_pkg_dependencies()

git_repository(
    name = "googletest",
    commit = "703bd9caab50b139428cea1aaff9974ebee5742e",
    remote = "https://github.com/google/googletest",
    shallow_since = "1570114335 -0400",
)

# External dependency for trtorch if you already have precompiled binaries.
# This is currently used in pytorch NGC container CI testing.
#local_repository(
#    name = "trtorch",
#    path = "C:/Python39/Lib/site-packages/trtorch"
#)

# CUDA should be installed on the system locally
new_local_repository(
    name = "cuda",
    build_file = "@//third_party/cuda:BUILD",
    path = "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.3",
)

new_local_repository(
    name = "cublas",
    build_file = "@//third_party/cublas:BUILD",
    path = "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.3",
)

####################################################################################
# Locally installed dependencies (use in cases of custom dependencies or aarch64)
####################################################################################

# NOTE: In the case you are using just the pre-cxx11-abi path or just the cxx11 abi path
# with your local libtorch, just point deps at the same path to satisfy bazel.

# NOTE: NVIDIA's aarch64 PyTorch (python) wheel file uses the CXX11 ABI unlike PyTorch's standard
# x86_64 python distribution. If using NVIDIA's version just point to the root of the package
# for both versions here and do not use --config=pre-cxx11-abi

new_local_repository(
    name = "libtorch",
    path = "C:/src/libtorch1.10.0-cuda11.3-release/libtorch",
    # path = "C:/Users/myUser/appdata/local/packages/pythonsoftwarefoundation.python.3.9_qbz5n2kfra8p0/localcache/local-packages/python39/site-packages/torch",
    build_file = "third_party/libtorch/BUILD"
)

new_local_repository(
    name = "libtorch_pre_cxx11_abi",
    path = "C:/src/libtorch1.10.0-cuda11.3-release/libtorch",
    # path = "C:/Users/myUser/appdata/local/packages/pythonsoftwarefoundation.python.3.9_qbz5n2kfra8p0/localcache/local-packages/python39/site-packages/torch",
    build_file = "third_party/libtorch/BUILD"
)

new_local_repository(
    name = "cudnn",
    path = "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.3",
    build_file = "@//third_party/cudnn/local:BUILD"
)

new_local_repository(
   name = "tensorrt",
   path = "C:/tensorrt",
   build_file = "@//third_party/tensorrt/local:BUILD"
)
core/partitioning/shape_analysis.cpp(130): error C2665: 'torch_tensorrt::core::util::toDims': none of the 2 overloads could convert all the argument types
.\core/util/trt_util.h(141): note: could be 'nvinfer1::Dims torch_tensorrt::core::util::toDims(c10::List<int64_t>)'
.\core/util/trt_util.h(140): note: or       'nvinfer1::Dims torch_tensorrt::core::util::toDims(c10::IntArrayRef)'
core/partitioning/shape_analysis.cpp(130): note: while trying to match the argument list '(c10::List<long>)'
Target //:libtorchtrt failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 2.588s, Critical Path: 2.03s
INFO: 2 processes: 2 internal.
FAILED: Build did NOT complete successfully

To solve it, it was enough to change line 130 of core/partitioning/shape_analysis.cpp from (:-1:)

input_shapes.push_back(util::toVec(util::toDims(c10::List<long int>({1}))));

to (:+1:)

input_shapes.push_back(util::toVec(util::toDims(c10::List<long long>({1}))));
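The likely reason is that `long` is only 32 bits wide with MSVC on 64-bit Windows, while `int64_t` maps to `long long` there, so `c10::List<long>` matches neither overload. The sizes are easy to inspect from Python with `ctypes`; this is only an illustration of the data-model difference, not part of the build.

```python
import ctypes

# Width in bytes of each C type under the current platform's ABI.
sizes = {
    "long": ctypes.sizeof(ctypes.c_long),
    "long long": ctypes.sizeof(ctypes.c_longlong),
    "int64_t": ctypes.sizeof(ctypes.c_int64),
}

# On 64-bit Linux all three are 8 bytes; on 64-bit Windows `long` is 4 bytes.
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```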

C:\Torch-TensorRT>bazel build //:libtorchtrt --compilation_mode opt
ERROR: C:/torch-tensorrt/cpp/lib/BUILD:34:10: Linking cpp/lib/torch_tensorrt.dll failed: missing input file 'external/cudnn/bin/cudnn64_7.dll', owner: '@cudnn//:bin/cudnn64_7.dll'
ERROR: C:/torch-tensorrt/cpp/lib/BUILD:34:10: Linking cpp/lib/torch_tensorrt.dll failed: 1 input file(s) do not exist
Target //:libtorchtrt failed to build
Use --verbose_failures to see the command lines of failed build steps.
ERROR: C:/torch-tensorrt/cpp/lib/BUILD:34:10 Linking cpp/lib/torch_tensorrt.dll failed: 1 input file(s) do not exist
INFO: Elapsed time: 39.788s, Critical Path: 14.73s
INFO: 81 processes: 2 internal, 79 local.
FAILED: Build did NOT complete successfully

The solution here is to change the file C:\Torch-TensorRT\third_party\cudnn\local\BUILD from (:-1:)

cc_import(
    name = "cudnn_lib",
    shared_library = select({
        ":aarch64_linux": "lib/aarch64-linux-gnu/libcudnn.so",
        ":windows": "bin/cudnn64_7.dll",  #Need to configure specific version for windows
        "//conditions:default": "lib/x86_64-linux-gnu/libcudnn.so",
    }),
    visibility = ["//visibility:private"],
)

to (:+1:)

cc_import(
    name = "cudnn_lib",
    shared_library = select({
        ":aarch64_linux": "lib/aarch64-linux-gnu/libcudnn.so",
        ":windows": "bin/cudnn64_8.dll",  #Need to configure specific version for windows
        "//conditions:default": "lib/x86_64-linux-gnu/libcudnn.so",
    }),
    visibility = ["//visibility:private"],
)
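Which file name to put in the `:windows` entry depends on the cuDNN version that is installed. A small helper like the following can list the candidates; it assumes the DLLs live in a `bin` directory under the path given for `cudnn` in WORKSPACE, and the function name is made up for this sketch.

```python
from pathlib import Path


def find_cudnn_dlls(cuda_root):
    """Return the names of cudnn64_*.dll files found in <cuda_root>/bin, sorted."""
    bin_dir = Path(cuda_root) / "bin"
    if not bin_dir.is_dir():
        return []
    return sorted(p.name for p in bin_dir.glob("cudnn64_*.dll"))


if __name__ == "__main__":
    print(find_cudnn_dlls("C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.3"))
```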
INFO: Analyzed target //:libtorchtrt (1 packages loaded, 42 targets configured).
INFO: Found 1 target...
ERROR: C:/torch-tensorrt/cpp/lib/BUILD:34:10: Linking cpp/lib/torch_tensorrt.dll failed: (Exit 1120): link.exe failed: error executing command C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64\link.exe @bazel-out/x64_windows-opt/bin/cpp/lib/torch_tensorrt.dll-2.params
LINK : warning LNK4044: unrecognized option '/lpthread'; ignored
LINK : warning LNK4044: unrecognized option '/lpthread'; ignored
LINK : warning LNK4044: unrecognized option '/Wl,-rpath,lib/'; ignored
   Creating library bazel-out/x64_windows-opt/bin/cpp/lib/torch_tensorrt.dll.if.lib and object bazel-out/x64_windows-opt/bin/cpp/lib/torch_tensorrt.dll.if.exp
runtime.lo.lib(TRTEngine.obj) : error LNK2019: unresolved external symbol createInferRuntime_INTERNAL referenced in function "public: __cdecl torch_tensorrt::core::runtime::TRTEngine::TRTEngine(class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,struct torch_tensorrt::core::runtime::CudaDevice)" (??0TRTEngine@runtime@core@torch_tensorrt@@QEAA@V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@0UCudaDevice@123@@Z)
torch_tensorrt_plugins.lo.lib(register_plugins.obj) : error LNK2001: unresolved external symbol getPluginRegistry
torch_tensorrt_plugins.lo.lib(normalize_plugin.obj) : error LNK2001: unresolved external symbol getPluginRegistry
torch_tensorrt_plugins.lo.lib(interpolate_plugin.obj) : error LNK2001: unresolved external symbol getPluginRegistry
converters.lo.lib(pooling.obj) : error LNK2001: unresolved external symbol getPluginRegistry
converters.lo.lib(normalize.obj) : error LNK2001: unresolved external symbol getPluginRegistry
converters.lo.lib(interpolate.obj) : error LNK2001: unresolved external symbol getPluginRegistry
converters.lo.lib(batch_norm.obj) : error LNK2001: unresolved external symbol getPluginRegistry
torch_tensorrt_plugins.lo.lib(register_plugins.obj) : error LNK2019: unresolved external symbol initLibNvInferPlugins referenced in function "public: __cdecl torch_tensorrt::core::plugins::impl::TorchTRTPluginRegistry::TorchTRTPluginRegistry(void)" (??0TorchTRTPluginRegistry@impl@plugins@core@torch_tensorrt@@QEAA@XZ)
conversionctx.lib(ConversionCtx.obj) : error LNK2019: unresolved external symbol createInferBuilder_INTERNAL referenced in function "public: __cdecl torch_tensorrt::core::conversion::ConversionCtx::ConversionCtx(struct torch_tensorrt::core::conversion::BuilderSettings)" (??0ConversionCtx@conversion@core@torch_tensorrt@@QEAA@UBuilderSettings@123@@Z)
bazel-out\x64_windows-opt\bin\cpp\lib\torch_tensorrt.dll : fatal error LNK1120: 4 unresolved externals
Target //:libtorchtrt failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 0.874s, Critical Path: 0.30s
INFO: 2 processes: 2 internal.
FAILED: Build did NOT complete successfully

I solved that, as suggested by @yuriishutkin in issue #690 (referenced in issue #226), by replacing this in third_party/tensorrt/local/BUILD (:-1:):

cc_library(
    name = "nvinferplugin",
    hdrs = select({
        ":aarch64_linux": glob(["include/aarch64-linux-gnu/NvInferPlugin*.h"]),
        ":windows": glob(["include/NvInferPlugin*.h"]),
        "//conditions:default": glob(["include/x86_64-linux-gnu/NvInferPlugin*.h"]),
    }),
    srcs = select({
        ":aarch64_linux": ["lib/aarch64-linux-gnu/libnvinfer_plugin.so"],
        ":windows": ["lib/nvinfer_plugin.dll"],
        "//conditions:default": ["lib/x86_64-linux-gnu/libnvinfer_plugin.so"],
    }),
    includes = select({
        ":aarch64_linux": ["include/aarch64-linux-gnu/"],
        ":windows": ["include/"],
        "//conditions:default": ["include/x86_64-linux-gnu/"],
    }),
    deps = [
        "nvinfer",
        "@cuda//:cudart",
        "@cudnn",
    ] + select({
        ":windows": ["@cuda//:cublas"],
        "//conditions:default": ["@cuda//:cublas"],
    }),
    alwayslink = True,
    copts = [
        "-pthread"
    ],
    linkopts = [
        "-lpthread",
    ] + select({
        ":aarch64_linux": ["-Wl,--no-as-needed -ldl -lrt -Wl,--as-needed"],
        "//conditions:default": []
    })
)

with this (:+1:):

cc_library(
    name = "nvinferplugin",
    hdrs = select({
        ":aarch64_linux": glob(["include/aarch64-linux-gnu/NvInferPlugin*.h"]),
        ":windows": glob(["include/NvInferPlugin*.h"]),
        "//conditions:default": glob(["include/x86_64-linux-gnu/NvInferPlugin*.h"]),
    }),
    srcs = select({
        ":aarch64_linux": ["lib/aarch64-linux-gnu/libnvinfer_plugin.so"],
        ":windows": ["lib/nvinfer_plugin.lib","lib/nvinfer_plugin.dll"],
        "//conditions:default": ["lib/x86_64-linux-gnu/libnvinfer_plugin.so"],
    }),
    includes = select({
        ":aarch64_linux": ["include/aarch64-linux-gnu/"],
        ":windows": ["include/"],
        "//conditions:default": ["include/x86_64-linux-gnu/"],
    }),
    deps = [
        "nvinfer",
        "@cuda//:cudart",
        "@cudnn",
    ] + select({
        ":windows": ["@cuda//:cublas", "nvinfer_static_lib"],
        "//conditions:default": ["@cuda//:cublas"],
    }),
    alwayslink = True,
    copts = [
        "-pthread"
    ],
    linkopts = [
        "-lpthread",
    ] + select({
        ":aarch64_linux": ["-Wl,--no-as-needed -ldl -lrt -Wl,--as-needed"],
        "//conditions:default": []
    })
)

if BAZEL_EXE is None:
    BAZEL_EXE = which("bazel")
    if BAZEL_EXE is None:
        sys.exit("Could not find bazel in PATH")

to (:+1:):

BAZEL_EXE = "C:/ProgramData/chocolatey/bin/bazel.exe"

if BAZEL_EXE is None:
    BAZEL_EXE = which("bazel")
    if BAZEL_EXE is None:
        sys.exit("Could not find bazel in PATH")
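Hard-coding the path works, but a slightly more portable variant is to let an environment variable override the lookup. A sketch using `shutil.which` from the standard library; using `BAZEL_EXE` as an environment override and the helper name `find_bazel` are assumptions of this example, not something `setup.py` supports out of the box.

```python
import os
import shutil


def find_bazel(env=None):
    """Return a path to bazel: explicit override first, then a search of PATH."""
    env = os.environ if env is None else env
    override = env.get("BAZEL_EXE")
    if override:
        # Trust the caller-supplied location, e.g. a Chocolatey install.
        return override
    # Returns None when no bazel executable is found on the given PATH.
    return shutil.which("bazel", path=env.get("PATH"))
```

For example, `find_bazel({"BAZEL_EXE": "C:/ProgramData/chocolatey/bin/bazel.exe"})` returns the given path without searching.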


- So I try again and...

C:\Torch-TensorRT\py>python3 setup.py install
...
INFO: From Linking cpp/lib/torch_tensorrt.dll:
LINK : warning LNK4044: unrecognized option '/lpthread'; ignored
LINK : warning LNK4044: unrecognized option '/lpthread'; ignored
LINK : warning LNK4044: unrecognized option '/Wl,-rpath,lib/'; ignored
LINK : warning LNK4044: unrecognized option '/D_GLIBCXX_USE_CXX11_ABI=0'; ignored
   Creating library bazel-out/x64_windows-opt/bin/cpp/lib/torch_tensorrt.dll.if.lib and object bazel-out/x64_windows-opt/bin/cpp/lib/torch_tensorrt.dll.if.exp
Target //:libtorchtrt up-to-date:
  bazel-bin/libtorchtrt.tar.gz
INFO: Elapsed time: 56.634s, Critical Path: 18.50s
INFO: 112 processes: 2 internal, 110 local.
INFO: Build completed successfully, 112 total actions
Traceback (most recent call last):
  File "C:\Torch-TensorRT\py\setup.py", line 260, in <module>
    setup(
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.2800.0_x64__qbz5n2kfra8p0\lib\site-packages\setuptools\__init__.py", line 153, in setup
    return distutils.core.setup(**attrs)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.2800.0_x64__qbz5n2kfra8p0\lib\distutils\core.py", line 148, in setup
    dist.run_commands()
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.2800.0_x64__qbz5n2kfra8p0\lib\distutils\dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.2800.0_x64__qbz5n2kfra8p0\lib\distutils\dist.py", line 985, in run_command
    cmd_obj.run()
  File "C:\Torch-TensorRT\py\setup.py", line 160, in run
    gen_version_file()
  File "C:\Torch-TensorRT\py\setup.py", line 113, in gen_version_file
    os.mknod(dir_path + '/torch_tensorrt/_version.py')
AttributeError: module 'os' has no attribute 'mknod'

That makes sense, since `os.mknod` is only available on Unix systems. It's enough to replace it with `open()`, so I change `setup.py` (again) from (:-1:):

def gen_version_file():
    if not os.path.exists(dir_path + '/torch_tensorrt/_version.py'):
        os.mknod(dir_path + '/torch_tensorrt/_version.py')

    with open(dir_path + '/torch_tensorrt/_version.py', 'w') as f:
        print("creating version file")
        f.write("__version__ = \"" + __version__ + '\"')

to (:+1:):

def gen_version_file():
    if not os.path.exists(dir_path + '/torch_tensorrt/_version.py'):
        open(dir_path + '/torch_tensorrt/_version.py', "a").close()

    with open(dir_path + '/torch_tensorrt/_version.py', 'w') as f:
        print("creating version file")
        f.write("__version__ = \"" + __version__ + '\"')
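Since `open(..., 'w')` already creates the file when it does not exist, the separate creation step could arguably be dropped altogether, which sidesteps `os.mknod` on every platform. A possible simplification, with `dir_path` and `version` passed in as arguments purely to keep the example self-contained:

```python
import os


def gen_version_file(dir_path, version):
    """Write torch_tensorrt/_version.py under dir_path, creating the file if needed."""
    target = os.path.join(dir_path, "torch_tensorrt", "_version.py")
    # Mode 'w' creates the file if it is missing and truncates it otherwise.
    with open(target, "w") as f:
        print("creating version file")
        f.write('__version__ = "' + version + '"')
    return target
```

The `torch_tensorrt` directory itself still has to exist beforehand, as it does in the repository layout.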

- Now, before retrying, I clean my environment...

C:\Torch-TensorRT\py>python3 setup.py clean
running clean
Removing build
error: [WinError 267] The directory name is invalid: 'C:\Torch-TensorRT\py\build'

Well, another problem, but apparently it was enough to rename `BUILD` to `BUILD.bazel`, and then I was able to clean my environment:

C:\Torch-TensorRT\py>python3 setup.py clean
running clean
Removing torch_tensorrt\lib
Removing torch_tensorrt\include
Removing torch_tensorrt\_version.py
Removing torch_tensorrt\BUILD
Removing torch_tensorrt\WORKSPACE
Removing torch_tensorrt\LICENSE
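
A likely explanation for why the rename helps (an assumption, not confirmed in this thread): Windows file systems are normally case-insensitive, so the Bazel file `BUILD` and the `build` directory that `setup.py clean` tries to remove resolve to the same name. The sketch below only illustrates that kind of clash; `case_insensitive_clashes` is a hypothetical helper:

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def case_insensitive_clashes(names: Iterable[str]) -> List[List[str]]:
    """Group names that differ only by letter case.

    On a case-insensitive file system each returned group would map to a
    single directory entry.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

With `["BUILD", "build", "setup.py"]` the helper reports one clashing group, while `["BUILD.bazel", "build", "setup.py"]` yields none, which matches the effect of the rename.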

So I try again to build the Python package...

C:\Torch-TensorRT\py>python3 setup.py install
...
C:\Users\myUser\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\torch\utils\cpp_extension.py:316: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified
  warnings.warn(f'Error checking compiler version for {compiler}: {error}')
building 'torch_tensorrt._C' extension
creating build\temp.win-amd64-3.9
creating build\temp.win-amd64-3.9\Release
creating build\temp.win-amd64-3.9\Release\torch_tensorrt
creating build\temp.win-amd64-3.9\Release\torch_tensorrt\csrc
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -UNDEBUG -IC:\Torch-TensorRT\pytorch_tensorrt/csrc -IC:\Torch-TensorRT\pytorch_tensorrt/include -IC:\Torch-TensorRT\py/../bazel-TRTorch/external/tensorrt/include -IC:\Torch-TensorRT\py/../bazel-Torch-TensorRT-Preview/external/tensorrt/include -IC:\Torch-TensorRT\py/../bazel-Torch-TensorRT/external/tensorrt/include -IC:\Torch-TensorRT\py/../ -IC:\Users\myUser\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\torch\include -IC:\Users\myUser\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\torch\include\torch\csrc\api\include -IC:\Users\myUser\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\torch\include\TH -IC:\Users\myUser\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\torch\include\THC -IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\include -IC:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.2800.0_x64__qbz5n2kfra8p0\include -IC:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.2800.0_x64__qbz5n2kfra8p0\include -IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\ATLMFC\include -IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\include -IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\ucrt -IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\shared -IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um -IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\winrt -IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\cppwinrt /EHsc /Tptorch_tensorrt/csrc/register_tensorrt_classes.cpp /Fobuild\temp.win-amd64-3.9\Release\torch_tensorrt/csrc/register_tensorrt_classes.obj /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc -Wno-deprecated -Wno-deprecated-declarations -D_GLIBCXX_USE_CXX11_ABI=0 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0
cl : Command line warning D9025 : overriding '/DNDEBUG' with '/UNDEBUG'
cl : Command line error D8021 : invalid numeric argument '/Wno-deprecated'
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\HostX86\x64\cl.exe' failed with exit code 2


So, still in `setup.py`, I remove all the `-Wno-deprecated` options and replace
_this_ (:-1:):

ext_modules = [
    cpp_extension.CUDAExtension(
        'torch_tensorrt._C',
        [
            'torch_tensorrt/csrc/torch_tensorrt_py.cpp',
            'torch_tensorrt/csrc/tensorrt_backend.cpp',
            'torch_tensorrt/csrc/tensorrt_classes.cpp',
            'torch_tensorrt/csrc/register_tensorrt_classes.cpp',
        ],
        library_dirs=[(dir_path + '/torch_tensorrt/lib/'), "/opt/conda/lib/python3.6/config-3.6m-x86_64-linux-gnu"],
        libraries=["torchtrt"],
        include_dirs=[
            dir_path + "torch_tensorrt/csrc",
            dir_path + "torch_tensorrt/include",
            dir_path + "/../bazel-TRTorch/external/tensorrt/include",
            dir_path + "/../bazel-Torch-TensorRT-Preview/external/tensorrt/include",
            dir_path + "/../bazel-Torch-TensorRT/external/tensorrt/include",
            dir_path + "/../"
        ],
        extra_compile_args=[
            "-Wno-deprecated",
            "-Wno-deprecated-declarations",
        ] + (["-D_GLIBCXX_USE_CXX11_ABI=1"] if CXX11_ABI else ["-D_GLIBCXX_USE_CXX11_ABI=0"]),
        extra_link_args=[
            "-Wno-deprecated", "-Wno-deprecated-declarations", "-Wl,--no-as-needed", "-ltorchtrt",
            "-Wl,-rpath,$ORIGIN/lib", "-lpthread", "-ldl", "-lutil", "-lrt", "-lm", "-Xlinker",
            "-export-dynamic"
        ] + (["-D_GLIBCXX_USE_CXX11_ABI=1"] if CXX11_ABI else ["-D_GLIBCXX_USE_CXX11_ABI=0"]),
        undef_macros=["NDEBUG"])
]

with _this_ (:+1:):

ext_modules = [
    cpp_extension.CUDAExtension(
        'torch_tensorrt._C',
        [
            'torch_tensorrt/csrc/torch_tensorrt_py.cpp',
            'torch_tensorrt/csrc/tensorrt_backend.cpp',
            'torch_tensorrt/csrc/tensorrt_classes.cpp',
            'torch_tensorrt/csrc/register_tensorrt_classes.cpp',
        ],
        library_dirs=[(dir_path + '/torch_tensorrt/lib/'), "/opt/conda/lib/python3.6/config-3.6m-x86_64-linux-gnu"],
        libraries=["torchtrt"],
        include_dirs=[
            dir_path + "torch_tensorrt/csrc",
            dir_path + "torch_tensorrt/include",
            dir_path + "/../bazel-TRTorch/external/tensorrt/include",
            dir_path + "/../bazel-Torch-TensorRT-Preview/external/tensorrt/include",
            dir_path + "/../bazel-Torch-TensorRT/external/tensorrt/include",
            dir_path + "/../"
        ],
        extra_compile_args=[] + (["-D_GLIBCXX_USE_CXX11_ABI=1"] if CXX11_ABI else ["-D_GLIBCXX_USE_CXX11_ABI=0"]),
        extra_link_args=[
            "-Wl,--no-as-needed", "-ltorchtrt", "-Wl,-rpath,$ORIGIN/lib", "-lpthread", "-ldl",
            "-lutil", "-lrt", "-lm", "-Xlinker", "-export-dynamic"
        ] + (["-D_GLIBCXX_USE_CXX11_ABI=1"] if CXX11_ABI else ["-D_GLIBCXX_USE_CXX11_ABI=0"]),
        undef_macros=["NDEBUG"])
]


- Finally, this is the last error I got when I tried to build the Python package one more time. Unfortunately, I don't know how to proceed from here:

C:\Torch-TensorRT\py>python3 setup.py clean
running clean
Removing torch_tensorrt\lib
Removing torch_tensorrt\include
Removing torch_tensorrt\_version.py
Removing torch_tensorrt\BUILD
Removing torch_tensorrt\WORKSPACE
Removing torch_tensorrt\LICENSE

C:\Torch-TensorRT\py>python3 setup.py install
...
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\HostX86\x64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:C:\Torch-TensorRT\py/torch_tensorrt/lib/ /LIBPATH:/opt/conda/lib/python3.6/config-3.6m-x86_64-linux-gnu /LIBPATH:C:\Users\myUser\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\torch\lib /LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\lib/x64 /LIBPATH:C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.2800.0_x64__qbz5n2kfra8p0\libs /LIBPATH:C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.2800.0_x64__qbz5n2kfra8p0\PCbuild\amd64 /LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\ATLMFC\lib\x64 /LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\lib\x64 /LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.19041.0\ucrt\x64 /LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.19041.0\um\x64 torchtrt.lib c10.lib torch.lib torch_cpu.lib torch_python.lib cudart.lib c10_cuda.lib torch_cuda_cu.lib torch_cuda_cpp.lib /EXPORT:PyInit__C build\temp.win-amd64-3.9\Release\torch_tensorrt/csrc/register_tensorrt_classes.obj build\temp.win-amd64-3.9\Release\torch_tensorrt/csrc/tensorrt_backend.obj build\temp.win-amd64-3.9\Release\torch_tensorrt/csrc/tensorrt_classes.obj build\temp.win-amd64-3.9\Release\torch_tensorrt/csrc/torch_tensorrt_py.obj /OUT:build\lib.win-amd64-3.9\torch_tensorrt\_C.cp39-win_amd64.pyd /IMPLIB:build\temp.win-amd64-3.9\Release\torch_tensorrt/csrc\_C.cp39-win_amd64.lib -Wl,--no-as-needed -ltorchtrt -Wl,-rpath,$ORIGIN/lib -lpthread -ldl -lutil -lrt -lm -Xlinker -export-dynamic -D_GLIBCXX_USE_CXX11_ABI=0
LINK : warning LNK4044: unrecognized option '/Wl,--no-as-needed'; ignored
LINK : warning LNK4044: unrecognized option '/ltorchtrt'; ignored
LINK : warning LNK4044: unrecognized option '/Wl,-rpath,$ORIGIN/lib'; ignored
LINK : warning LNK4044: unrecognized option '/lpthread'; ignored
LINK : warning LNK4044: unrecognized option '/ldl'; ignored
LINK : warning LNK4044: unrecognized option '/lutil'; ignored
LINK : warning LNK4044: unrecognized option '/lrt'; ignored
LINK : warning LNK4044: unrecognized option '/lm'; ignored
LINK : warning LNK4044: unrecognized option '/Xlinker'; ignored
LINK : warning LNK4044: unrecognized option '/export-dynamic'; ignored
LINK : warning LNK4044: unrecognized option '/D_GLIBCXX_USE_CXX11_ABI=0'; ignored
LINK : fatal error LNK1181: cannot open input file 'torchtrt.lib'
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\HostX86\x64\link.exe' failed with exit code 1181



All I know is that the linker is trying to link against a static library `torchtrt.lib`, while all I have is a dynamic library `torch_tensorrt.dll`.

narendasan commented 2 years ago

That's really cool that you got Windows compilation working! So really, all you need to move forward with your specific use case is to link (or dlopen) libtorchtrt_runtime in your app, which should be available from just the C++ compilation, so Python is not strictly required. I suspect that for Python API compilation we need a new set of flags, equivalent to what we have for Linux here: https://github.com/NVIDIA/Torch-TensorRT/blob/4fd886d08ce77323995b5bf6a21a0d0e8dde8d42/py/setup.py#L231, that would get swapped in for people building on Windows. I'm not sure what those flags would be for MSVC to specify a DLL over a lib.
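
One way the flag swap could look is sketched below. This is only an illustration under assumptions: `extension_flags` is a hypothetical helper, the Linux lists are copied from the `setup.py` quoted earlier in this thread, and the Windows library name `torch_tensorrt.dll.if` is a guess based on the `torch_tensorrt.dll.if.lib` file that Bazel produced (the linker log above shows `libraries=["torchtrt"]` turning into `torchtrt.lib`, so MSVC builds append `.lib`).

```python
from typing import Dict, List


def extension_flags(platform: str, cxx11_abi: bool) -> Dict[str, List[str]]:
    """Return per-platform keyword arguments for the extension (sketch)."""
    if platform.startswith("win"):
        return {
            # MSVC rejects or ignores the GCC-style options, so pass none.
            "extra_compile_args": [],
            "extra_link_args": [],
            # ASSUMPTION: link against Bazel's import library
            # torch_tensorrt.dll.if.lib; not confirmed by the maintainers.
            "libraries": ["torch_tensorrt.dll.if"],
        }
    abi = ["-D_GLIBCXX_USE_CXX11_ABI=1"] if cxx11_abi else ["-D_GLIBCXX_USE_CXX11_ABI=0"]
    return {
        "extra_compile_args": ["-Wno-deprecated", "-Wno-deprecated-declarations"] + abi,
        "extra_link_args": [
            "-Wno-deprecated", "-Wno-deprecated-declarations",
            "-Wl,--no-as-needed", "-ltorchtrt", "-Wl,-rpath,$ORIGIN/lib",
            "-lpthread", "-ldl", "-lutil", "-lrt", "-lm",
            "-Xlinker", "-export-dynamic",
        ] + abi,
        "libraries": ["torchtrt"],
    }
```

In `setup.py` the result could then be unpacked into the extension call, for example `**extension_flags(sys.platform, CXX11_ABI)`, in place of the hard-coded lists.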

andreabonvini commented 2 years ago

I actually have no libtorchtrt_runtime file, this is the folder tree of bazel-out/x64_windows-opt/bin/cpp and bazel-out/x64_windows-opt/bin/core/runtime (I can send the full bazel-out.zip if you think it could help).

C:\TORCH-TENSORRT\BAZEL-OUT\X64_WINDOWS-OPT\BIN\CPP
│   torch_tensorrt.lo.lib
│   torch_tensorrt.lo.lib-2.params
│
├───lib
│       torch_tensorrt.dll
│       torch_tensorrt.dll-2.params
│       torch_tensorrt.dll.gen.empty.def
│       torch_tensorrt.dll.if.exp
│       torch_tensorrt.dll.if.lib
│
├───_objs
│   └───torch_tensorrt
│           compile_spec.obj
│           logging.obj
│           ptq.obj
│           torch_tensorrt.obj
│           types.obj
│
└───_virtual_includes
    └───torch_tensorrt
        └───torch_tensorrt
                logging.h
                macros.h
                ptq.h
                torch_tensorrt.h

C:\TORCH-TENSORRT\BAZEL-OUT\X64_WINDOWS-OPT\BIN\CORE\RUNTIME
│   include.args
│   include.tar
│   runtime.lo.lib
│   runtime.lo.lib-2.params
│
└───_objs
    └───runtime
            CudaDevice.obj
            DeviceList.obj
            register_trt_op.obj
            runtime.obj
            TRTEngine.obj

Moreover, I'm not sure I understand how linking against libtorchtrt_runtime would solve my problem (given that my C++ app crashes in model = torch::jit::load(MODEL_PATH, device);)

narendasan commented 2 years ago

> Moreover, I'm not sure I understand how linking against libtorchtrt_runtime would solve my problem (given that my C++ app crashes in model = torch::jit::load(MODEL_PATH, device);)

I suspect that the reason a compiled module throws an error on load is that you need the LibTorch runtime extension, which adds support for deserializing and running Torch-TensorRT compiled modules. The lightest way to do this is to link libtorchtrt_runtime into your application, which simply loads the runtime extension.

Probably what you need to do to add the torchtrt_runtime.dll target is to modify //cpp/lib/BUILD and add the following target, similar to torch_tensorrt.dll:

cc_binary(
    name = "torchtrt_runtime.dll",
    srcs = [],
    linkshared = True,
    linkstatic = True,
    deps = [
        "//core/runtime:runtime",
        "//core/plugins:torch_tensorrt_plugins"
    ],
)

jonahclarsen commented 2 years ago

@andreabonvini did you end up solving this issue? I am facing a similar problem now.

yuriishutkin commented 2 years ago

torch_tensorrt.dll is actually the name of the generated library, as narendasan said. The import library torch_tensorrt.dll.if.lib is what you should use when you want to link it with the rest of your application.

But for me that was not the end of the story: after I built the torch_tensorrt library, it turned out to conflict with the installed torch library. It just crashed with an exception somewhere inside torch. I suppose that's because torch exposes C++ in its interface, and my compiler version differs from the one that was used to build torch.

So, a solution can be to match the compiler version that torch was built with, or to build torch from source.

Alternatively, you can switch to WSL and install the prebuilt torch_tensorrt package, or use a ready-made container from here: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch.

andreabonvini commented 2 years ago

@jonahclarsen Hi! Unfortunately not, I think I was kinda able to generate torchtrt_runtime.dll but something was wrong when I tried to link it to my program. I haven't had the time to continue investigating recently, but for sure I will retry in the following two months. I will 100% follow this thread during my spare time though.

jonahclarsen commented 2 years ago

> @jonahclarsen Hi! Unfortunately not, I think I was kinda able to generate torchtrt_runtime.dll but something was wrong when I tried to link it to my program. I haven't had the time to continue investigating recently, but for sure I will retry in the following two months. I will 100% follow this thread during my spare time though.

Okay, too bad! Hopefully we can figure it all out soon, I am highly motivated to get this into my Libtorch Windows program.

jonahclarsen commented 2 years ago

@yuriishutkin When I tried linking my program against torch_tensorrt.dll.if.lib, I still got 'unresolved external symbol' linker errors, even when just using the Input() function, which isn't in any namespace. Are you saying that file was enough for you to successfully link your program? Were you able to use namespaces like torchscript?

yuriishutkin commented 2 years ago

> @yuriishutkin When I tried linking my program against torch_tensorrt.dll.if.lib, I still got 'unresolved external symbol' linker errors, even when just using the Input() function, which isn't in any namespace. Are you saying that file was enough for you to successfully link your program? Were you able to use namespaces like torchscript?

Right, I've added the runtime and plugin sources to the same library. I also had problems with exporting symbols, because MSVC does not have an option to export all symbols like GCC does. If you also use MSVC, you need to specify the exported symbols manually, e.g. in a .def file.

For me the following worked:

 cpp/lib/BUILD | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/cpp/lib/BUILD b/cpp/lib/BUILD
index e6d50613..58102867 100644
--- a/cpp/lib/BUILD
+++ b/cpp/lib/BUILD
@@ -38,5 +38,10 @@ cc_binary(
     linkstatic = True,
     deps = [
         "//cpp:torch_tensorrt",
+        "//core/runtime:runtime",
+        "//core/plugins:torch_tensorrt_plugins"
     ],
+   win_def_file = "exports.def"
 )
+
+

exports.def is in the attached archive (exports.zip); place it next to cpp/lib/BUILD. Yours may differ depending on the version of the library you are using. Just add all unresolved externals to the list.
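
The contents of exports.def are not shown in this thread, so the following is only a sketch of how such a list could be assembled from linker output. It assumes the usual MSVC message shape `unresolved external symbol "<readable name>" (<decorated name>)`, or a bare name for C functions; `unresolved_symbols` and `render_def_file` are hypothetical helpers.

```python
import re
from typing import List

# Matches either a quoted C++ signature followed by its decorated name in
# parentheses, or a bare (undecorated) C symbol name.
_UNRESOLVED = re.compile(
    r'unresolved external symbol (?:"[^"]*" \(([^)\s]+)\)|(\S+))'
)


def unresolved_symbols(linker_output: str) -> List[str]:
    """Collect unresolved symbol names in first-seen order, without repeats."""
    seen: List[str] = []
    for match in _UNRESOLVED.finditer(linker_output):
        name = match.group(1) or match.group(2)
        if name not in seen:
            seen.append(name)
    return seen


def render_def_file(symbols: List[str]) -> str:
    """Render a module-definition file with one exported symbol per line."""
    return "\n".join(["EXPORTS"] + ["    " + s for s in symbols]) + "\n"
```

An export entry only helps for symbols that are defined inside the DLL being built; a symbol that the DLL itself cannot resolve needs the library that provides it on the link line instead.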

jonahclarsen commented 2 years ago

@yuriishutkin Okay, I went another route, by adding __declspec(dllexport) to every unresolved external, I've detailed this in #1014. However, I am still getting this error related to a function defined in nvinfer_plugins, and I have yet to find a way to resolve it:

   Creating library bazel-out/x64_windows-opt/bin/core/plugins/torch_tensorrt_plugins.if.lib and object bazel-out/x64_windows-opt/bin/core/plugins/torch_tensorrt_plugins.if.exp
register_plugins.obj : error LNK2019: unresolved external symbol initLibNvInferPlugins referenced in function "public: __cdecl torch_tensorrt::core::plugins::impl::TorchTRTPluginRegistry::TorchTRTPluginRegistry(void)" (??0TorchTRTPluginRegistry@impl@plugins@core@torch_tensorrt@@QEAA@XZ)

Would you be willing to share your entire WORKSPACE and cpp/lib/BUILD files, or ideally even your entire project that was able to successfully compile the .lib file?

yuriishutkin commented 2 years ago

@jonahclarsen Sure, please take a look.

https://github.com/yuriishutkin/Torch-TensorRT/tree/windows

In the py directory I run python setup.py install

It builds torch_tensorrt.dll + torch_tensorrt.dll.if.lib and then links it to the _C lib. The only manual step is that I copy bazel-out\x64_windows-opt\bin\cpp\lib\torch_tensorrt.dll.if.lib to py\torch_tensorrt\lib\ because Bazel does not copy this file automatically.
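
That manual copy could be scripted. A minimal sketch, assuming the two paths given above are relative to the repository root; `copy_import_lib` is a hypothetical helper and not part of the linked branch:

```python
import shutil
from pathlib import Path


def copy_import_lib(repo_root: Path) -> Path:
    """Copy Bazel's import library next to the Python package's other libs."""
    src = (repo_root / "bazel-out" / "x64_windows-opt" / "bin" / "cpp"
           / "lib" / "torch_tensorrt.dll.if.lib")
    dst_dir = repo_root / "py" / "torch_tensorrt" / "lib"
    # Create the destination directory if the build has not made it yet.
    dst_dir.mkdir(parents=True, exist_ok=True)
    # shutil.copy2 returns the path of the newly created file.
    return Path(shutil.copy2(src, dst_dir))
```

It would need to run after the Bazel build and before the extension is linked.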

But once again, for me the resulting _C lib does not load successfully in Python because of an exception inside torch.

github-actions[bot] commented 2 years ago

This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.