❓ [Question] The same inputs producing very different outputs via pytorch & TensorRT.

Amoko commented 2 years ago

❓ Question

Hey, guys! I'm new to TensorRT. After setting up the environment, I was excited to try the official Resnet50-example demo. However, I get very different outputs from PyTorch and TensorRT when running inference on the same inputs. When I use efficientnet_b3 as the model instead, the results are the same.

What you have already tried

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

PyTorch Version: 1.11.0+cu113
TensorRT Version: 8.4.1.5
torch_tensorrt Version: 1.1.0
CPU Architecture: x86_64
OS (e.g., Linux): Ubuntu 20.04.2 LTS
How you installed PyTorch : pip
How you installed TensorRT: pip
Are you using local sources or building from archives: No
Python version: 3.8.8
CUDA version: 11.4
GPU models and configuration: NVIDIA GeForce RTX 3090
Any other relevant information:

Additional context

Here is my code for converting the model from PyTorch to TensorRT:

import time
import numpy as np
import torch
torch.manual_seed(1989)
import tensorrt
import torch_tensorrt
from torchvision import models

if __name__ == '__main__':
    # 1 get pytorch model
    model = models.resnet50(pretrained=False)
    #model = models.efficientnet_b3(pretrained=False)
    model = model.eval().to('cuda')

    # 2 convert to tensorrt model
    input_shape=(1,3,224,224)
    ts_model = torch.jit.script(model)
    trt_model = torch_tensorrt.compile(
        model, 
        inputs=[torch_tensorrt.Input(input_shape, dtype=torch.float32)],
        enabled_precisions = {torch.float32},
        workspace_size = 1 << 22
        )
    print('Convert over.')
    #torch.jit.save(trt_model, 'trt_model.pt')
    #trt_model = torch.jit.load('trt_model.pt')

    # 3 check speedup
    inputs = torch.randn(input_shape).to('cuda')
    benchmark(model, inputs, dtype='fp32')
    benchmark(ts_model, inputs, dtype='fp32')
    benchmark(trt_model, inputs, dtype='fp32')

And here is the benchmark function, which runs each model on the same inputs.

def benchmark(model, inputs, dtype='fp32', nwarmup=50, nruns=3000):
    model.eval()
    if dtype=='fp16':
        inputs = inputs.half()

    print("Warm up ...")
    with torch.no_grad():
        for _ in range(nwarmup):
            outputs = model(inputs)
    torch.cuda.synchronize()
    print("Start timing ...")
    timings = []
    with torch.no_grad():
        for i in range(1, nruns+1):
            start_time = time.time()
            outputs  = model(inputs)
            torch.cuda.synchronize()
            end_time = time.time()
            timings.append(end_time - start_time)
            if i%1000==0:
                print('Iteration %d/%d, avg batch time %.2f ms'%(i, nruns, np.mean(timings)*1000))
    print(outputs[0][:8])
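
The benchmark above prints only a running mean of the batch time. As a minimal sketch, assuming the `timings` list holds per-iteration durations in seconds as in the function above, a small helper could also report the median and worst case. The name `summarize_timings` is hypothetical and not part of the original post.

```python
import statistics


def summarize_timings(timings_s):
    """Summarize per-iteration durations (in seconds) as milliseconds.

    Hypothetical helper: returns the mean, median and maximum latency.
    """
    ms = [t * 1000 for t in timings_s]
    return {
        "mean_ms": statistics.mean(ms),
        "median_ms": statistics.median(ms),
        "max_ms": max(ms),
    }


if __name__ == "__main__":
    # Made-up durations, only to show the call shape.
    print(summarize_timings([0.001, 0.002, 0.003]))
```

Reporting the median alongside the mean makes it easier to see whether a few slow iterations are skewing the average.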

And here are the strange outputs that I got. 🤯

For efficientnet_b3

WARNING: [Torch-TensorRT TorchScript Conversion Context] - TensorRT was linked against cuDNN 8.4.1 but loaded cuDNN 8.4.0
WARNING: [Torch-TensorRT TorchScript Conversion Context] - TensorRT was linked against cuDNN 8.4.1 but loaded cuDNN 8.4.0
WARNING: [Torch-TensorRT TorchScript Conversion Context] - The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
WARNING: [Torch-TensorRT] - TensorRT was linked against cuDNN 8.4.1 but loaded cuDNN 8.4.0
WARNING: [Torch-TensorRT] - TensorRT was linked against cuDNN 8.4.1 but loaded cuDNN 8.4.0
Convert over.
Warm up ...
Start timing ...
Iteration 1000/3000, avg batch time 10.76 ms
Iteration 2000/3000, avg batch time 10.75 ms
Iteration 3000/3000, avg batch time 10.75 ms
tensor([ 2.5864e-15, -2.6358e-15, 4.9805e-15, 6.8343e-15, 3.6509e-16, 1.3975e-15, 1.7666e-15, -2.6696e-15], device='cuda:0')
Warm up ...
Start timing ...
Iteration 1000/3000, avg batch time 6.92 ms
Iteration 2000/3000, avg batch time 6.92 ms
Iteration 3000/3000, avg batch time 6.92 ms
tensor([ 2.5864e-15, -2.6358e-15, 4.9805e-15, 6.8343e-15, 3.6509e-16, 1.3975e-15, 1.7666e-15, -2.6696e-15], device='cuda:0')
Warm up ...
Start timing ...
Iteration 1000/3000, avg batch time 0.59 ms
Iteration 2000/3000, avg batch time 0.59 ms
Iteration 3000/3000, avg batch time 0.59 ms
tensor([ 2.5864e-15, -2.6358e-15, 4.9805e-15, 6.8343e-15, 3.6509e-16, 1.3975e-15, 1.7666e-15, -2.6696e-15], device='cuda:0')

For resnet50

WARNING: [Torch-TensorRT] - Dilation not used in Max pooling converter
WARNING: [Torch-TensorRT TorchScript Conversion Context] - TensorRT was linked against cuDNN 8.4.1 but loaded cuDNN 8.4.0
WARNING: [Torch-TensorRT TorchScript Conversion Context] - The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
Convert over.
Warm up ...
Start timing ...
Iteration 1000/3000, avg batch time 5.14 ms
Iteration 2000/3000, avg batch time 5.14 ms
Iteration 3000/3000, avg batch time 5.14 ms
tensor([ 14.4007, -41.5664, -24.5916, -29.5565, 33.5000, 13.3518, -23.5535, -17.9818], device='cuda:0')
Warm up ...
Start timing ...
Iteration 1000/3000, avg batch time 3.53 ms
Iteration 2000/3000, avg batch time 3.53 ms
Iteration 3000/3000, avg batch time 3.53 ms
tensor([ 14.4007, -41.5664, -24.5916, -29.5565, 33.5000, 13.3518, -23.5535, -17.9818], device='cuda:0')
Warm up ...
Start timing ...
Iteration 1000/3000, avg batch time 0.15 ms
Iteration 2000/3000, avg batch time 0.15 ms
Iteration 3000/3000, avg batch time 0.15 ms
tensor([85.8696, 0.9164, 45.5073, 0.4550, 93.4025, 1.5348, 0.0000, 1.1430], device='cuda:0')
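
To put a number on the mismatch rather than compare the printed tensors by eye, here is a minimal sketch that uses only the eight resnet50 values printed above. The helper names `max_abs_diff` and `outputs_match` and the tolerances are assumptions chosen for illustration; they are not part of Torch-TensorRT.

```python
import math

# First eight resnet50 outputs as printed in the logs above.
pytorch_out = [14.4007, -41.5664, -24.5916, -29.5565, 33.5000, 13.3518, -23.5535, -17.9818]
torchscript_out = [14.4007, -41.5664, -24.5916, -29.5565, 33.5000, 13.3518, -23.5535, -17.9818]
trt_out = [85.8696, 0.9164, 45.5073, 0.4550, 93.4025, 1.5348, 0.0000, 1.1430]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


def outputs_match(a, b, rel_tol=1e-3, abs_tol=1e-3):
    """True if every pair of elements agrees within the (assumed) tolerances."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol) for x, y in zip(a, b)
    )


if __name__ == "__main__":
    print("PyTorch vs TorchScript match:", outputs_match(pytorch_out, torchscript_out))
    print("PyTorch vs TensorRT match:", outputs_match(pytorch_out, trt_out))
    print("PyTorch vs TensorRT max abs diff:", max_abs_diff(pytorch_out, trt_out))
```

On full GPU tensors the same kind of check could be done with `torch.allclose`, but the plain-Python version runs without a GPU.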

github-actions[bot] commented 2 years ago

This issue has not seen activity for 90 days, Remove stale label or comment or this will be closed in 10 days

