pytorch / TensorRT

PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
https://pytorch.org/TensorRT
BSD 3-Clause "New" or "Revised" License

❓ [Question] How do you compile a chunk operator with TensorRT? #2955

Open joshuageddes opened 1 week ago

joshuageddes commented 1 week ago

❓ Question

How do you compile a chunk operator with TensorRT? I have been trying a basic example in a Jupyter Notebook, but compilation fails with a broadcast dimension error. The code below runs in eager PyTorch and in TorchScript, but fails to compile with Torch-TensorRT.

What you have already tried

import torch
import torch.nn as nn
import torch_tensorrt
device = "cuda"

class TestModel(nn.Module):
    def __init__(self):
        super().__init__()
    def forward(self, x, y):
        y1, _ = y.chunk(2, dim=0) #y1.shape --> (1, 3)
        return x + y1 #(2, 3) + (1, 3)

model = TestModel()
model.eval()

x = torch.randn((2, 3), device=device)
y = torch.randn((2, 3), device=device)

model(x, y)

traced_model = torch.jit.trace(model, (x, y))

trt_model = torch_tensorrt.compile(
    traced_model,
    inputs=[
        torch_tensorrt.Input(shape=x.shape, dtype=torch.float32),
        torch_tensorrt.Input(shape=y.shape, dtype=torch.float32),
    ],
)

Error messages:

ERROR: [Torch-TensorRT TorchScript Conversion Context] - ITensor::getDimensions: Error Code 4: Shape Error (broadcast dimensions must be conformable)
ERROR: [Torch-TensorRT TorchScript Conversion Context] - IBuilder::buildSerializedNetwork: Error Code 4: Internal Error (%9 : Tensor = aten::add(%x, %y1, %3) # [...): IElementWiseLayer must have inputs with same dimensions or follow broadcast rules. Input dimensions were [2,3] and [1,0].)
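The second error reports the operand shapes as [2,3] and [1,0], whereas the model intends (2, 3) + (1, 3). As a rough illustration of why the first pair is rejected and the second would be accepted, here is a minimal sketch of the usual broadcasting rule (two dimensions are compatible when they are equal or one of them is 1) in plain Python. `broadcast_shape` is a hypothetical helper written for this illustration. It is not part of PyTorch, Torch-TensorRT, or TensorRT, and it assumes NumPy-style right alignment of shapes.

```python
from typing import Sequence, Tuple


def broadcast_shape(a: Sequence[int], b: Sequence[int]) -> Tuple[int, ...]:
    """Return the broadcast result shape, or raise ValueError if the shapes are not conformable.

    Hypothetical helper for illustration only; not a TensorRT or PyTorch API.
    """
    # Left-pad the shorter shape with 1s so both shapes have the same rank.
    rank = max(len(a), len(b))
    a = (1,) * (rank - len(a)) + tuple(a)
    b = (1,) * (rank - len(b)) + tuple(b)

    result = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            result.append(da)  # equal sizes, or b is stretched to match a
        elif da == 1:
            result.append(db)  # a is stretched to match b
        else:
            raise ValueError(f"cannot broadcast {a} with {b}")
    return tuple(result)


# Shapes the model intends: x is (2, 3), y1 is (1, 3).
print(broadcast_shape((2, 3), (1, 3)))

# Shapes reported in the error message: [2,3] and [1,0].
try:
    broadcast_shape((2, 3), (1, 0))
except ValueError as exc:
    print("not conformable:", exc)
```

Under this rule the trailing dimensions 3 and 0 are neither equal nor 1, so the addition cannot be broadcast. That suggests the problem lies in the shape computed for the chunk output rather than in the addition itself.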

Thank you for the help!

narendasan commented 1 week ago

This is a limitation of the TorchScript frontend. The Dynamo frontend supports this operator. If you still need a TorchScript module, we recommend compiling through Dynamo and then tracing the compiled module:

import torch
import torch.nn as nn
import torch_tensorrt
device = "cuda"

class TestModel(nn.Module):
    def __init__(self):
        super().__init__()
    def forward(self, x, y):
        y1, _ = y.chunk(2, dim=0) #y1.shape --> (1, 3)
        return x + y1 #(2, 3) + (1, 3)

model = TestModel()
model.eval()

x = torch.randn((2, 3), device=device)
y = torch.randn((2, 3), device=device)

model(x, y)

#traced_model = torch.jit.trace(model, (x, y))

trt_model = torch_tensorrt.compile(
    model,
    ir="dynamo",  # Default if the input is nn.Module or fx.GraphModule
    inputs=[
        torch_tensorrt.Input(shape=x.shape, dtype=torch.float32),
        torch_tensorrt.Input(shape=y.shape, dtype=torch.float32),
    ],
    min_block_size=1,
)

max_diff = float(
    torch.max(torch.abs(model(x, y) - trt_model(x,y)))
)

print(max_diff)

ts_trt_model = torch.jit.trace(trt_model, (x, y))

max_diff = float(
    torch.max(torch.abs(model(x, y) - ts_trt_model(x,y)))
)

print(max_diff)
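
The two comparisons above print a raw maximum difference. As a small sketch of how that check could be wrapped and given a pass/fail threshold, the snippet below uses CPU tensors as stand-ins for the outputs of `model` and `trt_model`, so it runs without a GPU or TensorRT. `max_abs_diff` is a hypothetical helper name, and the tolerance of 1e-3 is an assumed value for illustration, not one recommended in this thread.

```python
import torch


def max_abs_diff(reference: torch.Tensor, candidate: torch.Tensor) -> float:
    """Largest element-wise absolute difference between two tensors."""
    return float(torch.max(torch.abs(reference - candidate)))


# CPU stand-ins for model(x, y) and trt_model(x, y); the small offset
# imitates numerical drift between the eager and compiled modules.
reference = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
candidate = reference + 1e-4

print(max_abs_diff(reference, candidate))

# torch.allclose performs the same kind of check against a tolerance.
# atol=1e-3 is an assumed threshold; choose one that suits your model.
print(torch.allclose(reference, candidate, atol=1e-3, rtol=0.0))
```

With real modules, the stand-in tensors would be replaced by `model(x, y)` and `trt_model(x, y)` (or `ts_trt_model(x, y)`).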