❓ [Question] How do you compile a chunk operator with TensorRT?

❓ Question

How do you compile a chunk operator with TensorRT? I have been trying a basic example in a Jupyter Notebook but get an unbroadcastable dimension error. The code below runs in eager PyTorch and in TorchScript, but fails to compile with Torch-TensorRT.

What you have already tried

import torch
import torch.nn as nn
import torch_tensorrt
device = "cuda"

class TestModel(nn.Module):
    def __init__(self):
        super().__init__()
    def forward(self, x, y):
        y1, _ = y.chunk(2, dim=0)  # y1.shape --> (1, 3)
        return x + y1  # (2, 3) + (1, 3)

model = TestModel()
model.eval()

x = torch.randn((2, 3), device=device)
y = torch.randn((2, 3), device=device)

model(x, y)

traced_model = torch.jit.trace(model, (x, y))

trt_model = torch_tensorrt.compile(
    traced_model,
    inputs=[
        torch_tensorrt.Input(shape=x.shape, dtype=torch.float32),
        torch_tensorrt.Input(shape=y.shape, dtype=torch.float32),
    ],
)

Error messages:

ERROR: [Torch-TensorRT TorchScript Conversion Context] - ITensor::getDimensions: Error Code 4: Shape Error (broadcast dimensions must be conformable)
ERROR: [Torch-TensorRT TorchScript Conversion Context] - IBuilder::buildSerializedNetwork: Error Code 4: Internal Error (%9 : Tensor = aten::add(%x, %y1, %3) # [...): IElementWiseLayer must have inputs with same dimensions or follow broadcast rules. Input dimensions were [2,3] and [1,0].)
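
For reference, the shapes can be checked in eager PyTorch on CPU, without TensorRT. The following is a minimal sketch (not part of the original report) showing that chunk yields a (1, 3) tensor that broadcasts against (2, 3). This suggests the [1,0] shape in the error originates in the conversion step rather than in the model itself.

```python
import torch

x = torch.randn(2, 3)
y = torch.randn(2, 3)

# chunk(2, dim=0) splits the two rows into two tensors of one row each
y1, y2 = y.chunk(2, dim=0)
print(y1.shape)  # torch.Size([1, 3])

# (2, 3) + (1, 3) is a valid broadcast along dim 0
out = x + y1
print(out.shape)  # torch.Size([2, 3])
```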

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

PyTorch Version (e.g., 1.0): 2.3.0
CPU Architecture:
OS (e.g., Linux): Linux
How you installed PyTorch (conda, pip, libtorch, source): pip
Build command you used (if compiling from source):
Are you using local sources or building from archives:
Python version: 3.10.14
CUDA version: 12.1
GPU models and configuration: A100
Any other relevant information:

Thank you for the help!

This is a limitation of the TorchScript frontend. The Dynamo frontend supports this operator. If you still want a TorchScript module, we recommend compiling through Dynamo and then tracing the compiled module.

import torch
import torch.nn as nn
import torch_tensorrt
device = "cuda"

class TestModel(nn.Module):
    def __init__(self):
        super().__init__()
    def forward(self, x, y):
        y1, _ = y.chunk(2, dim=0)  # y1.shape --> (1, 3)
        return x + y1  # (2, 3) + (1, 3)

model = TestModel()
model.eval()

x = torch.randn((2, 3), device=device)
y = torch.randn((2, 3), device=device)

model(x, y)

#traced_model = torch.jit.trace(model, (x, y))

trt_model = torch_tensorrt.compile(
    model,
    ir="dynamo",  # Default if the input is nn.Module or fx.GraphModule
    inputs=[
        torch_tensorrt.Input(shape=x.shape, dtype=torch.float32),
        torch_tensorrt.Input(shape=y.shape, dtype=torch.float32),
    ],
    min_block_size=1,
)

max_diff = float(
    torch.max(torch.abs(model(x, y) - trt_model(x,y)))
)

print(max_diff)

ts_trt_model = torch.jit.trace(trt_model, (x, y))

max_diff = float(
    torch.max(torch.abs(model(x, y) - ts_trt_model(x,y)))
)

print(max_diff)
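
The script above computes the same maximum absolute difference twice. As a rough sketch, that check could be wrapped in a small helper and compared against a tolerance. The helper name `max_abs_diff` and the tolerance value are assumptions for illustration, not from this thread, and the tensors here are CPU stand-ins for the model outputs.

```python
import torch

def max_abs_diff(a: torch.Tensor, b: torch.Tensor) -> float:
    """Largest element-wise absolute difference between two tensors."""
    return float(torch.max(torch.abs(a - b)))

# Stand-in tensors; in the script above these would be
# model(x, y) and trt_model(x, y) or ts_trt_model(x, y)
ref = torch.tensor([1.0, 2.0, 3.0])
out = torch.tensor([1.0, 2.5, 3.0])

diff = max_abs_diff(ref, out)
TOL = 1e-3  # assumed tolerance, pick one suited to your precision settings
print(diff, diff <= TOL)
```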

pytorch / TensorRT

❓ [Question] How do you compile a chunk operator with TensorRT? #2955
