pytorch / TensorRT

PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
https://pytorch.org/TensorRT
BSD 3-Clause "New" or "Revised" License

🐛 [Bug] Torch-TensorRT does not support gpt2 #867

Closed Biaocsu closed 2 years ago

Biaocsu commented 2 years ago

Bug Description

ERROR: [Torch-TensorRT] - Unsupported operator: aten::where.self(Tensor condition, Tensor self, Tensor other) -> (Tensor)
/opt/conda/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py(206): _attn
/opt/conda/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py(336): forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1102): _call_impl
/opt/conda/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py(395): forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1102): _call_impl
/opt/conda/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py(890): forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1102): _call_impl
/opt/conda/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py(1047): forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1102): _call_impl
/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py(958): trace_module
/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py(741): trace
<stdin>(1): <module>

ERROR: [Torch-TensorRT] - Unsupported operator: aten::Int.Tensor(Tensor a) -> (int)

ERROR: [Torch-TensorRT] - Unsupported operator: aten::ScalarImplicit(Tensor a) -> (Scalar)

ERROR: [Torch-TensorRT] - Unsupported operator: aten::where.self(Tensor condition, Tensor self, Tensor other) -> (Tensor)
/opt/conda/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py(206): _attn
/opt/conda/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py(336): forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1102): _call_impl
/opt/conda/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py(395): forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1102): _call_impl
/opt/conda/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py(890): forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1102): _call_impl
/opt/conda/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py(1047): forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1102): _call_impl
/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py(958): trace_module
/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py(741): trace
<stdin>(1): <module>

ERROR: [Torch-TensorRT] - Unsupported operator: aten::Int.Tensor(Tensor a) -> (int)

ERROR: [Torch-TensorRT] - Unsupported operator: aten::ScalarImplicit(Tensor a) -> (Scalar)

WARNING: [Torch-TensorRT] - Input type for doing shape analysis could not be determined, defaulting to F32
WARNING: [Torch-TensorRT] - Truncating graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Truncating graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Truncating graph input type from at::kLong to at::kInt
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.8/site-packages/torch_tensorrt/_compile.py", line 115, in compile
    return torch_tensorrt.ts.compile(ts_mod, inputs=inputs, enabled_precisions=enabled_precisions, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch_tensorrt/ts/_compiler.py", line 119, in compile
    compiled_cpp_mod = _C.compile_graph(module._c, _parse_compile_spec(spec))
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py(2047): embedding
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/sparse.py(158): forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1102): _call_impl
/opt/conda/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py(833): forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1102): _call_impl
/opt/conda/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py(1047): forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1102): _call_impl
/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py(958): trace_module
/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py(741): trace
<stdin>(1): <module>
RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got CUDAFloatType instead (while checking arguments for embedding)

To Reproduce

import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch_tensorrt

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2', return_dict=False)
model.eval()

tokens=tokenizer('The cat is on the table.', return_tensors='pt')['input_ids']
traced_model = torch.jit.trace(model, tokens)

compile_settings = {
        "inputs": [torch_tensorrt.Input(
            # min_shape=[1, 3, 224, 224],
            # opt_shape=[1, 3, 512, 512],
            # max_shape=[1, 3, 1024, 1024],
            # For static size
            shape=[1, 7],
            dtype=torch.int32,  # Datatype of the input tensor. Allowed options: torch.(float|half|int8|int32|bool)
        )],
        "truncate_long_and_double": True,
        "enabled_precisions": {torch.half}  # Run with FP16
    }
trt_model = torch_tensorrt.compile(traced_model, **compile_settings)

Expected behavior

No error

Environment

docker build --build-arg BASE=21.10 -f docker/Dockerfile -t torch_tensorrt:latest .

So does this really mean torch_tensorrt does not support gpt2, or did I do something wrong?

Biaocsu commented 2 years ago

Can anyone help me? Many thanks.

narendasan commented 2 years ago

Response from the other thread since I am going to close it in favor of this one:

First thing I would try is to make sure your model is on the GPU before compiling. These errors about op support may be misleading if partial compilation is enabled (which it is by default), but the RuntimeError from PyTorch is caused by tensors not being on the same device. I think from there you should be able to run; if not, we have dedicated support for some cases of aten::Int coming, and we can look at adding support for where.self.
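
For reference, a minimal sketch of that fallback setup, assuming the op names reported in the log above and the static [1, 7] input shape from the repro (not an official recipe):

import torch
import torch_tensorrt

# Sketch: move the traced module to the GPU first, then let the ops the
# log flagged as unsupported run in PyTorch rather than TensorRT.
traced_model = traced_model.cuda()

trt_model = torch_tensorrt.compile(
    traced_model,
    inputs=[torch_tensorrt.Input(shape=[1, 7], dtype=torch.int32)],
    enabled_precisions={torch.half},
    truncate_long_and_double=True,
    require_full_compilation=False,
    torch_executed_ops=["aten::where", "aten::Int", "aten::ScalarImplicit"],
)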

Biaocsu commented 2 years ago

Response from the other thread since I am going to close it in favor of this one:

First thing I would try is to make sure your model is on the GPU before compiling. These errors about op support may be misleading if partial compilation is enabled (which it is by default), but the RuntimeError from PyTorch is caused by tensors not being on the same device. I think from there you should be able to run; if not, we have dedicated support for some cases of aten::Int coming, and we can look at adding support for where.self.

I made sure to put both the model and the tokens on the GPU, but the same error still occurs. Please check the demo below, many thanks!

import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch_tensorrt

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2', return_dict=False).cuda()
model.eval()

tokens=tokenizer('The cat is on the table.', return_tensors='pt')['input_ids'].cuda()
traced_model = torch.jit.trace(model, tokens)

compile_settings = {
        "inputs": [torch_tensorrt.Input(
            # min_shape=[1, 3, 224, 224],
            # opt_shape=[1, 3, 512, 512],
            # max_shape=[1, 3, 1024, 1024],
            # For static size
            shape=[1, 7],
            dtype=torch.int32,  # Datatype of the input tensor. Allowed options: torch.(float|half|int8|int32|bool)
        )],
        "truncate_long_and_double": True,
        "enabled_precisions": {torch.half}  # Run with FP16
    }
trt_model = torch_tensorrt.compile(traced_model, **compile_settings)

calclavia commented 2 years ago

@Biaocsu I tried your setup and got the error:

RuntimeError                              Traceback (most recent call last)
Input In [36], in <cell line: 1>()
----> 1 trt_ts_module = torch_tensorrt.compile(
      2     traced_model,
      3     inputs=[torch_tensorrt.Input(
      4 #             min_shape=[1, 1],
      5 #             opt_shape=[1, 32],
      6 #             max_shape=[1, 64],
      7             shape=[1, 7],
      8             dtype=torch.int32,
      9         )],
     10     enabled_precisions={torch.half},
     11     truncate_long_and_double=True,
     12     torch_executed_ops=['aten::where'],
     13     require_full_compilation=False
     14 )

File /opt/conda/lib/python3.8/site-packages/torch_tensorrt/_compile.py:115, in compile(module, ir, inputs, enabled_precisions, **kwargs)
    110         logging.log(
    111             logging.Level.Info,
    112             "Module was provided as a torch.nn.Module, trying to script the module with torch.jit.script. In the event of a failure please preconvert your module to TorchScript"
    113         )
    114         ts_mod = torch.jit.script(module)
--> 115     return torch_tensorrt.ts.compile(ts_mod, inputs=inputs, enabled_precisions=enabled_precisions, **kwargs)
    116 elif target_ir == _IRType.fx:
    117     raise RuntimeError("fx is currently not supported")

File /opt/conda/lib/python3.8/site-packages/torch_tensorrt/ts/_compiler.py:116, in compile(module, inputs, device, disable_tf32, sparse_weights, enabled_precisions, refit, debug, strict_types, capability, num_min_timing_iters, num_avg_timing_iters, workspace_size, calibrator, truncate_long_and_double, require_full_compilation, min_block_size, torch_executed_ops, torch_executed_modules)
     89     raise ValueError(
     90         "require_full_compilation is enabled however the list of modules and ops to run in torch is not empty. Found: torch_executed_ops: "
     91         + torch_executed_ops + ", torch_executed_modules: " + torch_executed_modules)
     93 spec = {
     94     "inputs": inputs,
     95     "device": device,
   (...)
    113     }
    114 }
--> 116 compiled_cpp_mod = _C.compile_graph(module._c, _parse_compile_spec(spec))
    117 compiled_module = torch.jit._recursive.wrap_cpp_module(compiled_cpp_mod)
    118 return compiled_module

RuntimeError: [Error thrown at core/partitioning/shape_analysis.cpp:127] Unsupported input data type unsigned char

It seems like the HF implementation uses a byte (uint8) datatype somewhere, which may not be supported?
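
One way to see where that byte tensor comes from is to list the buffer dtypes of the traced module (traced_model from the snippets above); this is only a diagnostic sketch, not a fix:

import torch

# Flag any uint8 ("unsigned char") buffers, e.g. GPT-2's causal-mask
# "bias" buffer in some transformers versions.
for name, buf in traced_model.named_buffers():
    if buf.dtype == torch.uint8:
        print("uint8 buffer:", name, tuple(buf.shape))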

Biaocsu commented 2 years ago

@calclavia Not sure. I also tried the pytorch -> onnx -> tensorrt route and found that TensorRT does not support the NonZero operation there either. In any case, I did not succeed in converting gpt2 with torch_tensorrt.
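
For context, the PyTorch -> ONNX step of that route looks roughly like the sketch below, reusing model and the input_ids tensor tokens from the earlier snippet; the file name, opset version, and axis names are illustrative, and the NonZero limitation only shows up later when TensorRT parses the exported graph:

import torch

# Illustrative export settings; not taken from the original post.
torch.onnx.export(
    model,
    tokens,
    "gpt2.onnx",
    input_names=["input_ids"],
    output_names=["logits"],
    dynamic_axes={"input_ids": {0: "batch", 1: "sequence"}},
    opset_version=13,
)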

cyang49 commented 1 year ago

I encountered the same error while testing nvcr.io/nvidia/pytorch:22.12-py3. Is this bug really fixed? @Biaocsu were you able to convert gpt2 with torch_tensorrt?

eric8607242 commented 1 year ago

Hi, I am facing the same issue when I try to compile GPT2 into the TensorRT format.

The corresponding environments:

ubuntu 22.04

Driver Version: 515.65.01
CUDA Version: 11.7
GPU: rtx 3090

torch == 1.13.1+cu117
torch_tensorrt == 1.3.0
transformers == 4.25.1

The sample code to produce the error:

import torch
import torch_tensorrt
from transformers import GPT2Tokenizer, GPT2LMHeadModel

batch_size = 1

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
gpt2_model = GPT2LMHeadModel.from_pretrained('gpt2', return_dict=False, torchscript=True)
gpt2_model.eval()
gpt2_model.cuda()

tokens = tokenizer('The cat is on the table.', return_tensors='pt')
for k in tokens.keys():
    tokens[k] = tokens[k].cuda()

traced_gpt2_model = torch.jit.trace(gpt2_model, tokens["input_ids"])

trt_gpt2_model = torch_tensorrt.compile(traced_gpt2_model,
    inputs=[torch_tensorrt.Input(shape=[batch_size, 7], dtype=torch.int32)],  # Input ids
    enabled_precisions={torch.float32},  # Run with 32-bit precision
    workspace_size=2000000000,
    truncate_long_and_double=True
)

Has anyone been able to compile GPT2 successfully? Big thanks!