pytorch / TensorRT

PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
https://pytorch.org/TensorRT
BSD 3-Clause "New" or "Revised" License

🐛 [Bug] Torch-TensorRT does not support gpt2 #867

Closed Biaocsu closed 2 years ago

Biaocsu commented 2 years ago

Bug Description

ERROR: [Torch-TensorRT] - Unsupported operator: aten::where.self(Tensor condition, Tensor self, Tensor other) -> (Tensor)
/opt/conda/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py(206): _attn
/opt/conda/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py(336): forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1102): _call_impl
/opt/conda/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py(395): forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1102): _call_impl
/opt/conda/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py(890): forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1102): _call_impl
/opt/conda/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py(1047): forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1102): _call_impl
/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py(958): trace_module
/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py(741): trace
<stdin>(1): <module>

ERROR: [Torch-TensorRT] - Unsupported operator: aten::Int.Tensor(Tensor a) -> (int)

ERROR: [Torch-TensorRT] - Unsupported operator: aten::ScalarImplicit(Tensor a) -> (Scalar)

ERROR: [Torch-TensorRT] - Unsupported operator: aten::where.self(Tensor condition, Tensor self, Tensor other) -> (Tensor)
/opt/conda/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py(206): _attn
/opt/conda/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py(336): forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1102): _call_impl
/opt/conda/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py(395): forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1102): _call_impl
/opt/conda/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py(890): forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1102): _call_impl
/opt/conda/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py(1047): forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1102): _call_impl
/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py(958): trace_module
/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py(741): trace
<stdin>(1): <module>

ERROR: [Torch-TensorRT] - Unsupported operator: aten::Int.Tensor(Tensor a) -> (int)

ERROR: [Torch-TensorRT] - Unsupported operator: aten::ScalarImplicit(Tensor a) -> (Scalar)

WARNING: [Torch-TensorRT] - Input type for doing shape analysis could not be determined, defaulting to F32
WARNING: [Torch-TensorRT] - Truncating graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Truncating graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Truncating graph input type from at::kLong to at::kInt
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.8/site-packages/torch_tensorrt/_compile.py", line 115, in compile
    return torch_tensorrt.ts.compile(ts_mod, inputs=inputs, enabled_precisions=enabled_precisions, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch_tensorrt/ts/_compiler.py", line 119, in compile
    compiled_cpp_mod = _C.compile_graph(module._c, _parse_compile_spec(spec))
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py(2047): embedding
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/sparse.py(158): forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1102): _call_impl
/opt/conda/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py(833): forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1102): _call_impl
/opt/conda/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py(1047): forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1102): _call_impl
/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py(958): trace_module
/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py(741): trace
<stdin>(1): <module>
RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got CUDAFloatType instead (while checking arguments for embedding)

To Reproduce

import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch_tensorrt

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2', return_dict=False)
model.eval()

tokens=tokenizer('The cat is on the table.', return_tensors='pt')['input_ids']
traced_model = torch.jit.trace(model, tokens)

compile_settings = {
        "inputs": [torch_tensorrt.Input(
            # min_shape=[1, 3, 224, 224],
            # opt_shape=[1, 3, 512, 512],
            # max_shape=[1, 3, 1024, 1024],
            # For static size
            shape=[1, 7],
            dtype=torch.int32,  # Datatype of the input tensor. Allowed options: torch.(float|half|int8|int32|bool)
        )],
        "truncate_long_and_double": True,
        "enabled_precisions": {torch.half}  # Run with FP16
    }
trt_model = torch_tensorrt.compile(traced_model, **compile_settings)

Expected behavior

No error

Environment

docker build --build-arg BASE=21.10 -f docker/Dockerfile -t torch_tensorrt:latest .

So does this really mean torch_tensorrt does not support gpt2, or did I do something wrong?

Biaocsu commented 2 years ago

Can anyone help me? Many thanks.

narendasan commented 2 years ago

Response from the other thread since I am going to close it in favor of this one:

First thing I would try is to make sure your model is on the GPU before compiling. These errors about op support may be misleading if partial compilation is enabled (which it is by default), but the RuntimeError from PyTorch is caused by tensors not being on the same device. I think from there you should be able to run; if not, we have dedicated support for some cases of aten::Int coming, and we can look at adding support for where.self.
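
For reference, a minimal sketch of that fallback setup, assuming the op names reported in the log above and the static [1, 7] input shape from the repro (not an official recipe):

import torch
import torch_tensorrt

# Sketch: move the traced module to the GPU first, then let the ops the
# log flagged as unsupported run in PyTorch rather than TensorRT.
traced_model = traced_model.cuda()

trt_model = torch_tensorrt.compile(
    traced_model,
    inputs=[torch_tensorrt.Input(shape=[1, 7], dtype=torch.int32)],
    enabled_precisions={torch.half},
    truncate_long_and_double=True,
    require_full_compilation=False,
    torch_executed_ops=["aten::where", "aten::Int", "aten::ScalarImplicit"],
)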

Biaocsu commented 2 years ago

Response from the other thread since I am going to close it in favor of this one:

First thing I would try is to make sure your model is on the GPU before compiling. These errors about op support may be misleading if partial compilation is enabled (which it is by default), but the RuntimeError from PyTorch is caused by tensors not being on the same device. I think from there you should be able to run; if not, we have dedicated support for some cases of aten::Int coming, and we can look at adding support for where.self.

I made sure to put both the model and the tokens on the GPU, but the same error still occurs. Please check the demo below, many thanks!

import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch_tensorrt

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2', return_dict=False).cuda()
model.eval()

tokens=tokenizer('The cat is on the table.', return_tensors='pt')['input_ids'].cuda()
traced_model = torch.jit.trace(model, tokens)

compile_settings = {
        "inputs": [torch_tensorrt.Input(
            # min_shape=[1, 3, 224, 224],
            # opt_shape=[1, 3, 512, 512],
            # max_shape=[1, 3, 1024, 1024],
            # For static size
            shape=[1, 7],
            dtype=torch.int32,  # Datatype of the input tensor. Allowed options: torch.(float|half|int8|int32|bool)
        )],
        "truncate_long_and_double": True,
        "enabled_precisions": {torch.half}  # Run with FP16
    }
trt_model = torch_tensorrt.compile(traced_model, **compile_settings)

calclavia commented 2 years ago

@Biaocsu I tried your setup and got the error:

RuntimeError                              Traceback (most recent call last)
Input In [36], in <cell line: 1>()
----> 1 trt_ts_module = torch_tensorrt.compile(
      2     traced_model,
      3     inputs=[torch_tensorrt.Input(
      4 #             min_shape=[1, 1],
      5 #             opt_shape=[1, 32],
      6 #             max_shape=[1, 64],
      7             shape=[1, 7],
      8             dtype=torch.int32,
      9         )],
     10     enabled_precisions={torch.half},
     11     truncate_long_and_double=True,
     12     torch_executed_ops=['aten::where'],
     13     require_full_compilation=False
     14 )

File /opt/conda/lib/python3.8/site-packages/torch_tensorrt/_compile.py:115, in compile(module, ir, inputs, enabled_precisions, **kwargs)
    110         logging.log(
    111             logging.Level.Info,
    112             "Module was provided as a torch.nn.Module, trying to script the module with torch.jit.script. In the event of a failure please preconvert your module to TorchScript"
    113         )
    114         ts_mod = torch.jit.script(module)
--> 115     return torch_tensorrt.ts.compile(ts_mod, inputs=inputs, enabled_precisions=enabled_precisions, **kwargs)
    116 elif target_ir == _IRType.fx:
    117     raise RuntimeError("fx is currently not supported")

File /opt/conda/lib/python3.8/site-packages/torch_tensorrt/ts/_compiler.py:116, in compile(module, inputs, device, disable_tf32, sparse_weights, enabled_precisions, refit, debug, strict_types, capability, num_min_timing_iters, num_avg_timing_iters, workspace_size, calibrator, truncate_long_and_double, require_full_compilation, min_block_size, torch_executed_ops, torch_executed_modules)
     89     raise ValueError(
     90         "require_full_compilation is enabled however the list of modules and ops to run in torch is not empty. Found: torch_executed_ops: "
     91         + torch_executed_ops + ", torch_executed_modules: " + torch_executed_modules)
     93 spec = {
     94     "inputs": inputs,
     95     "device": device,
   (...)
    113     }
    114 }
--> 116 compiled_cpp_mod = _C.compile_graph(module._c, _parse_compile_spec(spec))
    117 compiled_module = torch.jit._recursive.wrap_cpp_module(compiled_cpp_mod)
    118 return compiled_module

RuntimeError: [Error thrown at core/partitioning/shape_analysis.cpp:127] Unsupported input data type unsigned char

It seems like the HF implementation uses a byte (uint8) datatype somewhere, which may not be supported?
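
One way to see where that byte tensor comes from is to list the buffer dtypes of the traced module (traced_model from the snippets above); this is only a diagnostic sketch, not a fix:

import torch

# Flag any uint8 ("unsigned char") buffers, e.g. GPT-2's causal-mask
# "bias" buffer in some transformers versions.
for name, buf in traced_model.named_buffers():
    if buf.dtype == torch.uint8:
        print("uint8 buffer:", name, tuple(buf.shape))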

Biaocsu commented 2 years ago

@calclavia Not sure. I also tried the pytorch -> onnx -> tensorrt route and found that TensorRT does not support the NonZero operation there either. In any case, I did not succeed in converting gpt2 with torch_tensorrt.
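
For context, the PyTorch -> ONNX step of that route looks roughly like the sketch below, reusing model and the input_ids tensor tokens from the earlier snippet; the file name, opset version, and axis names are illustrative, and the NonZero limitation only shows up later when TensorRT parses the exported graph:

import torch

# Illustrative export settings; not taken from the original post.
torch.onnx.export(
    model,
    tokens,
    "gpt2.onnx",
    input_names=["input_ids"],
    output_names=["logits"],
    dynamic_axes={"input_ids": {0: "batch", 1: "sequence"}},
    opset_version=13,
)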

cyang49 commented 1 year ago

I encountered the same error while testing nvcr.io/nvidia/pytorch:22.12-py3. Is this bug really fixed? @Biaocsu were you able to convert gpt2 with torch_tensorrt?

eric8607242 commented 1 year ago

Hi, I am facing the same issue when I try to compile GPT2 into the TensorRT format.

The corresponding environments:

ubuntu 22.04

Driver Version: 515.65.01
CUDA Version: 11.7
GPU: rtx 3090

torch == 1.13.1+cu117
torch_tensorrt == 1.3.0
transformers == 4.25.1

The sample code to produce the error:

import torch
import torch_tensorrt
from transformers import GPT2Tokenizer, GPT2LMHeadModel

batch_size = 1

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
gpt2_model = GPT2LMHeadModel.from_pretrained('gpt2', return_dict=False, torchscript=True)
gpt2_model.eval()
gpt2_model.cuda()

tokens = tokenizer('The cat is on the table.', return_tensors='pt')
for k in tokens.keys():
    tokens[k] = tokens[k].cuda()

traced_gpt2_model = torch.jit.trace(gpt2_model, tokens["input_ids"])

trt_gpt2_model = torch_tensorrt.compile(traced_gpt2_model,
    inputs=[torch_tensorrt.Input(shape=[batch_size, 7], dtype=torch.int32)],  # Input ids
    enabled_precisions={torch.float32},  # Run with 32-bit precision
    workspace_size=2000000000,
    truncate_long_and_double=True
)

Has anyone been able to compile GPT2 successfully? Big thanks!