Can anyone help me? Many thanks!
Response from the other thread since I am going to close it in favor of this one:
The first thing I would try is to make sure your model is on the GPU before compiling. The errors about op support may be misleading if partial compilation is enabled (which it is by default), but the RuntimeError from PyTorch is caused by tensors not all being on the same device. From there you should be able to run; if not, we have dedicated support for some cases of aten::Int coming, and we can look at adding support for where.self.
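For concreteness, here is a minimal sketch of how that advice could translate into a compile call (model and example inputs moved to the GPU before tracing, partial compilation left enabled, and aten::where forced to run in PyTorch via torch_executed_ops). The op list and input shape are illustrative, not a confirmed fix:

import torch
import torch_tensorrt
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Model and example inputs go to the GPU *before* tracing/compiling;
# a CPU/GPU mix is what triggers the PyTorch device-mismatch RuntimeError.
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2', return_dict=False).cuda().eval()
tokens = tokenizer('The cat is on the table.', return_tensors='pt')['input_ids'].cuda()
traced_model = torch.jit.trace(model, tokens)

trt_model = torch_tensorrt.compile(
    traced_model,
    inputs=[torch_tensorrt.Input(shape=[1, 7], dtype=torch.int32)],
    enabled_precisions={torch.half},
    truncate_long_and_double=True,
    require_full_compilation=False,       # keep partial compilation enabled
    torch_executed_ops=['aten::where'],   # fall back to PyTorch for unsupported ops
)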
I made sure to put the model on the GPU, and the tokens as well, but I still get the same error. Please help and check the demo below, many thanks!
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch_tensorrt

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2', return_dict=False).cuda()
model.eval()

tokens = tokenizer('The cat is on the table.', return_tensors='pt')['input_ids'].cuda()
traced_model = torch.jit.trace(model, tokens)

compile_settings = {
    "inputs": [torch_tensorrt.Input(
        # min_shape=[1, 3, 224, 224],
        # opt_shape=[1, 3, 512, 512],
        # max_shape=[1, 3, 1024, 1024],
        # For static size
        shape=[1, 7],
        dtype=torch.int32,  # Datatype of input tensor. Allowed options: torch.(float|half|int8|int32|bool)
    )],
    "truncate_long_and_double": True,
    "enabled_precisions": {torch.half},  # Run with FP16
}

trt_model = torch_tensorrt.compile(traced_model, **compile_settings)
@Biaocsu I tried your setup and got the error:
RuntimeError Traceback (most recent call last)
Input In [36], in <cell line: 1>()
----> 1 trt_ts_module = torch_tensorrt.compile(
2 traced_model,
3 inputs=[torch_tensorrt.Input(
4 # min_shape=[1, 1],
5 # opt_shape=[1, 32],
6 # max_shape=[1, 64],
7 shape=[1, 7],
8 dtype=torch.int32,
9 )],
10 enabled_precisions={torch.half},
11 truncate_long_and_double=True,
12 torch_executed_ops=['aten::where'],
13 require_full_compilation=False
14 )
File /opt/conda/lib/python3.8/site-packages/torch_tensorrt/_compile.py:115, in compile(module, ir, inputs, enabled_precisions, **kwargs)
110 logging.log(
111 logging.Level.Info,
112 "Module was provided as a torch.nn.Module, trying to script the module with torch.jit.script. In the event of a failure please preconvert your module to TorchScript"
113 )
114 ts_mod = torch.jit.script(module)
--> 115 return torch_tensorrt.ts.compile(ts_mod, inputs=inputs, enabled_precisions=enabled_precisions, **kwargs)
116 elif target_ir == _IRType.fx:
117 raise RuntimeError("fx is currently not supported")
File /opt/conda/lib/python3.8/site-packages/torch_tensorrt/ts/_compiler.py:116, in compile(module, inputs, device, disable_tf32, sparse_weights, enabled_precisions, refit, debug, strict_types, capability, num_min_timing_iters, num_avg_timing_iters, workspace_size, calibrator, truncate_long_and_double, require_full_compilation, min_block_size, torch_executed_ops, torch_executed_modules)
89 raise ValueError(
90 "require_full_compilation is enabled however the list of modules and ops to run in torch is not empty. Found: torch_executed_ops: "
91 + torch_executed_ops + ", torch_executed_modules: " + torch_executed_modules)
93 spec = {
94 "inputs": inputs,
95 "device": device,
(...)
113 }
114 }
--> 116 compiled_cpp_mod = _C.compile_graph(module._c, _parse_compile_spec(spec))
117 compiled_module = torch.jit._recursive.wrap_cpp_module(compiled_cpp_mod)
118 return compiled_module
RuntimeError: [Error thrown at core/partitioning/shape_analysis.cpp:127] Unsupported input data type unsigned char
It seems like the HF implementation uses a byte datatype, which may not be supported?
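One way to see where a byte tensor might be coming from is to inspect the buffers the HF GPT-2 model registers (e.g. the causal-mask bias buffer), since the tracer bakes them into the graph. A purely diagnostic sketch, assuming model is the GPT2LMHeadModel instance from the snippet above:

import torch

# List any uint8/bool buffers in the model; these are candidates for the
# "Unsupported input data type unsigned char" reported by the partitioner.
for name, buf in model.named_buffers():
    if buf.dtype in (torch.uint8, torch.bool):
        print(name, buf.dtype, tuple(buf.shape))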
@calclavia not sure. I also tried the pytorch -> onnx -> tensorrt route and found that tensorrt does not support the NonZero operation. In any case, I did not manage to convert gpt2 with torch_tensorrt.
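For reference, a rough sketch of the pytorch -> onnx step in that route, assuming model and tokens from the earlier snippet; the file name, opset, and axis names are illustrative. Per the report above, it is the NonZero node in the exported graph that TensorRT then rejects:

import torch

# Export GPT-2 to ONNX; parsing the result with TensorRT is where the
# unsupported NonZero op shows up (per the report above).
torch.onnx.export(
    model,                                   # GPT2LMHeadModel from above
    (tokens,),                               # example input_ids tensor
    "gpt2.onnx",                             # illustrative output path
    input_names=["input_ids"],
    output_names=["logits"],
    dynamic_axes={"input_ids": {0: "batch", 1: "sequence"}},
    opset_version=13,                        # illustrative opset
)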
I encountered the same error while testing nvcr.io/nvidia/pytorch:22.12-py3. Is this bug really fixed? @Biaocsu were you able to convert gpt2 with torch_tensorrt?
Hi, I face the same issue when I try to compile GPT2 into the TensorRT format.
The corresponding environments:
ubuntu 22.04
Driver Version: 515.65.01
CUDA Version: 11.7
GPU: rtx 3090
torch == 1.13.1+cu117
torch_tensorrt == 1.3.0
transformers == 4.25.1
The sample code to reproduce the error:
import torch
import torch_tensorrt
from transformers import GPT2Tokenizer, GPT2LMHeadModel

batch_size = 1
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
gpt2_model = GPT2LMHeadModel.from_pretrained('gpt2', return_dict=False, torchscript=True)
gpt2_model.eval()
gpt2_model.cuda()

tokens = tokenizer('The cat is on the table.', return_tensors='pt')
for k in tokens.keys():
    tokens[k] = tokens[k].cuda()

traced_gpt2_model = torch.jit.trace(gpt2_model, tokens["input_ids"])
trt_gpt2_model = torch_tensorrt.compile(
    traced_gpt2_model,
    inputs=[torch_tensorrt.Input(shape=[batch_size, 7], dtype=torch.int32)],  # Input ids
    enabled_precisions={torch.float32},  # Run with 32-bit precision
    workspace_size=2000000000,
    truncate_long_and_double=True,
)
Has anyone managed to compile GPT2 successfully? Big thanks!
Bug Description
To Reproduce
Expected behavior
No error
Environment
docker build --build-arg BASE=21.10 -f docker/Dockerfile -t torch_tensorrt:latest .
How you installed PyTorch (conda, pip, libtorch, source):

So does this really mean torch_tensorrt does not support gpt2, or did I do something wrong?