oborchers opened this issue 2 years ago
I am tagging @kevinch-nv and @yuanyao-nv because of their excellent help last time 🚀
This is unfortunately a known limitation of the Einsum layer in TRT: we only support floating-point types for Einsum equations.
Do you know which operation this Einsum equation implies? Perhaps we can try to substitute the Einsum.
@kevinch-nv: Thanks for the reply! I've been looking at the source code, and as far as I can tell the only Einsum op in the model is actually a float one:
import torch

def fixed_pos_embedding(x, seq_dim=1, seq_len=None):
    dim = x.shape[-1]
    if seq_len is None:
        seq_len = x.shape[seq_dim]
    # Standard sinusoidal frequency schedule; integer arange divided by
    # dim produces a float tensor.
    inv_freq = 1.0 / (10000 ** (torch.arange(0, dim, 2) / dim))
    # Outer product of float positions and float inverse frequencies.
    sinusoid_inp = (
        torch.einsum("i , j -> i j", torch.arange(seq_len, dtype=torch.float), inv_freq)
        .to(x.device)
        .float()
    )
    return torch.sin(sinusoid_inp), torch.cos(sinusoid_inp)
Am I missing something obvious?
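For reference, since "i , j -> i j" is just an outer product, one possible Einsum-free substitution would be the following sketch (fixed_pos_embedding_no_einsum is a hypothetical name; whether the exporter then sidesteps the INT32 path would need verifying):

import torch

def fixed_pos_embedding_no_einsum(x, seq_dim=1, seq_len=None):
    # Same computation as fixed_pos_embedding, but the Einsum
    # "i , j -> i j" (an outer product) is replaced by torch.outer,
    # so no Einsum node should be emitted during ONNX export.
    dim = x.shape[-1]
    if seq_len is None:
        seq_len = x.shape[seq_dim]
    inv_freq = 1.0 / (10000 ** (torch.arange(0, dim, 2) / dim))
    sinusoid_inp = torch.outer(
        torch.arange(seq_len, dtype=torch.float), inv_freq
    ).to(x.device).float()
    return torch.sin(sinusoid_inp), torch.cos(sinusoid_inp)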
It's possible that one of the inputs is being interpreted incorrectly as INT32. Can you provide the converted .onnx model?
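To check that, here is a minimal sketch of how the converted graph could be inspected for INT32-typed Einsum inputs (model.onnx is a placeholder path):

import onnx
from onnx import TensorProto, shape_inference

# Run shape inference so intermediate values carry dtype information.
model = shape_inference.infer_shapes(onnx.load("model.onnx"))
g = model.graph

# Map every named value to its (inferred) element type.
dtypes = {vi.name: TensorProto.DataType.Name(vi.type.tensor_type.elem_type)
          for vi in list(g.input) + list(g.output) + list(g.value_info)}
for init in g.initializer:
    dtypes[init.name] = TensorProto.DataType.Name(init.data_type)

# Print the dtype of each input to every Einsum node.
for node in g.node:
    if node.op_type == "Einsum":
        print(node.name, [(name, dtypes.get(name, "?")) for name in node.input])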
@oborchers As a (slightly hacky) workaround: since fixed_pos_embedding does not depend on the input's values (only its shape and device), you can precompute it for each dim you need up to a maximum sequence length, and then use something like this:
import torch

def fixed_pos_embedding(x, seq_dim=1, seq_len=None):
    dim = x.shape[-1]
    if seq_len is None:
        seq_len = x.shape[seq_dim]
    # Load the precomputed tables instead of building them with Einsum.
    s = torch.load(f'sin_pos_{dim}.pt').to(x.device)
    c = torch.load(f'cos_pos_{dim}.pt').to(x.device)
    # Truncate to seq_len
    s = s[:seq_len]
    c = c[:seq_len]
    return s, c
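For completeness, a minimal sketch of the one-time script that would generate those .pt files (precompute_pos_embedding and the max_seq_len default are hypothetical; the math mirrors the original fixed_pos_embedding):

import torch

def precompute_pos_embedding(dim, max_seq_len=2048):
    # Reproduce the original computation once, offline, and save the tables.
    inv_freq = 1.0 / (10000 ** (torch.arange(0, dim, 2) / dim))
    sinusoid_inp = torch.einsum(
        "i , j -> i j", torch.arange(max_seq_len, dtype=torch.float), inv_freq
    )
    torch.save(torch.sin(sinusoid_inp), f"sin_pos_{dim}.pt")
    torch.save(torch.cos(sinusoid_inp), f"cos_pos_{dim}.pt")

Running it once per rotary dim used by the model, before export, makes the torch.load calls above resolve.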
@oborchers Did you solve this issue? I ran into the same problem with Salesforce/codegen-16b, which also uses fixed_pos_embedding. However, my colleagues could run the TensorRT engine successfully for codegen-350m.
Description
This issue is a follow-up to #818, which I also created. I am working on the transformer-deploy repository and created a PR that enables support for exporting larger transformers models to TensorRT.
This works well with gpt2-medium, gpt-neo-1.3b, and gpt-neo-2.7b. However, for GPT-J I am running into the issue above. As all other models seemingly work well, I assume this might be directly related to TRT?
Environment
TensorRT Version: 8.2.2-1
ONNX-TensorRT Version / Branch: 8.2.2.1
GPU Type: V100
Nvidia Driver Version: 495.29.05
CUDA Version: 11.5
CUDNN Version:
Operating System + Version: Ubuntu 20.04.3 LTS
Python Version (if applicable): 3.8.10
TensorFlow + TF2ONNX Version (if applicable): NA
PyTorch Version (if applicable): 1.10.2+cu113
Baremetal or Container (if container which image + tag): See below
Steps To Reproduce
docker build -t tfdeploy .