I am seeing significant output differences when compiling and running the facebook/bart-base (https://huggingface.co/facebook/bart-base) model with Torch-TensorRT, even after enabling FP16 and trying various precision settings.
Compare the output using the following code:
import torch
from transformers import BartTokenizer, BartModel
import torch_tensorrt
# Set device and backend
backend = "torch_tensorrt"
device = "cuda:0"
# Load tokenizer and model
tokenizer = BartTokenizer.from_pretrained('facebook/bart-base')
model = BartModel.from_pretrained('facebook/bart-base')
model.eval()
model = model.to(device)
# Prepare input
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}
# Run inference before Torch-TensorRT
outputs_before = model(**inputs)
# Apply Torch-TensorRT optimization
model = torch.compile(
    model,
    backend=backend,
    options={
        "truncate_long_and_double": True,
        "enabled_precisions": {torch.float16, torch.float32},
    },
    dynamic=False,
)
# Run inference after Torch-TensorRT
outputs_after = model(**inputs)
# Compare outputs
last_hidden_states_before = outputs_before.last_hidden_state
last_hidden_states_after = outputs_after.last_hidden_state
# Calculate the maximum absolute difference
max_diff = torch.max(torch.abs(last_hidden_states_before - last_hidden_states_after)).item()
# Calculate the mean absolute difference
mean_abs_diff = torch.mean(torch.abs(last_hidden_states_before - last_hidden_states_after)).item()
# Calculate the plain mean of the differences (not absolute)
mean_diff = torch.mean(last_hidden_states_before - last_hidden_states_after).item()
# Print the outputs, max difference, mean absolute difference, and plain mean difference
print("Outputs before Torch-TensorRT:")
print(last_hidden_states_before)
print("\nOutputs after Torch-TensorRT:")
print(last_hidden_states_after)
print(f"\nMaximum absolute difference: {max_diff}")
print(f"Mean absolute difference: {mean_abs_diff}")
print(f"Mean difference: {mean_diff}")
Here are the differences I'm seeing:
Maximum absolute difference: 6.1822
Mean absolute difference: 0.8487
Mean difference: -0.0164
These values are much larger than expected.
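One way to put these numbers in context is to compare them against the scale of the activations, since an absolute difference means little without knowing how large the reference values are. The following is a minimal sketch; `compare_outputs` is a hypothetical helper that is not part of the original script or of Torch-TensorRT, and the extra metrics (relative difference and cosine similarity) are my own choice of diagnostics.

```python
import torch
import torch.nn.functional as F


def compare_outputs(ref: torch.Tensor, test: torch.Tensor) -> dict:
    """Hypothetical helper: scale-aware comparison of two tensors of equal shape."""
    # Compare in float32 on the CPU so the metrics do not depend on device or dtype.
    ref = ref.detach().float().cpu()
    test = test.detach().float().cpu()
    abs_diff = (ref - test).abs()
    return {
        "max_abs_diff": abs_diff.max().item(),
        "mean_abs_diff": abs_diff.mean().item(),
        # Largest absolute difference relative to the largest reference magnitude.
        "max_rel_diff": (abs_diff.max() / ref.abs().max().clamp_min(1e-12)).item(),
        # Direction agreement of the flattened outputs (1.0 means same direction).
        "cosine_similarity": F.cosine_similarity(
            ref.flatten(), test.flatten(), dim=0
        ).item(),
    }
```

It could be called as `compare_outputs(last_hidden_states_before, last_hidden_states_after)`. A cosine similarity close to 1 together with a small relative difference would point to ordinary precision noise, while a low cosine similarity would suggest that the compiled graph computes something structurally different.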
Additional Tests
I tried compiling the model with FP16 precision enabled using the following code, but the output differences remain significant:
model = BartModel.from_pretrained('facebook/bart-base', torch_dtype=torch.float16)
I also enabled "use_fp32_acc" and "use_explicit_typing" in the compile options, but the differences persisted.
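For reference, a sketch of what such an options dictionary might look like. This is an assumed configuration and not necessarily the exact one used in this report. In particular, restricting "enabled_precisions" to torch.float32 when "use_explicit_typing" is set reflects my understanding of recent Torch-TensorRT releases and should be checked against the installed version.

```python
import torch

# Assumed option set for illustration, not the exact configuration from the report.
trt_options = {
    "truncate_long_and_double": True,
    # Assumption: with explicit typing, layer precisions follow the dtypes in the
    # PyTorch graph, so only float32 is listed here.
    "enabled_precisions": {torch.float32},
    "use_explicit_typing": True,
    # Requests FP32 accumulation for matmul layers.
    "use_fp32_acc": True,
}

# Usage (requires a CUDA device and torch_tensorrt installed):
# model = torch.compile(model, backend="torch_tensorrt", options=trt_options, dynamic=False)
```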
@peri044 I tried running the additional tests you suggested, but I’m still seeing large differences in the output, as mentioned above. I would really appreciate it if you could share advice on this issue.