Large output differences with facebook/bart-base model

I see significant output differences between the original and the compiled model when compiling and running facebook/bart-base (https://huggingface.co/facebook/bart-base) with Torch-TensorRT, even after enabling FP16 and trying various precision settings.

Compare the output using the following code:

```
import torch
from transformers import BartTokenizer, BartModel
import torch_tensorrt

# Set device and backend
backend = "torch_tensorrt"
device = "cuda:0"

# Load tokenizer and model
tokenizer = BartTokenizer.from_pretrained('facebook/bart-base')
model = BartModel.from_pretrained('facebook/bart-base')
model.eval()
model = model.to(device)

# Prepare input
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}

# Run inference before Torch-TensorRT
outputs_before = model(**inputs)

# Apply Torch-TensorRT optimization
model = torch.compile(
    model,
    backend=backend,
    options={
        "truncate_long_and_double": True,
        "enabled_precisions": {torch.float16, torch.float32},
    },
    dynamic=False,
)

# Run inference after Torch-TensorRT
outputs_after = model(**inputs)

# Compare outputs
last_hidden_states_before = outputs_before.last_hidden_state
last_hidden_states_after = outputs_after.last_hidden_state

# Calculate the maximum absolute difference
max_diff = torch.max(torch.abs(last_hidden_states_before - last_hidden_states_after)).item()

# Calculate the mean absolute difference
mean_abs_diff = torch.mean(torch.abs(last_hidden_states_before - last_hidden_states_after)).item()

# Calculate the plain mean of the differences (not absolute)
mean_diff = torch.mean(last_hidden_states_before - last_hidden_states_after).item()

# Print the outputs, max difference, mean absolute difference, and plain mean difference
print("Outputs before Torch-TensorRT:")
print(last_hidden_states_before)
print("\nOutputs after Torch-TensorRT:")
print(last_hidden_states_after)

print(f"\nMaximum absolute difference: {max_diff}")
print(f"Mean absolute difference: {mean_abs_diff}")
print(f"Mean difference: {mean_diff}")
```

Here are the differences I'm seeing:

Maximum absolute difference: 6.1822
Mean absolute difference: 0.8487
Mean difference: -0.0164

These values are much larger than I would expect from reduced-precision rounding alone.
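A max and a mean do not show whether a few outliers or most elements are off. As a rough sketch (not part of the original repro), a tolerance-based summary reports what fraction of elements fall outside an acceptable band. The helper name `summarize_diff`, the tolerances, and the sample values are all hypothetical; it works on flat lists of floats so it runs without a GPU.

```python
def summarize_diff(before, after, rtol=1e-3, atol=1e-3):
    """Summarize element-wise differences between two flat sequences of floats."""
    if len(before) != len(after):
        raise ValueError("inputs must have the same length")
    if not before:
        raise ValueError("inputs must not be empty")

    abs_diffs = [abs(b - a) for b, a in zip(before, after)]

    # An element counts as a mismatch when
    # |before - after| > atol + rtol * |before|,
    # treating the uncompiled output as the reference.
    mismatches = sum(
        1 for d, b in zip(abs_diffs, before) if d > atol + rtol * abs(b)
    )

    return {
        "max_abs_diff": max(abs_diffs),
        "mean_abs_diff": sum(abs_diffs) / len(abs_diffs),
        "mismatch_fraction": mismatches / len(abs_diffs),
    }


# Hypothetical values standing in for flattened hidden states.
reference = [1.0, -2.0, 0.5, 4.0]
candidate = [1.0005, -2.001, 0.5, 10.0]

summary = summarize_diff(reference, candidate)
print(summary)
```

With the tensors from the script above, the inputs would be something like `last_hidden_states_before.flatten().tolist()` and the matching `after` list. `torch.testing.assert_close(actual, expected, rtol=..., atol=...)` applies the same kind of check directly on tensors and reports how many elements mismatch.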

Additional Tests

I also tried loading the model weights in FP16 before compiling, using the following code, but the output differences remain significant:
```
model = BartModel.from_pretrained('facebook/bart-base', torch_dtype=torch.float16)
```
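For a sense of scale, here is a minimal sketch of the error a single FP16 rounding introduces, using only the standard library's half-precision `struct` format. The sample values are arbitrary, and error accumulated across many layers can be larger than a single rounding, so this is only a lower-level sanity check, not a bound on the model's output error.

```python
import struct


def round_to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Arbitrary sample values in the range of typical hidden-state magnitudes.
samples = [0.1, 1.2345, 6.1822, 100.3]
rel_errors = [abs(round_to_fp16(x) - x) / abs(x) for x in samples]

for x, err in zip(samples, rel_errors):
    print(f"{x:>10}: fp16 value {round_to_fp16(x)!r}, relative error {err:.2e}")

# Half precision keeps 11 significant bits, so one rounding of a normal
# value is off by at most 2**-11 (about 4.9e-4) in relative terms.
worst = max(rel_errors)
print(f"worst relative error: {worst:.2e}")
```

A single rounding at this precision is orders of magnitude smaller than the maximum absolute difference of 6.1822 reported above, which is why the result looks like more than ordinary FP16 noise.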

I also enabled "use_fp32_acc" and "use_explicit_typing", but the differences persisted:

```
model = torch.compile(
    model,
    backend="torch_tensorrt",
    options={
        "truncate_long_and_double": True,
        "enabled_precisions": {torch.float16, torch.float32},
        "use_fp32_acc": True,
        "use_explicit_typing": True,
    },
    dynamic=False,
)
```

