pytorch / TensorRT

PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
https://pytorch.org/TensorRT

Large output differences with facebook/bart-base model #3252

Open chohk88 opened 5 hours ago

chohk88 commented 5 hours ago

I'm seeing significant output differences between the eager PyTorch model and the Torch-TensorRT-compiled model for facebook/bart-base (https://huggingface.co/facebook/bart-base), even after trying FP16 and several other precision settings.

The following script reproduces the comparison:

import torch
from transformers import BartTokenizer, BartModel
import torch_tensorrt

# Set device and backend
backend = "torch_tensorrt"
device = "cuda:0"

# Load tokenizer and model
tokenizer = BartTokenizer.from_pretrained('facebook/bart-base')
model = BartModel.from_pretrained('facebook/bart-base')
model.eval()
model = model.to(device)

# Prepare input
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()} 

# Run inference before Torch-TensorRT (no gradients needed for inference)
with torch.no_grad():
    outputs_before = model(**inputs)

# Apply Torch-TensorRT optimization
model = torch.compile(
    model,
    backend=backend,
    options={
        "truncate_long_and_double": True,
        "enabled_precisions": {torch.float16, torch.float32},
    },
    dynamic=False,
)

# Run inference after Torch-TensorRT (the first call triggers compilation)
with torch.no_grad():
    outputs_after = model(**inputs)

# Compare outputs
last_hidden_states_before = outputs_before.last_hidden_state
last_hidden_states_after = outputs_after.last_hidden_state

# Element-wise difference, computed once and reused below
diff = last_hidden_states_before - last_hidden_states_after

# Calculate the maximum absolute difference
max_diff = diff.abs().max().item()

# Calculate the mean absolute difference
mean_abs_diff = diff.abs().mean().item()

# Calculate the plain mean of the differences (not absolute)
mean_diff = diff.mean().item()

# Print the outputs, max difference, mean absolute difference, and plain mean difference
print("Outputs before Torch-TensorRT:")
print(last_hidden_states_before)
print("\nOutputs after Torch-TensorRT:")
print(last_hidden_states_after)

print(f"\nMaximum absolute difference: {max_diff}")
print(f"Mean absolute difference: {mean_abs_diff}")
print(f"Mean difference: {mean_diff}")

Here are the differences I'm seeing:

These values are much larger than expected.
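
To make "larger than expected" concrete, the comparison can be expressed as a tolerance check. The following is a minimal sketch, not part of the original repro: `compare_outputs` is a hypothetical helper name, and the default `rtol`/`atol` values of 1e-2 are an assumption about what is acceptable for FP16, not a documented Torch-TensorRT threshold.

```python
import torch


def compare_outputs(reference: torch.Tensor, candidate: torch.Tensor,
                    rtol: float = 1e-2, atol: float = 1e-2) -> dict:
    """Summarize how far `candidate` deviates from `reference`.

    The tolerances are illustrative assumptions, not library defaults.
    """
    # Compare in FP32 on the CPU so the metrics themselves add no FP16 error.
    ref = reference.detach().float().cpu()
    cand = candidate.detach().float().cpu()
    diff = (ref - cand).abs()

    # Cosine similarity over the flattened tensors: close to 1.0 means the
    # outputs point in the same direction even if individual values differ.
    cosine = torch.nn.functional.cosine_similarity(
        ref.flatten(), cand.flatten(), dim=0
    )

    return {
        "max_abs_diff": diff.max().item(),
        "mean_abs_diff": diff.mean().item(),
        "cosine_similarity": cosine.item(),
        # True when |cand - ref| <= atol + rtol * |ref| holds element-wise.
        "allclose": torch.allclose(cand, ref, rtol=rtol, atol=atol),
    }
```

With the script above, this would be called as `compare_outputs(last_hidden_states_before, last_hidden_states_after)`.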

Additional Tests

  1. I loaded the model in FP16 (torch_dtype=torch.float16) before compiling, but the output differences remain significant:

    model = BartModel.from_pretrained('facebook/bart-base', torch_dtype=torch.float16)

  2. I also enabled "use_fp32_acc" and "use_explicit_typing", but the differences persisted:

    model = torch.compile(
        model,
        backend="torch_tensorrt",
        options={
            "truncate_long_and_double": True,
            "enabled_precisions": {torch.float16, torch.float32},
            "use_fp32_acc": True,
            "use_explicit_typing": True,
        },
        dynamic=False,
    )
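
For context on why FP32 accumulation is worth testing at all, here is a small standard-library illustration of the numerical effect it targets. It is only a sketch: it does not reproduce how Torch-TensorRT implements "use_fp32_acc", the helper names are hypothetical, and a Python float (double precision) stands in for the FP32 accumulator.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def dot_fp16_acc(a, b) -> float:
    """Dot product where each product and the running sum are rounded to FP16."""
    acc = 0.0
    for x, y in zip(a, b):
        product = to_fp16(to_fp16(x) * to_fp16(y))
        acc = to_fp16(acc + product)
    return acc


def dot_wide_acc(a, b) -> float:
    """Same FP16 inputs, but the running sum is kept in higher precision."""
    acc = 0.0
    for x, y in zip(a, b):
        acc += to_fp16(x) * to_fp16(y)
    return acc


# A long reduction of small values, similar in spirit to one row of a matmul.
n = 4096
a = [0.1] * n
b = [0.1] * n

exact = n * to_fp16(0.1) ** 2
print("reference        :", exact)
print("FP16 accumulation:", dot_fp16_acc(a, b))
print("wide accumulation:", dot_wide_acc(a, b))
```

Once the FP16 running sum grows large enough that a single product is smaller than half the spacing between adjacent FP16 values, further additions are rounded away, so the FP16-accumulated result falls well short of the reference. If differences persist even with wide accumulation enabled, as reported here, accumulation error alone is unlikely to be the whole explanation.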

chohk88 commented 5 hours ago

@peri044 I tried running the additional tests you suggested, but I’m still seeing large differences in the output, as mentioned above. I would really appreciate it if you could share advice on this issue.