microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

T5-Small different output for decoder inference with CPU and DirectML EPs #22896

Open r4ghu opened 2 days ago

r4ghu commented 2 days ago

Describe the issue

Hi team, I am currently running T5-Small model inference using ONNX Runtime. The model I am using is https://huggingface.co/Xenova/t5-small/tree/main/onnx

I tested the same model on CPU and DirectML execution providers and observed different outputs for the same input during the decoding stage.

I am attaching some CPU vs. DirectML comparison results for reference:

=== Comparing Encoder Outputs ===

Comparing Encoder outputs:
Shapes: (1, 12, 512) vs (1, 12, 512)

Statistics for first array:
  mean: -0.002746098442003131
  std: 0.12785771489143372
  min: -0.5774061679840088
  max: 0.5452761054039001
  abs_max: 0.5774061679840088
  has_nan: False
  has_inf: False

Statistics for second array:
  mean: -0.00274610030464828
  std: 0.1278577446937561
  min: -0.5774062871932983
  max: 0.5452762246131897
  abs_max: 0.5774062871932983
  has_nan: False
  has_inf: False

Difference analysis:
  Maximum absolute difference: 5.736947059631348e-07
  Mean absolute difference: 5.666575475515856e-08
  Maximum relative difference: 0.07109003514051437
  Position of max difference: (np.int64(0), np.int64(1), np.int64(401))
✅ Differences within acceptable threshold (1e-05)

=== Comparing Decoder Outputs ===

Comparing Decoder logits:
Shapes: (1, 1, 32128) vs (1, 1, 32128)

Statistics for first array:
  mean: -19.10366439819336
  std: 4.460851669311523
  min: -43.21986389160156
  max: -1.202622890472412
  abs_max: 43.21986389160156
  has_nan: False
  has_inf: False

Statistics for second array:
  mean: -19.10366439819336
  std: 4.460851669311523
  min: -43.21989059448242
  max: -1.2026221752166748
  abs_max: 43.21989059448242
  has_nan: False
  has_inf: False

Difference analysis:
  Maximum absolute difference: 5.7220458984375e-05
  Mean absolute difference: 7.175476639531553e-06
  Maximum relative difference: 2.00232352653984e-06
  Position of max difference: (np.int64(0), np.int64(0), np.int64(32113))
❌ Large difference detected! (> 1e-05)

Values at maximum difference point:
  Array1: -43.13878631591797
  Array2: -43.13884353637695

Surrounding values (if available):
  Array1 at [np.int64(0), np.int64(0), np.int64(32112)]: -43.058406829833984
  Array2 at [np.int64(0), np.int64(0), np.int64(32112)]: -43.058406829833984
  Array1 at [np.int64(0), np.int64(0), np.int64(32114)]: -43.1171760559082
  Array2 at [np.int64(0), np.int64(0), np.int64(32114)]: -43.11715316772461
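
For completeness, the statistics above can be reproduced with a small NumPy helper along these lines (a minimal sketch; the function name and layout are illustrative, not the exact script I ran):

```python
import numpy as np

def compare_outputs(a, b, name, atol=1e-5):
    # Illustrative reconstruction of the comparison printed above.
    print(f"Comparing {name}:")
    print(f"Shapes: {a.shape} vs {b.shape}")
    for label, arr in (("first", a), ("second", b)):
        print(f"\nStatistics for {label} array:")
        print(f"  mean: {arr.mean()}")
        print(f"  std: {arr.std()}")
        print(f"  min: {arr.min()}")
        print(f"  max: {arr.max()}")
        print(f"  abs_max: {np.abs(arr).max()}")
        print(f"  has_nan: {np.isnan(arr).any()}")
        print(f"  has_inf: {np.isinf(arr).any()}")
    diff = np.abs(a.astype(np.float64) - b.astype(np.float64))
    rel = diff / (np.abs(b.astype(np.float64)) + 1e-12)  # epsilon avoids division by zero
    pos = np.unravel_index(diff.argmax(), diff.shape)
    print("\nDifference analysis:")
    print(f"  Maximum absolute difference: {diff.max()}")
    print(f"  Mean absolute difference: {diff.mean()}")
    print(f"  Maximum relative difference: {rel.max()}")
    print(f"  Position of max difference: {pos}")
    if diff.max() > atol:
        print(f"❌ Large difference detected! (> {atol})")
    else:
        print(f"✅ Differences within acceptable threshold ({atol})")
```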

To reproduce

Please run the above-mentioned model's encoder and decoder with both the CPU and DirectML execution providers and compare the outputs, as in the sketch below.
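
A minimal sketch of the setup (assuming the encoder_model.onnx / decoder_model.onnx files from the Hugging Face folder linked above and the usual Optimum export input names, which may differ for other exports):

```python
import numpy as np
import onnxruntime as ort

MODEL_DIR = "t5-small/onnx"  # local copy of the Hugging Face folder linked above

def make_sessions(provider):
    enc = ort.InferenceSession(f"{MODEL_DIR}/encoder_model.onnx", providers=[provider])
    dec = ort.InferenceSession(f"{MODEL_DIR}/decoder_model.onnx", providers=[provider])
    return enc, dec

def encode_decode_once(enc, dec, input_ids, attention_mask):
    # Encoder pass (input/output names assume the usual Optimum T5 export)
    (hidden,) = enc.run(None, {"input_ids": input_ids, "attention_mask": attention_mask})
    # Single decoder step starting from the T5 decoder start token (pad id 0)
    logits = dec.run(None, {
        "input_ids": np.array([[0]], dtype=np.int64),
        "encoder_attention_mask": attention_mask,
        "encoder_hidden_states": hidden,
    })[0]
    return hidden, logits

# Example token ids; replace with your T5 tokenizer's output for a real prompt (shape (1, seq_len))
input_ids = np.array([[13959, 1566, 12, 2968, 10, 571, 33, 25, 58, 1]], dtype=np.int64)
attention_mask = np.ones_like(input_ids)

cpu_hidden, cpu_logits = encode_decode_once(*make_sessions("CPUExecutionProvider"), input_ids, attention_mask)
dml_hidden, dml_logits = encode_decode_once(*make_sessions("DmlExecutionProvider"), input_ids, attention_mask)

print("encoder max abs diff:", np.abs(cpu_hidden - dml_hidden).max())
print("decoder max abs diff:", np.abs(cpu_logits - dml_logits).max())
```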

Urgency

I would like to get this resolved by the end of December 2024.

Platform

Windows

OS Version

Windows 11 Enterprise 22631.4169

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.20.0

ONNX Runtime API

Python

Architecture

X64

Execution Provider

DirectML

Execution Provider Library Version

DirectML 1.15.4

tianleiwu commented 1 day ago

@r4ghu,

5.7220458984375e-05 does not seem like a large difference for a model. Could you use end-to-end metrics (such as precision/recall) to measure whether it makes any observable difference between CPU and DirectML?
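
For example, one possible end-to-end check (a minimal sketch, assuming the separate encoder/decoder models and the usual Optimum export input names) is to greedy-decode the same prompts on both EPs and compare the generated token sequences or a task metric over a dataset:

```python
import numpy as np
import onnxruntime as ort

def greedy_decode(enc, dec, input_ids, attention_mask, max_len=32):
    # Simple greedy loop over the separate encoder/decoder models (no KV cache)
    (hidden,) = enc.run(None, {"input_ids": input_ids, "attention_mask": attention_mask})
    tokens = [0]  # T5 decoder start token (pad id)
    for _ in range(max_len):
        logits = dec.run(None, {
            "input_ids": np.array([tokens], dtype=np.int64),
            "encoder_attention_mask": attention_mask,
            "encoder_hidden_states": hidden,
        })[0]
        next_id = int(logits[0, -1].argmax())
        tokens.append(next_id)
        if next_id == 1:  # T5 EOS token id
            break
    return tokens

# Compare decoded sequences per prompt (and, over a dataset, task metrics such as BLEU/accuracy):
# cpu_tokens = greedy_decode(cpu_enc, cpu_dec, input_ids, attention_mask)
# dml_tokens = greedy_decode(dml_enc, dml_dec, input_ids, attention_mask)
# print("identical outputs:", cpu_tokens == dml_tokens)
```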