microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Performance] MT5 model float16 parity check failed #18505

Open trajepl opened 9 months ago

trajepl commented 9 months ago

Describe the issue

python -m onnxruntime.transformers.convert_generation -m google/mt5-base --model_type mt5 \
    --output /home/jiapli/workspace/olive/examples/t5/onnx_models/google/mt5_beam_search.onnx \
    --use_gpu --past_present_share_buffer --use_decoder_masked_attention -e -p fp16

The parity check fails when running the optimization with the command above:

    batch_size=4, encode_sequence_length=11, past_decode_sequence_length=3, max_diff=1.1168098
    batch_size=1, encode_sequence_length=2, past_decode_sequence_length=5, max_diff=0.71538544
    batch_size=3, encode_sequence_length=1, past_decode_sequence_length=1, max_diff=3.5321946
    batch_size=8, encode_sequence_length=5, past_decode_sequence_length=2, max_diff=11.911711
    PyTorch and OnnxRuntime results max difference = 11.911710739135742
    PyTorch and OnnxRuntime results are NOT close
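For context, the parity check reported above amounts to comparing the PyTorch reference outputs against the ONNX Runtime fp16 outputs element-wise and checking the maximum absolute difference against a tolerance. A minimal sketch of that comparison (the helper names below are illustrative, not the actual `onnxruntime.transformers` API):

```python
# Illustrative re-creation of a max-diff parity check; function names
# are hypothetical and do not mirror the convert_generation internals.
def max_abs_diff(expected, actual):
    """Element-wise maximum absolute difference between two flat sequences."""
    return max(abs(e - a) for e, a in zip(expected, actual))

def results_are_close(expected, actual, atol=1e-2):
    """True when outputs agree within the absolute tolerance."""
    return max_abs_diff(expected, actual) <= atol

# Small fp16-style rounding noise passes the check...
pytorch_out = [0.1000, -2.5000, 3.2500]
ort_fp16_out = [0.1004, -2.4980, 3.2480]
print(results_are_close(pytorch_out, ort_fp16_out))

# ...while a divergence on the scale seen in this issue (max_diff ~11.9) fails.
print(results_are_close([0.0], [11.911711]))
```

A max_diff of 11.9 is far beyond anything attributable to fp16 rounding alone, which points at a real numerical issue (e.g. in the fused decoder attention path) rather than tolerance noise.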

To reproduce

See the description of this issue.

Urgency

No response

Platform

Linux

OS Version

Ubuntu 20.04.5 LTS

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.16.2

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA

Model File

No response

Is this a quantized model?

No

github-actions[bot] commented 8 months ago

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

trajepl commented 8 months ago

Any ideas?