Open Matthieu-Tinycoaching opened 1 year ago
@wangyems, could you take a look?
The T5 optimization is in progress. We will take a look at this model then.
@wangyems Thanks for looking into this. Do you know if there is an estimate for when this feature might be integrated into onnxruntime?
Hi @Matthieu-Tinycoaching and @argideritzalpea, we have checked in the major part of the optimizations for T5 and mT5. I just found that there are minor graph discrepancies that block the optimization script targeting flan-t5. A fix should be available in a week or so.
You can pull this branch and use convert_to_onnx.py to export flan-t5 graphs.
We also have a one-line command to export flan-t5 with beam search using convert_generation.py. A beam search model with decoder and encoder graphs will be generated under /google
python convert_generation.py -m google/flan-t5-large --model_type t5 --output flan-t5-beamsearch.onnx -e --use_gpu
For T5 inference, you'll need to use an ORT nightly build
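Before running inference, it can help to confirm which ONNX Runtime build is actually installed. The following is a minimal sketch using only the standard library; the distribution names in `CANDIDATE_DISTRIBUTIONS` are assumptions about how stable and nightly builds are packaged, so adjust them to match your environment.

```python
from importlib import metadata

# ASSUMPTION: these are the distribution names used for stable and nightly
# ONNX Runtime builds; edit the tuple if your environment uses other names.
CANDIDATE_DISTRIBUTIONS = (
    "onnxruntime",
    "onnxruntime-gpu",
    "ort-nightly",
    "ort-nightly-gpu",
)


def installed_ort_builds(candidates=CANDIDATE_DISTRIBUTIONS):
    """Return {distribution name: version} for each candidate that is installed."""
    found = {}
    for name in candidates:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # This distribution is not installed; skip it.
            continue
    return found


builds = installed_ort_builds()
if not builds:
    print("No ONNX Runtime distribution found")
for name, version in builds.items():
    print(f"{name}=={version}")
```

If more than one distribution shows up, the one that `import onnxruntime` resolves to may not be the one you expect, so a clean environment is the safer choice.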
@wangyems: Thank you! Is it still the case that I need another branch for this? Also, do I still need nightly for inference?
Your branch added some flan models to the list of defaults here: https://github.com/microsoft/onnxruntime/compare/wangye/flan#diff-593772fa94b2b631576ee280ae1c0ee320f532593b9e723c5f3496be6b26647aR28-R37
I don't think they are necessary though, right?
I downloaded the nightly build, and it looks like I can run this on flan-t5-base okay:
python -m onnxruntime.transformers.convert_generation -m google/flan-t5-base --model_type t5 -e --use_gpu --output ./output/models/t5/onnx_models/flan_t5_base_beam_search.onnx --output_sequences_scores --num_beams=5 --num_return_sequences=5 --length_penalty=0
However, it doesn't work for flan-t5-small due to this error:
python -m onnxruntime.transformers.convert_generation -m google/flan-t5-small --model_type t5 -e --use_gpu --output ./output/models/t5/onnx_models/flan_t5_small_beam_search.onnx --output_sequences_scores --num_beams=5 --num_return_sequences=5 --length_penalty=0
**** past_present_share_buffer=False
Convert model google/flan-t5-small to onnx ...
Exporting ONNX model to output/models/t5/onnx_models/google/flan-t5-small_encoder_decoder_init.onnx
/opt/conda/envs/jupyter/lib/python3.10/site-packages/transformers/modeling_utils.py:832: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if causal_mask.shape[1] < attention_mask.shape[1]:
============= Diagnostic Run torch.onnx.export version 2.0.1+cu117 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================
Delete the existed onnx file: output/models/t5/onnx_models/google/flan-t5-small_encoder_decoder_init.onnx
Delete the existed external data file: output/models/t5/onnx_models/google/flan-t5-small_encoder_decoder_init.onnx.data
Optimizing model to output/models/t5/onnx_models/google/flan-t5-small_encoder_decoder_init_fp32.onnx
Traceback (most recent call last):
File "/opt/conda/envs/jupyter/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/conda/envs/jupyter/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/opt/conda/envs/jupyter/lib/python3.10/site-packages/onnxruntime/transformers/convert_generation.py", line 2699, in <module>
main()
File "/opt/conda/envs/jupyter/lib/python3.10/site-packages/onnxruntime/transformers/convert_generation.py", line 2681, in main
convert_generation_model(args)
File "/opt/conda/envs/jupyter/lib/python3.10/site-packages/onnxruntime/transformers/convert_generation.py", line 1731, in convert_generation_model
t5_to_onnx(args)
File "/opt/conda/envs/jupyter/lib/python3.10/site-packages/onnxruntime/transformers/convert_generation.py", line 535, in t5_to_onnx
paths = export_t5_onnx_models(
File "/opt/conda/envs/jupyter/lib/python3.10/site-packages/onnxruntime/transformers/models/gpt2/../../models/t5/convert_to_onnx.py", line 198, in export_onnx_models
T5Helper.optimize_onnx(
File "/opt/conda/envs/jupyter/lib/python3.10/site-packages/onnxruntime/transformers/models/t5/t5_helper.py", line 248, in optimize_onnx
m = optimize_model(
File "/opt/conda/envs/jupyter/lib/python3.10/site-packages/onnxruntime/transformers/models/gpt2/../../optimizer.py", line 294, in optimize_model
optimizer = optimize_by_fusion(model, model_type, num_heads, hidden_size, optimization_options)
File "/opt/conda/envs/jupyter/lib/python3.10/site-packages/onnxruntime/transformers/models/gpt2/../../optimizer.py", line 178, in optimize_by_fusion
optimizer = optimizer_class(model, num_heads, hidden_size)
File "/opt/conda/envs/jupyter/lib/python3.10/site-packages/onnxruntime/transformers/models/gpt2/../../onnx_model_t5.py", line 751, in __init__
super().__init__(model, num_heads, hidden_size)
File "/opt/conda/envs/jupyter/lib/python3.10/site-packages/onnxruntime/transformers/models/gpt2/../../onnx_model_bert.py", line 51, in __init__
assert (num_heads == 0 and hidden_size == 0) or (num_heads > 0 and hidden_size % num_heads == 0)
AssertionError
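The assertion at the bottom of the traceback requires `hidden_size` to be divisible by `num_heads`. Below is a minimal sketch of that check, reusing the expression from the traceback. The `(num_heads, hidden_size)` pairs are assumptions based on the published Hugging Face configs for these checkpoints, and they also assume that `hidden_size` is the model's `d_model`; verify both against each model's `config.json`.

```python
def passes_head_check(num_heads: int, hidden_size: int) -> bool:
    """Mirror the condition asserted in onnx_model_bert.py (see the traceback)."""
    return (num_heads == 0 and hidden_size == 0) or (
        num_heads > 0 and hidden_size % num_heads == 0
    )


# ASSUMPTION: (num_heads, hidden_size) values recalled from the public configs,
# not taken from this thread. Check config.json before relying on them.
ASSUMED_CONFIGS = {
    "google/flan-t5-small": (6, 512),
    "google/flan-t5-base": (12, 768),
}

for name, (num_heads, hidden_size) in ASSUMED_CONFIGS.items():
    ok = passes_head_check(num_heads, hidden_size)
    print(f"{name}: num_heads={num_heads}, hidden_size={hidden_size}, passes={ok}")
```

If those values are right, flan-t5-small fails because 512 is not a multiple of 6, while flan-t5-base passes, which would match the behavior reported here.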
Then I tried again, but this time without nightly, just whatever version I get when I install optimum[onnxruntime-gpu], which appears to be onnxruntime-gpu 1.14.1:
python -m onnxruntime.transformers.convert_generation -m google/flan-t5-small --model_type t5 -e --use_gpu --output ./output/models/t5/onnx_models/flan_t5_small_beam_search.onnx --output_sequences_scores --num_beams=5 --num_return_sequences=5 --length_penalty=0
**** past_present_share_buffer=False, is_greedysearch=False
Convert model google/flan-t5-small to onnx ...
Exporting ONNX model to output/models/t5/onnx_models/google/flan-t5-small_encoder_decoder_init.onnx
/opt/conda/envs/jupyter/lib/python3.10/site-packages/transformers/modeling_utils.py:832: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if causal_mask.shape[1] < attention_mask.shape[1]:
============= Diagnostic Run torch.onnx.export version 2.0.1+cu117 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================
Delete the existed onnx file: output/models/t5/onnx_models/google/flan-t5-small_encoder_decoder_init.onnx
Delete the existed external data file: output/models/t5/onnx_models/google/flan-t5-small_encoder_decoder_init.onnx.data
batch_size=4 encode_sequence_length=11, max_diff=0.013242721557617188
batch_size=1 encode_sequence_length=2, max_diff=7.152557373046875e-06
batch_size=3 encode_sequence_length=1, max_diff=0.0006928443908691406
batch_size=8 encode_sequence_length=5, max_diff=0.019624948501586914
PyTorch and OnnxRuntime results max difference = 0.019624948501586914
PyTorch and OnnxRuntime results are NOT close
Exporting ONNX model to output/models/t5/onnx_models/google/flan-t5-small_decoder.onnx
/opt/conda/envs/jupyter/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py:507: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
elif past_key_value.shape[2] != key_value_states.shape[1]:
In-place op on output of tensor.shape. See https://pytorch.org/docs/master/onnx.html#avoid-inplace-operations-when-using-tensor-shape-in-tracing-mode
In-place op on output of tensor.shape. See https://pytorch.org/docs/master/onnx.html#avoid-inplace-operations-when-using-tensor-shape-in-tracing-mode
============= Diagnostic Run torch.onnx.export version 2.0.1+cu117 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================
Delete the existed onnx file: output/models/t5/onnx_models/google/flan-t5-small_decoder.onnx
Delete the existed external data file: output/models/t5/onnx_models/google/flan-t5-small_decoder.onnx.data
batch_size=4, encode_sequence_length=11, past_decode_sequence_length=3, max_diff=0.013082504272460938
batch_size=1, encode_sequence_length=2, past_decode_sequence_length=5, max_diff=4.76837158203125e-06
batch_size=3, encode_sequence_length=1, past_decode_sequence_length=1, max_diff=0.0006706714630126953
batch_size=8, encode_sequence_length=5, past_decode_sequence_length=2, max_diff=0.012537002563476562
PyTorch and OnnxRuntime results max difference = 0.012537002563476562
PyTorch and OnnxRuntime results are NOT close
T5 encoder graph verified: name and data type of inputs and outputs are good.
26 shared initializers (['s_d_decoder.embed_tokens.weight', 's_d_onnx::MatMul_1366', 's_d_onnx::MatMul_1367', 's_d_onnx::MatMul_1368', 's_d_onnx::MatMul_1391', 's_d_onnx::MatMul_1392', 's_d_onnx::MatMul_1393', 's_d_onnx::MatMul_1416', 's_d_onnx::MatMul_1417', 's_d_onnx::MatMul_1418', 's_d_onnx::MatMul_1441', 's_d_onnx::MatMul_1442', 's_d_onnx::MatMul_1443', 's_d_onnx::MatMul_1466', 's_d_onnx::MatMul_1467', 's_d_onnx::MatMul_1468', 's_d_onnx::MatMul_1491', 's_d_onnx::MatMul_1492', 's_d_onnx::MatMul_1493', 's_d_onnx::MatMul_1516', 's_d_onnx::MatMul_1517', 's_d_onnx::MatMul_1518', 's_d_onnx::MatMul_1541', 's_d_onnx::MatMul_1542', 's_d_onnx::MatMul_1543', 's_d_onnx::MatMul_1544']) in encoder and decoder subgraphs are moved to the main graph
Delete the existed onnx file: ./output/models/t5/onnx_models/flan_t5_small_beam_search.onnx
Delete the existed external data file: ./output/models/t5/onnx_models/flan_t5_small_beam_search.onnx.data
model save to ./output/models/t5/onnx_models/flan_t5_small_beam_search.onnx
start testing model...
--------------------------------------------------
Test PyTorch model and beam search with huggingface transformers...
input_ids tensor([[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 13959, 1566, 12, 2379, 10,
37, 556, 19, 1883, 1],
[21603, 10, 585, 3256, 12, 504, 24, 8636, 830, 490,
533, 1393, 12, 70, 2713, 5, 3985, 3, 9, 1782,
300, 54, 991, 12, 1364, 1425, 13, 2189, 21, 321,
3513, 11, 1082, 5, 1]])
huggingface transformers outputs:
sequences tensor([[ 0, 312, 5789, 259, 13833, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 312, 5789, 259, 13833, 5, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 325, 556, 259, 27549, 721, 5, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 325, 556, 259, 13833, 15, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 325, 556, 259, 13833, 15, 5, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 3, 18536, 3, 9, 1782, 300, 19, 3, 9,
248, 194, 12, 1428, 2189, 5, 1, 0, 0, 0],
[ 0, 3, 18536, 3, 9, 1782, 300, 54, 36, 3,
9, 248, 194, 12, 1428, 2189, 5, 1, 0, 0],
[ 0, 3, 18536, 3, 9, 1782, 300, 54, 36, 3,
9, 248, 194, 12, 4888, 39, 1879, 533, 5, 1],
[ 0, 3, 18536, 3, 9, 1782, 300, 19, 3, 9,
248, 194, 12, 1428, 2189, 11, 2189, 5, 1, 0],
[ 0, 3, 18536, 3, 9, 1782, 300, 54, 36, 3,
9, 248, 194, 12, 4888, 39, 1879, 19016, 5, 1]])
sequences_scores tensor([ -4.6146, -4.8925, -5.4252, -5.4902, -5.7130, -17.0220, -18.1770,
-19.5306, -19.7267, -20.4909])
0: Le produit est publié
1: Le produit est publié.
2: La product est libérée.
3: La product est publiée
4: La product est publiée.
5: Keeping a dog around is a great way to reduce stress.
6: Keeping a dog around can be a great way to reduce stress.
7: Keeping a dog around can be a great way to boost your overall health.
8: Keeping a dog around is a great way to reduce stress and stress.
9: Keeping a dog around can be a great way to boost your overall wellbeing.
--------------------------------------------------
Testing beam search with onnxruntime...
use CUDAExecutionProvider
ORT outputs:
sequences [[[ 0 312 5789 259 13833 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0]
[ 0 312 5789 259 13833 5 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0]
[ 0 312 5789 259 13833 15 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0]
[ 0 312 5789 259 13833 15 5 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0]
[ 0 312 5789 259 13833 3 85 3 40 31 154
9456 257 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0]]
[[ 0 3 18536 3 9 1782 19 3 9 248 194
12 1428 2189 5 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0]
[ 0 3 18536 3 9 1782 19 3 9 248 194
12 1428 2189 21 321 3513 11 1082 5 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0]
[ 0 3 18536 3 9 1782 19 3 9 248 194
12 1428 2189 21 321 3513 11 502 5 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0]
[ 0 3 18536 3 9 1782 19 3 9 248 194
12 1428 2189 21 321 3513 11 1082 9391 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0]
[ 0 3 18536 3 9 1782 19 3 9 248 194
12 1428 2189 21 321 3513 11 1082 9391 5 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0]]]
sequences_scores [[ -47.827347 -57.368763 -57.407936 -66.94655 -124.13513 ]
[-143.51146 -190.83138 -190.88016 -191.01337 -200.34917 ]]
batch 0 sequence 0: Le produit est publié
batch 0 sequence 1: Le produit est publié.
batch 0 sequence 2: Le produit est publiée
batch 0 sequence 3: Le produit est publiée.
batch 0 sequence 4: Le produit est publié à l'élaboration
batch 1 sequence 0: Keeping a dog is a great way to reduce stress.
batch 1 sequence 1: Keeping a dog is a great way to reduce stress for both adults and kids.
batch 1 sequence 2: Keeping a dog is a great way to reduce stress for both adults and children.
batch 1 sequence 3: Keeping a dog is a great way to reduce stress for both adults and kids alike
batch 1 sequence 4: Keeping a dog is a great way to reduce stress for both adults and kids alike.
--------------------------------------------------
Torch Sequences:
tensor([[[ 0, 312, 5789, 259, 13833, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 312, 5789, 259, 13833, 5, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 325, 556, 259, 27549, 721, 5, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 325, 556, 259, 13833, 15, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 325, 556, 259, 13833, 15, 5, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0]],
[[ 0, 3, 18536, 3, 9, 1782, 300, 19, 3, 9,
248, 194, 12, 1428, 2189, 5, 1, 0, 0, 0],
[ 0, 3, 18536, 3, 9, 1782, 300, 54, 36, 3,
9, 248, 194, 12, 1428, 2189, 5, 1, 0, 0],
[ 0, 3, 18536, 3, 9, 1782, 300, 54, 36, 3,
9, 248, 194, 12, 4888, 39, 1879, 533, 5, 1],
[ 0, 3, 18536, 3, 9, 1782, 300, 19, 3, 9,
248, 194, 12, 1428, 2189, 11, 2189, 5, 1, 0],
[ 0, 3, 18536, 3, 9, 1782, 300, 54, 36, 3,
9, 248, 194, 12, 4888, 39, 1879, 19016, 5, 1]]])
['Le produit est publié', 'Le produit est publié.', 'La product est libérée.', 'La product est publiée', 'La product est publiée.', 'Keeping a dog around is a great way to reduce stress.', 'Keeping a dog around can be a great way to reduce stress.', 'Keeping a dog around can be a great way to boost your overall health.', 'Keeping a dog around is a great way to reduce stress and stress.', 'Keeping a dog around can be a great way to boost your overall wellbeing.']
--------------------------------------------------
ORT Sequences:
tensor([[[ 0, 312, 5789, 259, 13833, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 312, 5789, 259, 13833, 5, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 312, 5789, 259, 13833, 15, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 312, 5789, 259, 13833, 15, 5, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 312, 5789, 259, 13833, 3, 85, 3, 40, 31,
154, 9456, 257, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0]],
[[ 0, 3, 18536, 3, 9, 1782, 19, 3, 9, 248,
194, 12, 1428, 2189, 5, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 3, 18536, 3, 9, 1782, 19, 3, 9, 248,
194, 12, 1428, 2189, 21, 321, 3513, 11, 1082, 5,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 3, 18536, 3, 9, 1782, 19, 3, 9, 248,
194, 12, 1428, 2189, 21, 321, 3513, 11, 502, 5,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 3, 18536, 3, 9, 1782, 19, 3, 9, 248,
194, 12, 1428, 2189, 21, 321, 3513, 11, 1082, 9391,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 3, 18536, 3, 9, 1782, 19, 3, 9, 248,
194, 12, 1428, 2189, 21, 321, 3513, 11, 1082, 9391,
5, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]])
['Le produit est publié', 'Le produit est publié.', 'Le produit est publiée', 'Le produit est publiée.', "Le produit est publié à l'élaboration", 'Keeping a dog is a great way to reduce stress.', 'Keeping a dog is a great way to reduce stress for both adults and kids.', 'Keeping a dog is a great way to reduce stress for both adults and children.', 'Keeping a dog is a great way to reduce stress for both adults and kids alike', 'Keeping a dog is a great way to reduce stress for both adults and kids alike.']
--------------------------------------------------
Torch and ORT result is different
ORT {'test_times': 1, 'latency_variance': '0.00', 'latency_90_percentile': '282.23', 'latency_95_percentile': '282.23', 'latency_99_percentile': '282.23', 'average_latency_ms': '282.23', 'QPS': '7.09', 'parity': False}
Output files: ./output/models/t5/onnx_models/flan_t5_small_beam_search.onnx, ./output/models/t5/onnx_models/flan_t5_small_beam_search.onnx.data
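The export log prints one `max_diff` per test shape before reporting "NOT close". As a rough illustration of how such a parity check flags cases, the sketch below compares the encoder values from this run against a tolerance. The tolerance of `1e-4` is a hypothetical placeholder, since the threshold the script actually uses is not shown in the log.

```python
# max_diff values copied from the encoder verification lines of this run,
# keyed by (batch_size, encode_sequence_length).
ENCODER_MAX_DIFFS = {
    (4, 11): 0.013242721557617188,
    (1, 2): 7.152557373046875e-06,
    (3, 1): 0.0006928443908691406,
    (8, 5): 0.019624948501586914,
}

# ASSUMPTION: hypothetical tolerance, not read from the conversion script.
ASSUMED_TOLERANCE = 1e-4


def cases_over_tolerance(diffs, tolerance=ASSUMED_TOLERANCE):
    """Return the sorted shapes whose max_diff exceeds the tolerance."""
    return sorted(shape for shape, diff in diffs.items() if diff > tolerance)


for shape in cases_over_tolerance(ENCODER_MAX_DIFFS):
    print(f"batch_size={shape[0]} encode_sequence_length={shape[1]} exceeds tolerance")
```

Under that assumed tolerance, three of the four shapes exceed it; only the smallest case (batch_size=1, encode_sequence_length=2) stays within it.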
(jupyter) ubuntu@ip-172-31-3-120:~/sky_workdir$ ls ./output/models/t5/onnx_models/
flan_t5_base_beam_search.onnx flan_t5_base_beam_search.onnx.data flan_t5_small_beam_search.onnx flan_t5_small_beam_search.onnx.data google
(jupyter) ubuntu@ip-172-31-3-120:~/sky_workdir$ rm -rf ./output/models/t5/onnx_models/
(jupyter) ubuntu@ip-172-31-3-120:~/sky_workdir$ python -m onnxruntime.transformers.convert_generation -m google/flan-t5-small --model_type t5 -e --use_gpu --output ./output/models/t5/onnx_models/flan_t5_small_beam_search.onnx --output_sequences_scores --num_beams=5 --num_return_sequences=5 --length_penalty=0
**** past_present_share_buffer=False, is_greedysearch=False
Convert model google/flan-t5-small to onnx ...
Exporting ONNX model to output/models/t5/onnx_models/google/flan-t5-small_encoder_decoder_init.onnx
/opt/conda/envs/jupyter/lib/python3.10/site-packages/transformers/modeling_utils.py:832: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if causal_mask.shape[1] < attention_mask.shape[1]:
============= Diagnostic Run torch.onnx.export version 2.0.1+cu117 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================
batch_size=4 encode_sequence_length=11, max_diff=0.015545368194580078
batch_size=1 encode_sequence_length=2, max_diff=4.291534423828125e-06
batch_size=3 encode_sequence_length=1, max_diff=0.0006568431854248047
batch_size=8 encode_sequence_length=5, max_diff=0.01646888256072998
PyTorch and OnnxRuntime results max difference = 0.01646888256072998
PyTorch and OnnxRuntime results are NOT close
Exporting ONNX model to output/models/t5/onnx_models/google/flan-t5-small_decoder.onnx
/opt/conda/envs/jupyter/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py:507: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
elif past_key_value.shape[2] != key_value_states.shape[1]:
In-place op on output of tensor.shape. See https://pytorch.org/docs/master/onnx.html#avoid-inplace-operations-when-using-tensor-shape-in-tracing-mode
In-place op on output of tensor.shape. See https://pytorch.org/docs/master/onnx.html#avoid-inplace-operations-when-using-tensor-shape-in-tracing-mode
============= Diagnostic Run torch.onnx.export version 2.0.1+cu117 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================
batch_size=4, encode_sequence_length=11, past_decode_sequence_length=3, max_diff=0.008085250854492188
batch_size=1, encode_sequence_length=2, past_decode_sequence_length=5, max_diff=7.62939453125e-06
batch_size=3, encode_sequence_length=1, past_decode_sequence_length=1, max_diff=0.00025200843811035156
batch_size=8, encode_sequence_length=5, past_decode_sequence_length=2, max_diff=0.01223444938659668
PyTorch and OnnxRuntime results max difference = 0.01223444938659668
PyTorch and OnnxRuntime results are NOT close
T5 encoder graph verified: name and data type of inputs and outputs are good.
26 shared initializers (['s_d_decoder.embed_tokens.weight', 's_d_onnx::MatMul_1366', 's_d_onnx::MatMul_1367', 's_d_onnx::MatMul_1368', 's_d_onnx::MatMul_1391', 's_d_onnx::MatMul_1392', 's_d_onnx::MatMul_1393', 's_d_onnx::MatMul_1416', 's_d_onnx::MatMul_1417', 's_d_onnx::MatMul_1418', 's_d_onnx::MatMul_1441', 's_d_onnx::MatMul_1442', 's_d_onnx::MatMul_1443', 's_d_onnx::MatMul_1466', 's_d_onnx::MatMul_1467', 's_d_onnx::MatMul_1468', 's_d_onnx::MatMul_1491', 's_d_onnx::MatMul_1492', 's_d_onnx::MatMul_1493', 's_d_onnx::MatMul_1516', 's_d_onnx::MatMul_1517', 's_d_onnx::MatMul_1518', 's_d_onnx::MatMul_1541', 's_d_onnx::MatMul_1542', 's_d_onnx::MatMul_1543', 's_d_onnx::MatMul_1544']) in encoder and decoder subgraphs are moved to the main graph
model save to ./output/models/t5/onnx_models/flan_t5_small_beam_search.onnx
start testing model...
--------------------------------------------------
Torch and ORT result is different
ORT {'test_times': 1, 'latency_variance': '0.00', 'latency_90_percentile': '248.05', 'latency_95_percentile': '248.05', 'latency_99_percentile': '248.05', 'average_latency_ms': '248.05', 'QPS': '8.06', 'parity': False}
Output files: ./output/models/t5/onnx_models/flan_t5_small_beam_search.onnx, ./output/models/t5/onnx_models/flan_t5_small_beam_search.onnx.data
I presume that the Torch and ORT result being different is a legitimate error?
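One way to check whether the mismatch is only a padding difference is to trim padding and the end-of-sequence token before comparing. The sketch below does that for the batch 0 beams, with token ids copied from the log (trailing zeros shortened for brevity). It assumes pad id 0 and EOS id 1, which is how the printed sequences appear to be laid out.

```python
def trim(seq, pad_id=0, eos_id=1):
    """Drop trailing padding, then a single trailing EOS token if present."""
    tokens = list(seq)
    while tokens and tokens[-1] == pad_id:
        tokens.pop()
    if tokens and tokens[-1] == eos_id:
        tokens.pop()
    return tokens


# Batch 0 beams from the log; trailing padding shortened for readability.
TORCH_BEAMS = [
    [0, 312, 5789, 259, 13833, 1, 0, 0],
    [0, 312, 5789, 259, 13833, 5, 1, 0],
    [0, 325, 556, 259, 27549, 721, 5, 1],
    [0, 325, 556, 259, 13833, 15, 1, 0],
    [0, 325, 556, 259, 13833, 15, 5, 1],
]
ORT_BEAMS = [
    [0, 312, 5789, 259, 13833, 0, 0, 0],
    [0, 312, 5789, 259, 13833, 5, 0, 0],
    [0, 312, 5789, 259, 13833, 15, 0, 0],
    [0, 312, 5789, 259, 13833, 15, 5, 0],
    [0, 312, 5789, 259, 13833, 3, 85, 3, 40, 31, 154, 9456, 257, 0],
]

# Compare beam by beam after trimming.
matches = [trim(t) == trim(o) for t, o in zip(TORCH_BEAMS, ORT_BEAMS)]
for index, same in enumerate(matches):
    print(f"beam {index}: {'match' if same else 'DIFFERENT'}")
```

On these values the first two beams agree once padding and EOS are removed, while beams 2 to 4 contain different tokens, so the mismatch is not only a padding artifact.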
Describe the feature request
Would it be possible to add GPU graph optimizations for Flan-T5-Large model?
Describe scenario use case
Actually, after exporting the model to ONNX and trying to optimize it with ORTOptimizer as below:
I got the following error message: