microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Optimization for T5 transformer models. #10613

Open VikasOjha666 opened 2 years ago

VikasOjha666 commented 2 years ago

Is your feature request related to a problem? Please describe. No, it's not a problem, but a feature request.

Describe the solution you'd like As of now, layer-fusion-based optimization is available for BERT, GPT-2, BART, etc., but it's not available for T5. It would be good to implement it for T5 as well, since the T5 model is getting quite popular.
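
For context, here is roughly how the existing fusion path is invoked for the architectures that are already supported (a minimal sketch; the model path and the head/hidden sizes are placeholders for a real BERT export):

import onnx
from onnxruntime.transformers import optimizer

# Fuse attention/layernorm/etc. subgraphs in an exported BERT ONNX model.
# model_type="t5" is not accepted here yet, which is what this request is about.
optimized_model = optimizer.optimize_model(
    "model.onnx",       # placeholder path to an exported BERT graph
    model_type="bert",
    num_heads=12,
    hidden_size=768,
)
optimized_model.save_model_to_file("model_optimized.onnx")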


wangyems commented 2 years ago

Will add to our backlog.

ierezell commented 2 years ago

Hello,

After using Hugging Face Optimum, I found that seq2seq support will soon be possible (https://github.com/huggingface/optimum/pull/199). It works great without optimization, but optimizing T5 models requires an onnxruntime/transformers/onnx_model_XXX.py, and the T5 one is missing.

(More details on their forum: https://discuss.huggingface.co/t/optimum-t5-for-inference/16695/5)
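
For reference, a minimal sketch of the Optimum flow being discussed (API names follow recent optimum.onnxruntime releases and may differ by version; the optimize step is where the missing T5 fusion support in onnxruntime bites):

from optimum.onnxruntime import ORTModelForSeq2SeqLM, ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig

# Export t5-small to ONNX via the seq2seq support added in optimum PR #199.
model = ORTModelForSeq2SeqLM.from_pretrained("t5-small", from_transformers=True)

# Graph optimization: this step needs onnxruntime's onnx_model_t5.py
# and currently fails for model_type "t5".
optimizer = ORTOptimizer.from_pretrained(model)
optimizer.optimize(
    save_dir="t5-small-optimized",
    optimization_config=OptimizationConfig(optimization_level=2),
)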

Do you have any status update on that? I could spend some time on it if needed.

Thanks in advance, and have a great day.

tianleiwu commented 2 years ago

@Ierezell, optimization of the T5 model is planned (likely in the 1.13 release). Contributions are welcome.

p-christ commented 1 year ago

Did this happen? I'm still seeing this message:

KeyError: "ONNX Runtime doesn't support the graph optimization of t5 yet. Only ['bert', 'gpt2', 'bart'] are supported. If you want to support t5 please propose a PR or open up an issue in ONNX Runtime:https://github.com/microsoft/onnxruntime."

tianleiwu commented 1 year ago

@wangyems, could you give an update on the T5 optimizations?

giantvision commented 1 year ago

Can anyone tell me the progress on including T5 in ORTOptimizer / ORTQuantizer?

tianleiwu commented 1 year ago

The T5 optimizer is complete. Try the following to generate an optimized fp16 model:

python -m onnxruntime.transformers.models.t5.convert_to_onnx -m t5-small --output ./onnx -o --use_gpu -p fp16
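
For a quick smoke test, something like the following loads one of the exported graphs (the file name and input names here are assumptions about what convert_to_onnx writes; check the --output directory and sess.get_inputs() for the actual ones):

import numpy as np
import onnxruntime as ort

# Hypothetical file name: convert_to_onnx writes separate encoder/decoder
# graphs into the --output directory, suffixed by precision.
sess = ort.InferenceSession(
    "onnx/t5-small_encoder_decoder_init_fp16.onnx",
    providers=["CUDAExecutionProvider"],  # the fp16 model targets GPU
)

input_ids = np.array([[13959, 1566, 12, 2968, 10, 1]], dtype=np.int32)
outputs = sess.run(None, {
    "encoder_input_ids": input_ids,                     # assumed input name
    "encoder_attention_mask": np.ones_like(input_ids),  # assumed input name
})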

You can also try beam search optimization with T5:

python -m onnxruntime.transformers.convert_generation -m t5-small --model_type t5 --output t5_small_beam_search.onnx --use_gpu --past_present_share_buffer --use_decoder_masked_attention
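
A minimal sketch of running the resulting all-in-one beam search graph (the input names follow the BeamSearch operator contract; verify against sess.get_inputs()):

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("t5_small_beam_search.onnx",
                            providers=["CUDAExecutionProvider"])

# Token IDs of a tokenized prompt; all search parameters are 1-element tensors.
inputs = {
    "input_ids": np.array([[13959, 1566, 12, 2968, 10, 1]], dtype=np.int32),
    "max_length": np.array([64], dtype=np.int32),
    "min_length": np.array([1], dtype=np.int32),
    "num_beams": np.array([4], dtype=np.int32),
    "num_return_sequences": np.array([1], dtype=np.int32),
    "length_penalty": np.array([1.0], dtype=np.float32),
    "repetition_penalty": np.array([1.0], dtype=np.float32),
}
sequences = sess.run(None, inputs)[0]  # (batch, num_return_sequences, max_length)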