huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0
132.08k stars · 26.31k forks

What's going on with T5 x torch.compile ? #33221

Open shivance opened 2 weeks ago

shivance commented 2 weeks ago

System Info

Hi Team, First of all huge thanks for all the great work you are doing.

Recently, I was benchmarking inference for the T5 model on AWS EC2 (a G6E instance with an L40 GPU) at batch sizes of 1, 2, and 4.

I have heard tons about torch.compile and wanted to try it out to see whether it reduces inference time. Surprisingly, it did the opposite: on average, I saw an increase of ~1 s in inference time over a sample of 50 inputs, each between 2,200 and 3,000 characters long (about 2,550 on average).

I discussed this with a friend, who told me that T5 is not yet a very suitable architecture for compilation and that it hits lots of graph breaks. On his advice, I decided to open an issue here.

In my experience, T5 is still a very good model, and I would like to see it work seamlessly with torch.compile. If the opportunity arises, I am happy to put in my own time and contribute to the cause. Let me know what you think.

Who can help?

No response

Information

Tasks

Reproduction

AWS EC2 (a G6E instance with an L40 GPU) at batch sizes of 1, 2, and 4.

Expected behavior

Inference time should decrease after compilation.

LysandreJik commented 2 weeks ago

Thanks for the issue and feature request @shivance!

cc @ArthurZucker regarding supporting torch.compile for T5.

ArthurZucker commented 1 week ago

Hey! T5 does not support the new `cache_position` input yet, so generation will probably be slow, as it has to deal with dynamic shapes.

T5 is not a very suitable architecture for compilation

I completely disagree with that! 😉 We just did not have time to ship compile support for this model. Though #32617 should give you a lead, and #31166 as well.
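The dynamic-shapes point above can be illustrated with a toy decode loop (this is not transformers code; `growing_step` and `static_step` are made-up names). Concatenating onto a growing KV cache changes the cache's shape on every decode step, so a compiled graph must re-specialize or fall back to dynamic shapes; a preallocated fixed-shape cache written into at the current position (which is what `cache_position` enables) lets one compiled graph serve every step. `backend="eager"` keeps the sketch portable (no codegen toolchain needed); the shape-specialization behaviour is the same.

```python
import torch

# Growing cache: the shape changes every call, defeating shape specialization.
@torch.compile(backend="eager", dynamic=False)
def growing_step(token, cache):
    return torch.cat([cache, token], dim=0)  # new shape each step -> recompile

# Static cache: fixed shape, each step writes in place at the current position,
# so a single compiled graph covers the whole decode loop.
@torch.compile(backend="eager")
def static_step(token, cache, pos):
    cache.index_copy_(0, pos, token)  # in-place write at position `pos`
    return cache

cache = torch.zeros(5, 4)  # preallocated to the max decode length
for i in range(5):
    static_step(torch.randn(1, 4), cache, torch.tensor([i]))
```

In transformers, the static-cache path is selected on supported models via `model.generation_config.cache_implementation = "static"` before compiling the forward pass; T5 lacks this plumbing at the time of this thread.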

zucchini-nlp commented 1 week ago

T5 and BART are planned to be made compile-compatible in the next batch of models, since they are encoder-decoder models. I will work on it next month if there's no PR by then.