[Closed] dsgissin closed this issue 3 years ago
Thanks a lot for this issue @dsgissin! Will take a look this week!
Hey! Did you get a chance to look into the runtime degradation?
Thanks
Looking now! Sorry for the delay
Okay, I can reproduce the degradation! Will try to fix it today
I think this PR should fix it: https://github.com/huggingface/transformers/pull/10496
Let me know if you still encounter a degradation!
Thanks a million for spotting this degradation - you've probably just made T5 faster for the whole community :-)
Great, thanks a lot for the quick fix!
Environment info
transformers version: 4.2.1 vs. 3.4.0

Who can help

@patrickvonplaten, @patil-suraj
Information
Model I am using (Bert, XLNet ...): T5
Hello,
I’ve noticed that the running time of T5 on a GPU has increased between v3.4.0 and the current version (v4.2.1). When running inference on a single example on a K80 GPU (Google Colab), the average runtime of a generate() call for a single example (the one in the transformers documentation) with t5-base is 539 ± 13 ms in v3.4.0, compared to 627 ± 13 ms in v4.2.1. With t5-large, the corresponding runtimes are 1004 ± 22 ms versus 1242 ± 15 ms.
I made two colab notebooks that compare the two versions: https://colab.research.google.com/drive/1Rm9RFdfLUFFHOvjAOg816-6oXw8zm_tE?usp=sharing#scrollTo=eeJ0sS_g7-X2 https://colab.research.google.com/drive/1U2QPA4MR48xPCpn4XiG5KBk3qZGYeoIJ?usp=sharing
I’m aware of at least one bug fix that was made to the attention mechanism of T5 in v4.0.0 (#8158), but I don’t think that change should have caused such a slowdown. Any idea why this degradation occurred?
Thanks!
To reproduce
See Colab notebooks attached. See the following code snippet as well:
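The original snippet was not preserved in this thread, but the benchmark described above can be sketched roughly as follows. The timing harness below is an assumption about the methodology (mean ± std over repeated generate() calls after a few warm-up runs); the prompt and model name mirror the t5-base translation example from the transformers documentation.

```python
import statistics
import time


def benchmark(fn, n_runs=20, warmup=3):
    """Call fn repeatedly and return (mean_ms, std_ms) over n_runs timings."""
    for _ in range(warmup):
        fn()  # warm-up calls are discarded (CUDA init, caching, etc.)
    timings_ms = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - start) * 1e3)
    return statistics.mean(timings_ms), statistics.stdev(timings_ms)


if __name__ == "__main__":
    # Requires `pip install transformers torch sentencepiece`; run once per
    # transformers version (3.4.0 and 4.2.1) to compare the numbers.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained("t5-base")
    model = T5ForConditionalGeneration.from_pretrained("t5-base").eval()
    inputs = tokenizer(
        "translate English to German: The house is wonderful.",
        return_tensors="pt",
    )
    mean_ms, std_ms = benchmark(lambda: model.generate(inputs.input_ids))
    print(f"generate(): {mean_ms:.0f} ± {std_ms:.0f} ms")
```

On a GPU you would additionally move the model and inputs to the device and synchronize before reading the clock; the Colab notebooks linked above are the authoritative reproduction.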