triton-inference-server / fastertransformer_backend


T5 not performing as expected #64

Open · nrakltx opened 1 year ago

nrakltx commented 1 year ago

Description

I am trying to optimize T5-small inference using FasterTransformer, running on a single V100. I followed all the steps in `t5_guide.md` exactly and got a sensible BLEU score. And yet, when I measure end-to-end inference performance (including the time it takes to set the client's `InputTensor`s, etc.), the speedup is far from the 22x promoted in the related blog post. I was also not able to run with `fp16`, as the model is not numerically stable in half precision (this has been mentioned multiple times in the `transformers` repo).
Am I missing something? Is there a way to run with `fp16` that I am not aware of?
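For context, I am timing requests roughly like this (a minimal sketch of my client code; the model name `fastertransformer` and the tensor names `input_ids` / `sequence_length` / `max_output_len` are from my own config.pbtxt and may differ from yours):

```python
import time

import numpy as np
import tritonclient.http as httpclient

# NOTE: model name and tensor names/dtypes below are assumptions taken
# from my own setup following t5_guide.md; adjust them to your config.
client = httpclient.InferenceServerClient("localhost:8000")

start = time.time()  # include input-tensor setup in the measurement

input_ids = np.array([[21603, 10, 100, 19, 3, 9, 794, 5, 1]], dtype=np.uint32)
seq_len = np.array([[input_ids.shape[1]]], dtype=np.uint32)
max_output_len = np.array([[32]], dtype=np.uint32)

inputs = []
for name, arr in [("input_ids", input_ids),
                  ("sequence_length", seq_len),
                  ("max_output_len", max_output_len)]:
    t = httpclient.InferInput(name, list(arr.shape), "UINT32")
    t.set_data_from_numpy(arr)
    inputs.append(t)

result = client.infer("fastertransformer", inputs)
print(f"end-to-end latency: {time.time() - start:.3f}s")
```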

Thanks in advance for your reply,

N

Reproduced Steps

Follow the T5 guide/blogpost.
byshiue commented 1 year ago

Can you share the scripts you use to run t5-small, and also share the results you see?

nrakltx commented 1 year ago

I ran the e2e script in `t5_utils` and it encoded ~6500 tokens in 25 seconds, which is the same time PyTorch takes.
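For the PyTorch side of the comparison, my baseline timing looks roughly like this (a sketch; the prompt, batch size, and `max_length` here are placeholders, not the exact settings from the e2e script):

```python
import time

import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Rough PyTorch baseline sketch; batch size and generation settings are
# illustrative placeholders, not the exact e2e-script configuration.
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small").cuda().eval()

batch = ["translate English to German: The house is wonderful."] * 8
enc = tokenizer(batch, return_tensors="pt", padding=True).to("cuda")

torch.cuda.synchronize()
start = time.time()
with torch.no_grad():
    out = model.generate(**enc, max_length=128)
torch.cuda.synchronize()

elapsed = time.time() - start
print(f"generated {out.numel()} tokens in {elapsed:.2f}s")
```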

byshiue commented 1 year ago

Can you post your scripts and the results shown in the terminal?