zhang-ge-hao opened this issue 1 year ago
Hi @AkiyamaYummy, I'm seeing similar results on my end. Although I'm using smaller input and output sequence lengths than yours, FasterTransformer still shows lower latency than DeepSpeed on GPT. Also note that the paper was released more than half a year ago; both FasterTransformer and DeepSpeed have since released new versions with better optimizations, so the first graph you posted is somewhat outdated IMO.
I want to ask whether there is something wrong with the code I use to run these frameworks. :pleading_face::pleading_face::pleading_face:
This is the result from the DeepSpeed-Inference paper, which shows DeepSpeed as consistently faster than FasterTransformer:
But this is my result:
My code:
I installed DeepSpeed with this command:
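(The exact command isn't reproduced here; as an assumption, the baseline would be a plain `pip install deepspeed`, though the original command may have pinned a version or set build flags.)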
And this is my code for running Hugging Face Transformers, DeepSpeed, and FasterTransformer:
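Since the original snippet isn't reproduced above, here is a minimal sketch of the kind of comparison described, assuming a standard Hugging Face `generate()` loop and DeepSpeed-Inference's `init_inference` with kernel injection. The model name, sequence lengths, and timing loop are placeholders, not the settings used in the issue, and the FasterTransformer invocation is omitted because it depends on FT's own example scripts and converted checkpoints.

```python
import time
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint: the issue does not say which GPT model was benchmarked.
MODEL_NAME = "gpt2-xl"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.float16).cuda()

prompt = "DeepSpeed vs. FasterTransformer latency test"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")


def benchmark(m, label, steps=10, new_tokens=128):
    # One warm-up call so kernel compilation and allocator growth are not timed.
    m.generate(**inputs, max_new_tokens=new_tokens, do_sample=False)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(steps):
        m.generate(**inputs, max_new_tokens=new_tokens, do_sample=False)
    torch.cuda.synchronize()
    print(f"{label}: {(time.time() - start) / steps * 1000:.1f} ms per request")


# Plain Hugging Face baseline (must run before DeepSpeed replaces the modules in place).
benchmark(model, "huggingface")

# DeepSpeed-Inference with kernel injection enabled.
ds_engine = deepspeed.init_inference(
    model,
    mp_size=1,
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)
benchmark(ds_engine.module, "deepspeed")
```

The warm-up call and the `torch.cuda.synchronize()` before and after the timed loop are there so that lazy CUDA initialization and asynchronous kernel launches don't skew the per-request latency.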
The code's output, showing that FasterTransformer was faster than DeepSpeed (the differing generated text is probably due to different beam search strategies in FT and HF):