Open jasontian6666 opened 2 years ago
In my experiments with longer input sequences (~500 tokens), the ONNX performance is only slightly slower than, if not similar to, that of the PyTorch model. The performance gains of ONNX over PyTorch do diminish for longer sequences, especially above 400 tokens.
Hi @siddharth-sharma7
Thank you for providing fast-bart. It has made my life much easier.
I find the bart-onnx-quantized model 2-3x faster than the PyTorch model. However, when the sequence length is long (~500 tokens), the ONNX-based model is 1.5-2x slower.
I also find a similar problem with the T5 ONNX model, which has been discussed at https://github.com/microsoft/onnxruntime/issues/6835#:~:text=the%20converted%20t5%20onnx%20model,and%20higher%20beam%2Dsearch%20number.
Just wondering if we're facing the same issue here.
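For anyone who wants to reproduce the comparison, a minimal timing harness along these lines could help. Note that `pytorch_generate` and `onnx_generate` below are illustrative placeholders, not fast-bart's actual API; in practice they would wrap the PyTorch `model.generate()` call and the ONNX Runtime session respectively:

```python
import statistics
import time

def time_generate(generate_fn, prompts, repeats=5):
    """Return median wall-clock seconds per prompt for generate_fn."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        for prompt in prompts:
            generate_fn(prompt)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings) / len(prompts)

# Illustrative stand-ins for the real model calls (assumptions, not fast-bart API).
def pytorch_generate(prompt):
    return prompt.upper()

def onnx_generate(prompt):
    return prompt.upper()

# Benchmark each input length separately to see where the crossover happens.
for n_words in (50, 250, 500):
    prompts = ["word " * n_words] * 4
    pt = time_generate(pytorch_generate, prompts)
    ox = time_generate(onnx_generate, prompts)
    print(f"~{n_words} words: PyTorch {pt:.6f}s, ONNX {ox:.6f}s, speedup {pt / ox:.2f}x")
```

Benchmarking each length bucket separately (rather than one mixed batch) makes the degradation above ~400 tokens visible directly.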