Closed fdlci closed 3 years ago
Hi,
I have been comparing inference speeds between PyTorch models and their ONNX versions. To convert a model from PyTorch to ONNX, I used the code you provided in `convert_graph_to_onnx.py`.
Since I am applying it to QA, I built my ONNX model as follows: `python transformers/src/transformers/convert_graph_to_onnx.py --framework pt --model Camembert-base-ccnet-fquad11 --quantize cam_onnx/camembert-base.onnx --pipeline 'question-answering'`
This command outputs three models: `camembert-base.onnx`, `camembert-base-optimized.onnx`, and `camembert-base-optimized-quantize.onnx`.
I ran inference with all three models, expecting the quantized version to be much faster than `camembert-base.onnx`, but I observed the complete opposite. Why doesn't quantization produce a speedup in this case?
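For reference, here is roughly how I measure latency for each model. This is a minimal sketch: the warmup/averaging harness is generic, and the commented-out `onnxruntime` usage (session paths and input names) is illustrative, assuming the files produced by the conversion command above.

```python
import time

def benchmark(run_fn, feed, n_warmup=5, n_runs=50):
    """Return mean latency in milliseconds for run_fn(feed).

    Warmup iterations are excluded so one-time costs (lazy
    initialization, memory allocation) don't skew the average.
    """
    for _ in range(n_warmup):
        run_fn(feed)
    start = time.perf_counter()
    for _ in range(n_runs):
        run_fn(feed)
    return (time.perf_counter() - start) / n_runs * 1000.0

# Hypothetical usage with onnxruntime (paths/input names depend on the export):
# import onnxruntime as ort
# sess = ort.InferenceSession("cam_onnx/camembert-base-optimized-quantize.onnx")
# feed = {"input_ids": input_ids, "attention_mask": attention_mask}
# print(benchmark(lambda f: sess.run(None, f), feed))
```

The same harness is applied to all three sessions so the comparison only varies the model file.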
Thank you for your answer!