huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

ONNX model conversion #11861

Closed: fdlci closed this issue 3 years ago

fdlci commented 3 years ago

Hi,

I have been comparing inference speeds between PyTorch models and their ONNX versions. To convert a model from PyTorch to ONNX, I used the code you provided in convert_graph_to_onnx.py.

Since I am applying it to question answering, I built my ONNX model as follows:

python transformers/src/transformers/convert_graph_to_onnx.py --framework pt --model Camembert-base-ccnet-fquad11 --quantize cam_onnx/camembert-base.onnx --pipeline 'question-answering'

This command outputs three models: camembert-base.onnx, camembert-base-optimized.onnx, and camembert-base-optimized-quantize.onnx.
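For reference, here is a minimal sketch of the programmatic equivalent of that CLI call, using the helper functions from the same convert_graph_to_onnx script the command invokes. The opset value and paths mirror the command above and the script's defaults as I understand them; treat the exact signatures as an assumption for your installed transformers version:

```python
from pathlib import Path

# Helpers from the script the CLI invokes (signatures may vary by version).
from transformers.convert_graph_to_onnx import convert, optimize, quantize

output = Path("cam_onnx/camembert-base.onnx")

# Export the PyTorch question-answering pipeline to ONNX.
convert(
    framework="pt",
    model="Camembert-base-ccnet-fquad11",
    output=output,
    opset=11,  # assumed: the script's default opset at the time
    pipeline_name="question-answering",
)

# Passing --quantize makes the CLI also run these two steps, which
# produce the -optimized and quantized files listed above.
optimized_path = optimize(output)
quantized_path = quantize(optimized_path)
```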

I ran inference with all three models and expected the quantized version to be much faster than camembert-base.onnx, but it was the complete opposite. I don't understand why quantization doesn't yield a speedup in this case.
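For context, a timing comparison along these lines can be sketched with onnxruntime. This is a minimal example, not the exact benchmark described above: the tokenizer name, question/context strings, and run count are placeholders.

```python
import time

import onnxruntime as ort
from transformers import AutoTokenizer

# Placeholder tokenizer and inputs; swap in the fine-tuned model's own tokenizer.
tokenizer = AutoTokenizer.from_pretrained("camembert-base")
encoded = tokenizer(
    "Quelle est la capitale de la France ?",
    "Paris est la capitale de la France.",
    return_tensors="np",
)

for path in [
    "cam_onnx/camembert-base.onnx",
    "cam_onnx/camembert-base-optimized.onnx",
    "cam_onnx/camembert-base-optimized-quantize.onnx",
]:
    session = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
    # Feed only the inputs this particular graph declares.
    names = {i.name for i in session.get_inputs()}
    feed = {k: v for k, v in encoded.items() if k in names}
    session.run(None, feed)  # warm-up run, excluded from timing
    start = time.perf_counter()
    for _ in range(100):
        session.run(None, feed)
    print(path, f"{(time.perf_counter() - start) / 100 * 1000:.2f} ms/run")
```

Averaging over many runs after a warm-up keeps session initialization out of the comparison, so the numbers reflect steady-state inference only.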

Thank you for your answer!

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed, please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.