patil-suraj / onnx_transformers

Accelerated NLP pipelines for fast inference on CPU. Built with Transformers and ONNX runtime.
Apache License 2.0
125 stars 27 forks

Added Quantize option #5

Closed yusufcakmakk closed 3 years ago

yusufcakmakk commented 3 years ago

Merging existing ONNX inference with a Quantize option. This PR is based on the Hugging Face notebook. Microsoft also has a similar notebook.

patil-suraj commented 3 years ago

Thank you for this awesome feature @yusufcakmakk !

Could you send the PR from a branch other than master? This needs a few changes; we could make them on that branch and then merge.

patil-suraj commented 3 years ago

Let me know if you are busy; if so, I will work on it and include you as a co-author :)

Thanks again!

yusufcakmakk commented 3 years ago

I will try to update it as you mentioned. And you're welcome! I hope it will be useful 😊