triton-inference-server / onnxruntime_backend

The Triton backend for the ONNX Runtime.
BSD 3-Clause "New" or "Revised" License

Will the onnxruntime backend support INT8 on CPU? #240

Open bharadwajymg opened 9 months ago

bharadwajymg commented 9 months ago

Hi, we are trying to quantise our ONNX models to INT8 to run on CPU, following https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html#quantization-on-gpu
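
For context, a minimal sketch of dynamic quantisation with ONNX Runtime's `quantize_dynamic` API (the file names here are placeholders, not from the original post):

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Dynamically quantise weights to INT8; activations are quantised at runtime.
# "model.onnx" and "model_int8.onnx" are hypothetical paths.
quantize_dynamic(
    model_input="model.onnx",
    model_output="model_int8.onnx",
    weight_type=QuantType.QInt8,
)
```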

We are using dynamic quantisation and relying on the AVX2 and AVX-512 extensions. When we tested our models with ONNX Runtime directly we saw an improvement, so we are cross-checking whether this backend supports INT8 models when the backend is specified directly in config.pbtxt (see the sketch below).
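
A minimal `config.pbtxt` sketch for serving such a model on CPU, assuming the quantised file is placed at the usual `model.onnx` location inside the model repository (the model name is a placeholder):

```
name: "my_quantized_model"
backend: "onnxruntime"
max_batch_size: 8
instance_group [
  {
    kind: KIND_CPU
  }
]
```

Since the quantised model is still a regular ONNX graph, no special backend flags should be needed beyond selecting `KIND_CPU`; ONNX Runtime picks the AVX2/AVX-512 INT8 kernels at runtime based on the host CPU.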

Jackiexiao commented 8 months ago

Yes, the onnxruntime backend supports INT8 on CPU.
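
One way to verify end to end is to send a request with the Triton Python client once the model loads (a sketch only; the model, input, and output names are assumptions and must match your config.pbtxt):

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Hypothetical input tensor; adjust name, shape, and dtype to your model.
data = np.random.rand(1, 3, 224, 224).astype(np.float32)
inp = httpclient.InferInput("input", list(data.shape), "FP32")
inp.set_data_from_numpy(data)

result = client.infer("my_quantized_model", inputs=[inp])
print(result.as_numpy("output"))
```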