microsoft / onnxruntime-inference-examples

Examples for using ONNX Runtime for machine learning inferencing.

Do QuantizeLinear and DequantizeLinear operators only run with CPU? #166

Open · jimmy49503 opened this issue 1 year ago

jimmy49503 commented 1 year ago

I followed the ONNX Runtime quantization example (https://github.com/microsoft/onnxruntime-inference-examples/tree/main/quantization/image_classification/cpu) to produce a quantized model, mobilenetv2-7.quant.onnx. I then tried to run inference on it with onnxruntime-gpu, setting all of the session parameters for GPU as described in the API summary (https://onnxruntime.ai/docs/api/python/api_summary.html#inferencesession), but the model ran on the CPU instead of the GPU. With the same session parameters, the FP32 model mobilenetv2-7-infer.onnx runs on the GPU as expected. So I would like to know: do the QuantizeLinear and DequantizeLinear operators only run on CPU, or can I run them on GPU by changing my session parameters? Thanks.
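
For reference, a minimal sketch of this kind of session setup; the model path, provider list, and input shape here are assumptions based on the linked example, not the exact code from the report:

```python
import numpy as np
import onnxruntime as ort

# Request the CUDA EP first, falling back to CPU for any unsupported nodes.
session = ort.InferenceSession(
    "mobilenetv2-7.quant.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# Confirm which providers the session actually registered.
print(session.get_providers())

# MobileNetV2 from the ONNX model zoo expects a 1x3x224x224 float32 input.
input_name = session.get_inputs()[0].name
dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: dummy_input})
```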

edgchen1 commented 1 year ago

QuantizeLinear and DequantizeLinear should be supported by the CUDA EP. https://github.com/microsoft/onnxruntime/blob/main/docs/OperatorKernels.md#cudaexecutionprovider
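
One way to check where the nodes actually run is to enable verbose logging, which should print the execution provider each node was assigned to. A minimal sketch (the model path is an assumption):

```python
import onnxruntime as ort

# Verbose logging prints node placement information, which shows whether
# the QuantizeLinear/DequantizeLinear nodes ended up on CPU or CUDA.
sess_options = ort.SessionOptions()
sess_options.log_severity_level = 0  # 0 = VERBOSE

session = ort.InferenceSession(
    "mobilenetv2-7.quant.onnx",  # path is an assumption
    sess_options,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
# Look for the per-provider node placement lists in the log output.
```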

Maybe @yufenglee or @chenfucn can comment about the GPU support for that quantization example.