microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Is there any way to retrieve the quantization type and quantization parameters using onnxruntime? #19916

Open · OAHLSTM opened this issue 6 months ago

OAHLSTM commented 6 months ago

Describe the issue

Hello, I'm trying to get the quantization parameters of an input tensor, namely the quantization type (static linear per-tensor, static linear per-channel, or dynamic) and the associated parameters (scales & zero_points). In tensorflow-lite, we can check whether a tensor is quantized statically per-tensor or per-channel by simply doing:

const TfLiteQuantizationType tflite_qtype = tensor->quantization.type;
switch (tflite_qtype) {
  case TfLiteQuantizationType::kTfLiteAffineQuantization: {
    const auto* quantization_params =
        reinterpret_cast<const TfLiteAffineQuantization*>(tensor->quantization.params);
    if (quantization_params->scale && quantization_params->scale->size > 1) {
      // Per-channel quantization along the specified dimension.
      int32_t quant_dim = quantization_params->quantized_dimension;
      float* scales = quantization_params->scale->data;
      int32_t* zero_points = quantization_params->zero_point->data;
    } else {
      // Per-tensor quantization: a single scale/zero-point pair.
      float scale = tensor->params.scale;
      int32_t zero_point = tensor->params.zero_point;
    }
    break;
  }
  case TfLiteQuantizationType::kTfLiteNoQuantization:
  default:
    std::cout << "stai_map_qtype: float or unsupported quant type" << std::endl;
    break;
}

I was wondering whether there is any way to retrieve similar quantization parameters using onnxruntime. Thank you for your help.
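
If the runtime itself cannot expose this, the only workaround I can imagine is parsing the model protobuf directly, since in QDQ-format models the scale/zero_point live as initializers feeding QuantizeLinear/DequantizeLinear nodes. A minimal sketch using the onnx protobuf library (hypothetical, and not going through onnxruntime at all):

#include <fstream>
#include <iostream>
#include <string>
#include "onnx/onnx_pb.h"  // onnx::ModelProto

int main(int argc, char** argv) {
  onnx::ModelProto model;
  std::ifstream stream(argv[1], std::ios::binary);
  if (!model.ParseFromIstream(&stream)) {
    std::cerr << "failed to parse " << argv[1] << std::endl;
    return 1;
  }
  for (const auto& node : model.graph().node()) {
    if (node.op_type() != "QuantizeLinear" && node.op_type() != "DequantizeLinear")
      continue;
    if (node.input_size() < 2) continue;
    const std::string& scale_name = node.input(1);  // input 1 is the scale tensor
    for (const auto& init : model.graph().initializer()) {
      if (init.name() != scale_name) continue;
      int64_t num_scales = 1;
      for (int64_t d : init.dims()) num_scales *= d;
      if (num_scales > 1) {
        // Per-channel: the quantized dimension comes from the "axis"
        // attribute (default 1 per the ONNX operator spec).
        int64_t axis = 1;
        for (const auto& attr : node.attribute())
          if (attr.name() == "axis") axis = attr.i();
        std::cout << node.name() << ": per-channel, " << num_scales
                  << " scales along axis " << axis << std::endl;
      } else {
        std::cout << node.name() << ": per-tensor" << std::endl;
      }
    }
  }
  return 0;
}

But this only covers the static QDQ case; a supported onnxruntime API would be much cleaner and would also cover dynamically quantized models.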

To reproduce

Not applicable

Urgency

This is quite urgent: we are migrating from tensorflow-lite to onnxruntime, and this feature is crucial for our implementation.

Platform

Linux

OS Version

Ubuntu 22.04

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.15.1

ONNX Runtime API

C++

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

hariharans29 commented 6 months ago

AFAIK our Tensor interface provides no way to query such metadata. As for whether it can be ascertained at the model level, tagging @yufenglee, as I am not sure about that.
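
To illustrate, the most the current C++ API will tell you about an input is its element type and shape, so you can see that a tensor is int8/uint8 but nothing more. A minimal sketch ("model.onnx" is a placeholder path):

#include <onnxruntime_cxx_api.h>
#include <iostream>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "quant-probe");
  Ort::SessionOptions options;
  Ort::Session session(env, "model.onnx", options);  // placeholder model path

  Ort::TypeInfo type_info = session.GetInputTypeInfo(0);
  auto tensor_info = type_info.GetTensorTypeAndShapeInfo();

  // The element type reveals that the input is quantized to 8 bits,
  // but there is no accessor here for scale or zero_point.
  ONNXTensorElementDataType elem = tensor_info.GetElementType();
  if (elem == ONNX_TENSOR_ELEMENT_DATA_TYPE_INT8 ||
      elem == ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT8) {
    std::cout << "input 0 is 8-bit, but scale/zero_point are not exposed"
              << std::endl;
  }
  return 0;
}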

OAHLSTM commented 6 months ago

Hello @yufenglee, any update on this topic?

Thank you for your support.

github-actions[bot] commented 5 months ago

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.