microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

NOT_IMPLEMENTED : Could not find an implementation for ConvInteger(10) node with name 'Conv_0_quant' #15888

Open chinmayjog13 opened 1 year ago

chinmayjog13 commented 1 year ago

Describe the issue

Following the documentation, I dynamically quantized a ResNet-based model. The model is quantized and saved without error. However, when I try to create an inference session using the quantized model, the code crashes with the following error.

>>> ort_session = ort.InferenceSession(int8_path, providers=['CPUExecutionProvider'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/chinmay/anaconda3/envs/v_pytorch2/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 360, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/home/chinmay/anaconda3/envs/v_pytorch2/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 408, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for ConvInteger(10) node with name 'Conv_0_quant'

This is a duplicate of #12558, which was closed a few months ago, so I assumed support had been added to onnxruntime, but I am still getting the same error.

To reproduce

import onnx
from onnxruntime.quantization import quantize_dynamic, QuantType

model_fp32 = 'path/to/the/model.onnx'
model_quant = 'path/to/the/model.quant.onnx'
quantized_model = quantize_dynamic(model_fp32, model_quant)

import onnxruntime as ort
ort_session = ort.InferenceSession(model_quant, providers=['CPUExecutionProvider'])

Attachments: fp32_model, quantized_model
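
For reference, a quick way to check whether the quantizer actually emitted ConvInteger nodes is to count the op types in the quantized graph. This is a hedged sketch; the path is a placeholder for the quantized model above.

from collections import Counter

import onnx

quant_model = onnx.load('path/to/the/model.quant.onnx')
op_counts = Counter(node.op_type for node in quant_model.graph.node)
print(op_counts)  # a non-zero ConvInteger count explains the NOT_IMPLEMENTED error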

Urgency

Not very urgent, but not very low priority either.

Platform

Linux

OS Version

Ubuntu 20.04

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.14.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

jpg-gamepad commented 1 year ago

I am getting the same issue. I might try reverting to a previous build just to see if it was working before.

jpg-gamepad commented 1 year ago

I just tried the previous releases; they didn't work either. Upon inspecting the code, it looks as if the patch may never have been deployed to a release. Here and here

There should be a line that says class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 10, uint8_t_int8_t, ConvInteger);

Maybe @jchen351 knows what happened.

CrisLS commented 1 year ago

I'm having the same issue. Can I help get the patch applied?

johnyquest7 commented 1 year ago

Same issue here. Do we have a timeline for the patch?

trnhattan commented 1 year ago

In the similar issue #3130, there is a comment that may temporarily work around the issue. But when we change weight_type=QuantType.QInt8 to QuantType.QUInt8, the quantized ONNX model seems to run slower, as also mentioned in that issue. That happened for me as well.

Besides, I think these issues happen because ConvInteger is not supported for the signed int8 data type, here.
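
For context, the operator schema itself can be inspected from Python. This is a hedged sketch using the onnx package; it only shows what the spec allows, not which kernels onnxruntime registers.

import onnx.defs

schema = onnx.defs.get_schema('ConvInteger')
for tc in schema.type_constraints:
    print(tc.type_param_str, '->', tc.allowed_type_strs)
# The spec allows both tensor(int8) and tensor(uint8) for the inputs; the
# NOT_IMPLEMENTED error appears to come from the CPU execution provider only
# registering a uint8 kernel for ConvInteger.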

trnhattan commented 1 year ago

In the ONNX Runtime docs, in the Method Selection subsection, they say:

In general, it is recommended to use dynamic quantization for RNNs and transformer-based models, and static quantization for CNN models.

So I followed the end-to-end example given in the docs, and it worked; the weights are int8. Here are the steps I took (a rough sketch of this flow follows the list):

  1. Convert FaceNet-InceptionResNet to ONNX model.
  2. Create CalibrationDataReader by using some facial images.
  3. Execute quantize_static()
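
A minimal sketch of that flow, assuming placeholder model paths, input name, and calibration images; replace them with your own preprocessing.

import numpy as np
from onnxruntime.quantization import (
    CalibrationDataReader,
    QuantFormat,
    QuantType,
    quantize_static,
)

class ImageCalibrationReader(CalibrationDataReader):
    """Feeds a handful of preprocessed images to the calibrator."""

    def __init__(self, images, input_name='input'):
        # images: iterable of np.float32 arrays shaped like the model input
        self._data = iter([{input_name: img} for img in images])

    def get_next(self):
        return next(self._data, None)

# Placeholder calibration data: replace with real preprocessed face crops.
calib_images = [np.random.rand(1, 3, 160, 160).astype(np.float32) for _ in range(8)]

quantize_static(
    'facenet.onnx',            # placeholder fp32 model path
    'facenet.quant.onnx',      # placeholder output path
    ImageCalibrationReader(calib_images),
    quant_format=QuantFormat.QDQ,  # QDQ inserts QuantizeLinear/DequantizeLinear instead of ConvInteger
    weight_type=QuantType.QInt8,
)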

Apisteftos commented 1 year ago

I have the same problem. Unfortunately, I am finding that onnxruntime does not support the ConvInteger operator in this case, which means dynamic quantization does not work in onnxruntime if the initial model contains convolution layers. Very sad!

greyovo commented 1 year ago

I found a workaround for this by setting:

quantize_dynamic(input_model, output_model, weight_type=QuantType.QInt8, nodes_to_exclude=['/conv1/Conv'])

Here, nodes_to_exclude should be the list of Conv node names in your model. You can find them in the error message when loading the model with InferenceSession. For example, for the error message in the title of this issue, it would be 'Conv_0_quant'.
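
A slightly more general, hedged sketch is to enumerate all Conv nodes with the onnx package and exclude them in one go; the paths are placeholders.

import onnx
from onnxruntime.quantization import QuantType, quantize_dynamic

model_fp32 = 'model.onnx'         # placeholder path
model_quant = 'model.quant.onnx'  # placeholder path

# Collect the names of every Conv node so none of them becomes ConvInteger.
model = onnx.load(model_fp32)
conv_nodes = [node.name for node in model.graph.node if node.op_type == 'Conv']

quantize_dynamic(
    model_fp32,
    model_quant,
    weight_type=QuantType.QInt8,
    nodes_to_exclude=conv_nodes,
)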

tommate commented 1 year ago

Another workaround is to exclude the operator types causing the issue from quantization. For example:

quantize_dynamic(input_model, output_model, weight_type=QuantType.QInt8, op_types_to_quantize=['MatMul', 'Attention', 'LSTM', 'Gather', 'Transpose', 'EmbedLayerNormalization'])

Here, "Conv" was left out of op_types_to_quantize.

Ibrah-N commented 1 year ago

I'm also facing the same error; I've tried to solve this problem without luck.

ogencoglu commented 9 months ago

Same issue for a SegFormer model: NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for ConvInteger(10) node with name '/segformer/encoder/patch_embeddings.0/proj/Conv_quant'

theonewolf commented 9 months ago

I just hit this issue too. I will skip quantizing Conv operators for now.

hu8813 commented 8 months ago

Same error; it only works with QUInt8, not QInt8.

joelleoqiyi commented 3 months ago

I found a workaround for this by setting:

quantize_dynamic(input_model, output_model, weight_type=QuantType.QInt8, nodes_to_exclude=['/conv1/Conv'])

Here, nodes_to_exclude should be the list of Conv node names in your model. You can find them in the error message when loading the model with InferenceSession. For example, for the error message in the title of this issue, it would be 'Conv_0_quant'.

Hi, I tried this while quantizing a Whisper model (seq2seq) (see here), but I am getting the following error:

NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for ConvInteger(10) node with name '/conv1/Conv_quant'

I tried to use nodes_to_exclude in my AutoQuantizationConfig to exclude the node, but the error is still the same:

qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False, nodes_to_exclude=['/conv1/Conv_quant'])

Any help would be appreciated! 🙏

tommate commented 3 months ago

I had to describe the operators to quantize more precisely, and the following worked for me:

dqconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
dqconfig.nodes_to_exclude = ['Conv_quant']
dqconfig.operators_to_quantize=['MatMul', 'Attention', 'LSTM', 'Gather', 'Transpose', 'EmbedLayerNormalization']
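
For context, a hedged sketch of how such a config is usually applied with optimum's ORTQuantizer; the model directory and file name are placeholders.

from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

dqconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
dqconfig.nodes_to_exclude = ['Conv_quant']
dqconfig.operators_to_quantize = ['MatMul', 'Attention', 'LSTM', 'Gather', 'Transpose', 'EmbedLayerNormalization']

# Placeholder directory containing the exported ONNX encoder.
quantizer = ORTQuantizer.from_pretrained('whisper_onnx_dir', file_name='encoder_model.onnx')
quantizer.quantize(save_dir='whisper_onnx_quantized', quantization_config=dqconfig)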

Hope this helps, Thomas

moritzsur commented 3 months ago

Still only works with QUInt8 for me.

Fazankabir commented 2 months ago

@ogencoglu

were you able to solve the issue? I am facing the same problem with quantization, and I also see longer inference time with ONNX (can be seen here). Any help will be appreciated.

ogencoglu commented 2 months ago

@Fazankabir No solution found so far.

Fazankabir commented 2 months ago

I am able to do quantization with:

model_fp32 = 'model_Segformer.onnx'  
model_quant = "model_dynamic_quant.onnx"
quantized_model = quantize_dynamic(model_fp32, model_quant, weight_type=QuantType.QUInt8) 

instead of QuantType.QInt8

But while doing inference it takes even more time. Do you also see longer inference time after exporting SegFormer to ONNX, @ogencoglu?

ogencoglu commented 2 months ago

I haven't tested that (ONNX is a must for my case), but the quantized ONNX model has a longer inference time than the full-precision one, so I ditched SegFormer altogether. @Fazankabir

mattam301 commented 2 months ago

I am able to do quantization with:

model_fp32 = 'model_Segformer.onnx'  
model_quant = "model_dynamic_quant.onnx"
quantized_model = quantize_dynamic(model_fp32, model_quant, weight_type=QuantType.QUInt8) 

instead of QuantType.QInt8

But while doing inference it takes even more time. Do you also see longer inference time after exporting SegFormer to ONNX, @ogencoglu?

Yeah, you're right: a lot of operators in ONNX run on CPU instead of GPU after being quantized (maybe they aren't supported yet), so it can take more time to run some quantized models.
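
For what it's worth, a hedged sketch for measuring that directly by timing the fp32 and quantized models on the CPU execution provider; the paths, input name, and shape are placeholders.

import time

import numpy as np
import onnxruntime as ort

def bench(path, feed, runs=50):
    sess = ort.InferenceSession(path, providers=['CPUExecutionProvider'])
    sess.run(None, feed)  # warm-up run
    start = time.perf_counter()
    for _ in range(runs):
        sess.run(None, feed)
    return (time.perf_counter() - start) / runs

# Placeholder input name and shape; adjust to the model's actual input.
feed = {'input': np.random.rand(1, 3, 512, 512).astype(np.float32)}
print('fp32:', bench('model_Segformer.onnx', feed))
print('int8:', bench('model_dynamic_quant.onnx', feed))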