Open shairoz-deci opened 3 years ago
@shairoz-deci , for ConvInteger we have not yet added u8s8 (activation: uint8, weight: int8); currently only u8u8 is supported. In general, for CNN models it is recommended to use static quantization. Here is an example: https://github.com/microsoft/onnxruntime/tree/master/onnxruntime/python/tools/quantization/E2E_example_model/image_classification/cpu
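For intuition, static quantization precomputes each activation's scale and zero-point from a calibration pass over representative inputs. A minimal pure-Python sketch of the uint8 affine scheme (helper names are illustrative, not the onnxruntime API):

```python
def affine_params(lo, hi, qmin=0, qmax=255):
    """Compute scale/zero-point mapping the float range [lo, hi] onto [qmin, qmax]."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # range must include 0 so it quantizes exactly
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, zero_point

def quantize(x, scale, zp, qmin=0, qmax=255):
    """Map a float to the nearest representable uint8 value, clamping at the ends."""
    return max(qmin, min(qmax, round(x / scale) + zp))

def dequantize(q, scale, zp):
    return (q - zp) * scale

# "Calibration" pass: observe the activation range over representative inputs,
# then freeze scale/zero-point for inference.
calibration = [-1.0, 0.2, 2.5, 0.9]
scale, zp = affine_params(min(calibration), max(calibration))

q = quantize(0.9, scale, zp)
x = dequantize(q, scale, zp)  # within one quantization step of 0.9
```

The real flow in onnxruntime wraps this idea behind `quantize_static` plus a `CalibrationDataReader`, as shown in the linked example.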
So, for BERT or Transformer models, it is recommended to use dynamic quantization?
Dynamic quantization is easy to use. In most cases, dynamic quantization achieves good accuracy for Transformer-based models and you don't need to retrain. You can also retrain with QAT and then use static quantization.
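The difference is where activation ranges come from: dynamic quantization derives them from each input at inference time instead of from a calibration set, which is why no calibration data or retraining is needed. A rough pure-Python sketch (illustrative names, not the onnxruntime API):

```python
def dynamic_quantize(xs, qmin=0, qmax=255):
    """Quantize a batch of activations using its own min/max, computed at runtime."""
    lo, hi = min(min(xs), 0.0), max(max(xs), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale for all-zero input
    zp = round(qmin - lo / scale)
    return [max(qmin, min(qmax, round(x / scale) + zp)) for x in xs], scale, zp

acts = [0.5, -0.25, 1.75]           # activations observed for this one inference
q, scale, zp = dynamic_quantize(acts)
restored = [(v - zp) * scale for v in q]  # each value within one step of the input
```

Computing min/max per inference adds runtime cost, which is the usual trade-off against static quantization's fixed, calibrated ranges.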
Thanks! @yufenglee
Is there a reason we don't support int8 activations? If I'd like to contribute, where should I start looking?
This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
Describe the bug: Created an 8-bit quantized model following https://github.com/microsoft/onnxruntime/blob/master/onnxruntime/python/tools/quantization/notebooks/Bert-GLUE_OnnxRuntime_quantization.ipynb and got onnxruntime.capi.onnxruntime_pybind11_state.NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for the node Conv_0_quant:ConvInteger(10) when trying to run an inference session.
System information
To Reproduce: run quantize_onnx_model from the above link -> models are created properly and have the expected size. Inference crashed with the above exception.
Expected behavior: running inference on the quantized model.