megvii-research / Sparsebit

A model compression and acceleration toolbox based on pytorch.
Apache License 2.0

Errors using qViT.onnx to do inference #148

Open yuhuixu1993 opened 1 year ago

yuhuixu1993 commented 1 year ago

With no modifications, I used your PTQ code to export a DeiT ONNX model, but an error occurs when running inference on it with onnxruntime:

onnxruntime.capi.onnxruntime_pybind11_state.InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. Error in Node:QuantizeLinear_2 : No Op registered for QuantizeLinear with domain_version of 13
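For anyone hitting the same error: `QuantizeLinear` was updated in opset 13 (adding per-axis quantization), and older onnxruntime builds do not register that version of the op. A minimal diagnostic sketch to check both sides (the file name `qvit.onnx` is an assumption; substitute your exported model):

```python
# Sketch: compare the model's declared opset against the installed runtime.
import onnx
import onnxruntime as ort

model = onnx.load("qvit.onnx")  # assumed path to the exported model
# Each entry is (domain, opset version); the default domain is "ai.onnx".
print("model opsets:", [(o.domain or "ai.onnx", o.version) for o in model.opset_import])
print("onnxruntime :", ort.__version__)
print("providers   :", ort.get_available_providers())
```

If the model declares opset 13 but the installed onnxruntime predates support for it, either upgrade onnxruntime or re-export the model at a lower opset that your runtime supports.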

yuhuixu1993 commented 1 year ago

I solved this problem by upgrading CUDA and onnxruntime; however, the quantized model is much slower than the fp16 one.
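To make the comparison concrete, here is a minimal latency-benchmark sketch (not Sparsebit code; the file names and the 1x3x224x224 DeiT-style input shape are assumptions):

```python
# Sketch: measure mean per-run latency of two ONNX models under onnxruntime.
import time

import numpy as np
import onnxruntime as ort

def bench(path, runs=100):
    sess = ort.InferenceSession(
        path, providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
    )
    inp = sess.get_inputs()[0]
    # fp16 models usually expect float16 inputs; pick the dtype accordingly.
    dtype = np.float16 if "float16" in inp.type else np.float32
    x = np.random.rand(1, 3, 224, 224).astype(dtype)  # assumed DeiT input shape
    for _ in range(10):  # warm-up to exclude session/kernel setup cost
        sess.run(None, {inp.name: x})
    start = time.perf_counter()
    for _ in range(runs):
        sess.run(None, {inp.name: x})
    return (time.perf_counter() - start) / runs * 1e3  # mean ms per run

print("quantized:", bench("qvit_int8.onnx"), "ms")  # hypothetical file names
print("fp16     :", bench("deit_fp16.onnx"), "ms")
```

One likely cause of the slowdown is that the CUDA execution provider has limited int8 kernel coverage, so the QuantizeLinear/DequantizeLinear nodes may run on slow fallback paths; on NVIDIA GPUs, the TensorRT execution provider is usually the recommended route to get an actual int8 speedup.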