Open · zjc664656505 opened this issue 1 year ago
Gentle ping. I tried with Intel/bert-base-uncased-mrpc (418 MB). I can produce the quantized model (145 MB), but I cannot initialize an inference session with it. The failure is:

[ShapeInferenceError] 4b quantization not yet supported on this hardware platform!

Which hardware platforms support 4-bit quantization? Could you share more details?
Reproduce steps:

```
python -m olive.workflows.run --config bert_ptq_cpu_4bit_quant.json
```

or debug with the following code:

```python
from olive.workflows import run as olive_run

olive_run("./bert_ptq_cpu_4bit_quant.json")
```
I also hit the same problem when running inference with a 4-bit ONNX model.
It checks processor features to select the kernel implementation:

```cpp
//
// Check if the processor supports AVX512 core features
// (AVX512BW/AVX512DQ/AVX512VL).
//
```
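If it helps others triage this, here is a rough, Linux-only way to check whether the local CPU reports those flags; this is just reading /proc/cpuinfo, not an official ONNX Runtime check:

```python
# Check /proc/cpuinfo for the AVX512 features the int4 kernels appear to require.
required = {"avx512bw", "avx512dq", "avx512vl"}

flags = set()
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())

missing = required - flags
print("missing AVX512 features:", missing or "none")
```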
Describe the documentation issue
I recently saw that onnxruntime added an int4 blockwise quantization feature, and I would like to know whether there is any documentation for this new feature so that we can build and test it.
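In the meantime, this is roughly what I have been experimenting with, based on the matmul_4bits_quantizer module in onnxruntime's quantization tooling; I am not certain the class and argument names below match every release, so treat it as a sketch rather than the documented API:

```python
import onnx
from onnxruntime.quantization.matmul_4bits_quantizer import MatMul4BitsQuantizer

# Load an FP32 model and quantize its MatMul weights to int4 using
# blockwise quantization (block_size elements share one scale).
model = onnx.load("model_fp32.onnx")  # placeholder path
quantizer = MatMul4BitsQuantizer(model, block_size=32, is_symmetric=True)
quantizer.process()
quantizer.model.save_model_to_file("model_int4.onnx", use_external_data_format=True)
```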
Page / URL
No response