Hi, in order to reduce the tuning space, and because per_channel usually gives better accuracy, we only support per_channel in the framework yaml https://github.com/intel/neural-compressor/blob/master/neural_compressor/adaptor/onnxrt_qlinear.yaml. You can update 'per_channel' to 'per_tensor' in that file for your specific version of ORT.
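As a rough illustration of that edit (the exact layout of onnxrt_qlinear.yaml differs between releases, so treat this as a sketch rather than the real file), the idea is to change the granularity entry of the weight capability for the ONNX Runtime version you target:

```yaml
# Sketch of an adaptor capability entry. Field names follow the general
# pattern of the adaptor yamls but may not match your release exactly.
weight:
  dtype: ['int8']
  scheme: ['sym']
  granularity: ['per_tensor']   # was ['per_channel']
  algorithm: ['minmax']
```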
Hi, @mengniwang95. Thanks for your quick reply. Is there an option to directly quantize the entire model as per_tensor?
@zihaomu Unfortunately there is no other way. We will consider adding 'per_tensor' to the framework yaml in the next release.
Thanks. Looking forward to the next release.
Hi @mengniwang95, are there any plans to release an API for per-tensor quantization of the entire model in the near future?
Hi, version 1.12 supports the per-tensor way. If you want to get a per-tensor quantized model directly, please add model_wise to your yaml file, as in https://github.com/intel/neural-compressor/blob/aac0a0ec860d6d875467a8b7fb119ec18713fd48/neural_compressor/template/ptq.yaml#L43, and set 'granularity' to per_tensor.
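A minimal sketch of what that might look like, assuming the model_wise section of the ptq.yaml template accepts weight and activation sub-keys as the linked template line suggests:

```yaml
quantization:
  model_wise:            # per-model overrides applied to all ops
    weight:
      granularity: per_tensor
    activation:
      granularity: per_tensor
```

With a model_wise override like this, the tuner should stop exploring per_channel configurations for individual ops, which is what forces the whole model to per_tensor.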
Thanks @mengniwang95, this will be of great help to us.
Hello team,

I tried to quantize all the parameters of my model in a per_tensor way, but I found that the final quantized model still contains per_channel layers. The yaml file is the following:

Thanks.