Closed: akarym-sl closed this issue 1 year ago.
@akarym-sl, what's the config JSON you used to run the quantization optimization?
Here is the config. For the second case I prepend ["onnx_conv", "onnx_quant"] to the "pass_flows" list:
{
"input_model":{
"type":"PyTorchModel",
"config":{
"model_path":"model.pt",
"model_loader":"load_state_dict",
"model_script":"save.py",
"dummy_inputs_func":"get_dummy_inputs",
"io_config":{
"input_names":[
"input"
],
"output_names":[
"output"
],
"dynamic_axes":{
"input":{
"0":"batch"
},
"output":{
"0":"batch"
}
}
}
}
},
"systems":{
"local_system":{
"type":"LocalSystem",
"config":{
"accelerators":[
"cpu"
]
}
}
},
"evaluators":{
"custom_evaluator":{
"metrics":[
{
"name":"custom",
"type":"custom",
"user_config":{
"user_script":"user_script.py",
"batch_size":1,
"dataloader_func":"create_dataloader",
"evaluate_func":"evaluate"
},
"sub_types":[
{
"name":"latency",
"priority":1,
"higher_is_better":false
},
{
"name":"accuracy",
"priority":2,
"higher_is_better":true
}
]
}
]
}
},
"engine":{
"clean_cache":true,
"cache_dir":".cache",
"output_dir":"optimization",
"host":"local_system",
"target":"local_system",
"execution_providers":[
"CPUExecutionProvider"
],
"evaluator":"custom_evaluator",
"evaluate_input_model":false
},
"passes":{
"onnx_conv":{
"type":"OnnxConversion",
"config":{
"target_opset":15
}
},
"onnx_quant":{
"type":"OnnxQuantization",
"config":{
"user_script":"user_script.py",
"dataloader_func":"create_calibrator",
}
},
"onnx_quant_u":{
"type":"OnnxQuantization",
"config":{
"user_script":"user_script.py",
"dataloader_func":"create_calibrator",
"weight_type":"QUInt8",
"activation_type":"QUInt8"
}
}
},
"pass_flows"[["onnx_conv", "onnx_quant_u"]]
}
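For context, the config above delegates data loading, calibration, and evaluation to user code. The snippet below is only a minimal, illustrative sketch of what the referenced user_script.py helpers could look like; the function signatures, the returned objects, and the RandomDataset class are assumptions for illustration, not the exact interface Olive calls.

# user_script.py -- illustrative sketch only; signatures and return shapes are assumed.
import torch
from torch.utils.data import DataLoader, Dataset

class RandomDataset(Dataset):
    """Hypothetical stand-in for the real dataset used by the config."""
    def __init__(self, size=100):
        self.size = size
    def __len__(self):
        return self.size
    def __getitem__(self, idx):
        return torch.randn(3, 224, 224), 0  # (input, label)

def create_dataloader(data_dir, batch_size, *args, **kwargs):
    # Referenced by the custom evaluator metric ("dataloader_func").
    return DataLoader(RandomDataset(), batch_size=batch_size)

def create_calibrator(data_dir, batch_size, *args, **kwargs):
    # Referenced by the OnnxQuantization passes to draw calibration samples.
    return DataLoader(RandomDataset(), batch_size=batch_size)

def evaluate(model, *args, **kwargs):
    # Referenced by the custom metric ("evaluate_func").
    # Return format assumed: one value per sub_type name in the config;
    # adapt to whatever your Olive version actually expects.
    return {"latency": 1.0, "accuracy": 0.9}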
@akarym-sl, do you mean the accuracy of the model optimized via pass_flows differs from the one produced without the default onnx_quant pass?
From your description, it seems the accuracy from [["onnx_conv", "onnx_quant_u"]] differs from [["onnx_conv", "onnx_quant"], ["onnx_conv", "onnx_quant_u"]]. In other words, your point is that both runs of the same pass group ["onnx_conv", "onnx_quant_u"] should produce the same result. Is my understanding correct?
Yes, in my understanding, previous pass flows shouldn't affect the current one. I observe that adding the ["onnx_conv", "onnx_quant"] flow changes the accuracy of the ["onnx_conv", "onnx_quant_u"] flow. My guess is that in the second flow the model is not quantized to QUInt8 as it should be, but is instead quantized to QInt8, or left unchanged and loaded from the previous flow.
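One way to check that guess is to inspect which integer types the quantized ONNX output actually uses. The snippet below is a small, self-contained check using the onnx package; the model path is a placeholder and should be replaced with the actual output of the run.

# Quick sanity check of the integer types in a quantized ONNX model.
from collections import Counter
import onnx
from onnx import TensorProto

model = onnx.load("optimization/model_quantized.onnx")  # placeholder path

# Count initializer (weight) data types.
counts = Counter(init.data_type for init in model.graph.initializer)
print("UINT8 weights:", counts.get(TensorProto.UINT8, 0))
print("INT8 weights:", counts.get(TensorProto.INT8, 0))

# Quantize/DequantizeLinear nodes show whether activations were quantized at all.
qdq = [n.op_type for n in model.graph.node
       if n.op_type in ("QuantizeLinear", "DequantizeLinear")]
print("Q/DQ node count:", len(qdq))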
@trajepl is helping to look into this.
Thanks for raising this. It is a bug on the Olive side. The root cause: Olive uses the pass's class name (OnnxQuantization) as the key to look up the pass instance (onnx_quant, onnx_quant_u). When there are multiple passes of the same type but with different configs (onnx_quant, onnx_quant_u), only the first one is actually used to run quantization (onnx_quant).
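To make the failure mode concrete, here is a small, illustrative sketch (not Olive's actual code) of how keying a registry by class name collapses two differently configured passes into one, while keying by the pass name keeps both:

# Illustrative only -- not Olive's real implementation.
class OnnxQuantization:
    def __init__(self, config):
        self.config = config

passes = {
    "onnx_quant": OnnxQuantization({"weight_type": "QInt8"}),
    "onnx_quant_u": OnnxQuantization({"weight_type": "QUInt8"}),
}

# Bug: keyed by class name, both entries collide on "OnnxQuantization",
# so every lookup resolves to whichever instance was registered first.
by_class = {}
for name, p in passes.items():
    by_class.setdefault(type(p).__name__, p)
print(by_class["OnnxQuantization"].config)  # always the onnx_quant config

# Fix: key by the pass name from the config, so each flow gets its own instance.
by_name = {name: p for name, p in passes.items()}
print(by_name["onnx_quant_u"].config)  # {'weight_type': 'QUInt8'}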
I changed the key to the pass name in the following PR and tested it with the bert example; it worked well for me. Here is the config I used:
{
"input_model":{
"type": "PyTorchModel",
"config": {
"hf_config": {
"model_name": "Intel/bert-base-uncased-mrpc",
"task": "text-classification",
"dataset": {
"data_name":"glue",
"subset": "mrpc",
"split": "validation",
"input_cols": ["sentence1", "sentence2"],
"label_cols": ["label"],
"batch_size": 1
}
}
}
},
"evaluators": {
"common_evaluator": {
"metrics":[
{
"name": "accuracy",
"type": "accuracy",
"backend": "huggingface_metrics",
"sub_types": [
{"name": "accuracy", "priority": 1, "goal": {"type": "max-degradation", "value": 0.01}},
{"name": "f1"}
]
},
{
"name": "latency",
"type": "latency",
"sub_types": [
{"name": "avg", "priority": 2, "goal": {"type": "percent-min-improvement", "value": 20}},
{"name": "max"},
{"name": "min"}
]
}
]
}
},
"passes": {
"conversion": {
"type": "OnnxConversion",
"config": {
"target_opset": 13
}
},
"onnx_quant": {
"type": "OnnxQuantization",
"config": {
"data_config": "__input_model_data_config__"
}
},
"onnx_quant_u": {
"type": "OnnxQuantization",
"config": {
"data_config": "__input_model_data_config__",
"weight_type":"QUInt8",
"activation_type":"QUInt8"
}
}
},
"pass_flows": [
["conversion", "onnx_quant_u"]
],
"engine": {
"evaluator": "common_evaluator",
"execution_providers": ["CPUExecutionProvider"],
"cache_dir": "cache",
"output_dir" : "models/bert_ptq_cpu",
"clean_cache": true
}
}
@akarym-sl, could you try this PR? https://github.com/microsoft/Olive/pull/577
git clone https://github.com/microsoft/Olive
cd Olive
pip install .
@akarym-sl, please let us know whether the bug is fixed or not.
I tested the new version (0.4.0) on the same setup and can confirm that the issue is gone! Therefore, closing the issue. Thank you!
What happened?
When running an OnnxQuantization pass with default parameters first, followed by a pass with QUInt8 weight and activation types, the model parameters are not quantized to QUInt8. To clarify, running the pass flow [["onnx_conv", "onnx_quant_u"]]
yields different accuracy than running the two flows [["onnx_conv", "onnx_quant"], ["onnx_conv", "onnx_quant_u"]].
Version?
0.3.1