microsoft / Olive

Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs.
https://microsoft.github.io/Olive/
MIT License

[Bug]: OnnxQuantization #573

Closed: akarym-sl closed this issue 1 year ago

akarym-sl commented 1 year ago

What happened?

When an OnnxQuantization pass with default parameters is run first and a second OnnxQuantization pass with QUInt8 weight and activation types is run after it, the model parameters are not quantized to QUInt8. To clarify, running only the QUInt8 pass flow yields a different accuracy than running the default-parameter pass flow followed by the QUInt8 pass flow.
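
For reference, a sketch of the two "pass_flows" values being compared (the pass names are taken from the config shared later in this thread):

"pass_flows": [["onnx_conv", "onnx_quant_u"]]

versus

"pass_flows": [["onnx_conv", "onnx_quant"], ["onnx_conv", "onnx_quant_u"]]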

Version?

0.3.1

guotuofeng commented 1 year ago

@akarym-sl, what's the config JSON you used to run the quantization optimization?

akarym-sl commented 1 year ago

Here is the config. For the second case, I prepend ["onnx_conv", "onnx_quant"] to the "pass_flows" list.

{
    "input_model":{
        "type":"PyTorchModel",
        "config":{
            "model_path":"model.pt",
            "model_loader":"load_state_dict",
            "model_script":"save.py",
            "dummy_inputs_func":"get_dummy_inputs",
            "io_config":{
                "input_names":[
                    "input"
                ],
                "output_names":[
                    "output"
                ],
                "dynamic_axes":{
                    "input":{
                        "0":"batch"
                    },
                    "output":{
                        "0":"batch"
                    }
                }
            }
        }
    },
    "systems":{
        "local_system":{
            "type":"LocalSystem",
            "config":{
                "accelerators":[
                    "cpu"
                ]
            }
        }
    },
    "evaluators":{
        "custom_evaluator":{
            "metrics":[
                {
                    "name":"custom",
                    "type":"custom",
                    "user_config":{
                        "user_script":"user_script.py",
                        "batch_size":1,
                        "dataloader_func":"create_dataloader",
                        "evaluate_func":"evaluate"
                    },
                    "sub_types":[
                        {
                            "name":"latency",
                            "priority":1,
                            "higher_is_better":false
                        },
                        {
                            "name":"accuracy",
                            "priority":2,
                            "higher_is_better":true
                        }
                    ]
                }
            ]
        }
    },
    "engine":{
        "clean_cache":true,
        "cache_dir":".cache",
        "output_dir":"optimization",
        "host":"local_system",
        "target":"local_system",
        "execution_providers":[
            "CPUExecutionProvider"
        ],
        "evaluator":"custom_evaluator",
        "evaluate_input_model":false
    },
    "passes":{
        "onnx_conv":{
            "type":"OnnxConversion",
            "config":{
                "target_opset":15
            }
        },
        "onnx_quant":{
            "type":"OnnxQuantization",
            "config":{
                "user_script":"user_script.py",
                "dataloader_func":"create_calibrator",
            }
        },
        "onnx_quant_u":{
            "type":"OnnxQuantization",
            "config":{
                "user_script":"user_script.py",
                "dataloader_func":"create_calibrator",
                "weight_type":"QUInt8",
                "activation_type":"QUInt8"
            }
        }
    },
    "pass_flows": [["onnx_conv", "onnx_quant_u"]]
}

guotuofeng commented 1 year ago

@akarym-sl, do you mean the accuracy of the model optimized via pass_flows is different from that of the one produced without the default onnx_quant?

From your description, it seems the accuracy from [["onnx_conv", "onnx_quant_u"]] is different from [["onnx_conv", "onnx_quant"], ["onnx_conv", "onnx_quant_u"]]. In other words, the two runs of the same pass group ["onnx_conv", "onnx_quant_u"] should produce the same result. Is my understanding correct?

akarym-sl commented 1 year ago

Yes, in my understanding, previous pass flows shouldn't affect the current one. I observe that adding the ["onnx_conv", "onnx_quant"] flow changes the accuracy of the ["onnx_conv", "onnx_quant_u"] flow. My guess is that in the second flow the model is not quantized to QUInt8, as it should be, but is instead quantized to QInt8, or not changed at all and simply reloaded from the previous pass.
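
One way to check this hypothesis is to load the ONNX model produced by the ["onnx_conv", "onnx_quant_u"] flow and inspect the element types of its initializers; a minimal sketch with the onnx package (the model path below is an assumption and depends on the run's output/cache layout):

import onnx
from onnx import TensorProto

# Hypothetical path: point this at the model produced by the onnx_quant_u flow.
model = onnx.load("optimization/output_model/model.onnx")

# Count initializers by element type: QUInt8 weights show up as UINT8,
# QInt8 weights as INT8.
counts = {}
for init in model.graph.initializer:
    dtype = TensorProto.DataType.Name(init.data_type)
    counts[dtype] = counts.get(dtype, 0) + 1

print(counts)  # e.g. {'UINT8': ..., 'INT8': ..., 'FLOAT': ...}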

guotuofeng commented 1 year ago

@trajepl is helping to look into this.

trajepl commented 1 year ago

Thanks for raising this. It is a bug on the Olive side. The root cause: Olive uses the pass's class name (OnnxQuantization) as the key to look up the pass instance (onnx_quant, onnx_quant_u). When multiple passes have the same class but different configs (onnx_quant, onnx_quant_u), only the first one (onnx_quant) is used to run quantization.
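
A minimal sketch of the failure mode described above (not Olive's actual code, just an illustration of keying by class name versus keying by pass name):

# Two passes of the same type (OnnxQuantization) but with different configs.
passes = {
    "onnx_quant": {"type": "OnnxQuantization", "weight_type": "QInt8"},
    "onnx_quant_u": {"type": "OnnxQuantization", "weight_type": "QUInt8"},
}

# Buggy lookup: keyed by the class name, so both pass names resolve to
# whichever instance was registered first ("onnx_quant").
by_class = {}
for name, cfg in passes.items():
    by_class.setdefault(cfg["type"], cfg)

# Fixed lookup (what the PR changes): keyed by the pass name, so each
# flow keeps its own config.
by_name = dict(passes)

print(by_class["OnnxQuantization"]["weight_type"])  # QInt8, used for both flows
print(by_name["onnx_quant_u"]["weight_type"])       # QUInt8, the intended config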

I changed the key to the pass name in the following PR and tested it with the BERT case; it works well for me.

{
    "input_model":{
        "type": "PyTorchModel",
        "config": {
            "hf_config": {
                "model_name": "Intel/bert-base-uncased-mrpc",
                "task": "text-classification",
                "dataset": {
                    "data_name":"glue",
                    "subset": "mrpc",
                    "split": "validation",
                    "input_cols": ["sentence1", "sentence2"],
                    "label_cols": ["label"],
                    "batch_size": 1
                }
            }
        }
    },
    "evaluators": {
        "common_evaluator": {
            "metrics":[
                {
                    "name": "accuracy",
                    "type": "accuracy",
                    "backend": "huggingface_metrics",
                    "sub_types": [
                        {"name": "accuracy", "priority": 1, "goal": {"type": "max-degradation", "value": 0.01}},
                        {"name": "f1"}
                    ]
                },
                {
                    "name": "latency",
                    "type": "latency",
                    "sub_types": [
                        {"name": "avg", "priority": 2, "goal": {"type": "percent-min-improvement", "value": 20}},
                        {"name": "max"},
                        {"name": "min"}
                    ]
                }
            ]
        }
    },
    "passes": {
        "conversion": {
            "type": "OnnxConversion",
            "config": {
                "target_opset": 13
            }
        },
        "onnx_quant": {
            "type": "OnnxQuantization",
            "config": {
                "data_config": "__input_model_data_config__"
            }
        },
        "onnx_quant_u": {
            "type": "OnnxQuantization",
            "config": {
                "data_config": "__input_model_data_config__",
                "weight_type":"QUInt8",
                "activation_type":"QUInt8"
            }
        }
    },
    "pass_flows": [
        ["conversion", "onnx_quant_u"]
    ],
    "engine": {
        "evaluator": "common_evaluator",
        "execution_providers": ["CPUExecutionProvider"],
        "cache_dir": "cache",
        "output_dir" : "models/bert_ptq_cpu",
        "clean_cache": true
    }
}

@akarym-sl, could you give this PR a try? https://github.com/microsoft/Olive/pull/577

git clone https://github.com/microsoft/Olive
cd Olive
pip install .
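
Once installed from source, re-running the same workflow JSON should exercise the fix; a minimal sketch using Olive's Python entry point (the config file name here is an assumption):

# Re-run the Olive workflow defined in the JSON config above.
from olive.workflows import run as olive_run

olive_run("bert_ptq_cpu.json")  # hypothetical file name for the config shown above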

guotuofeng commented 1 year ago

@akarym-sl, please let us know whether the bug is fixed or not.

akarym-sl commented 1 year ago

I tested the new version (0.4.0) on the same setup and can confirm that the issue is gone! Therefore, closing the issue. Thank you!