microsoft / Olive

Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs.
https://microsoft.github.io/Olive/
MIT License

BERT has not final model #1439

Open dangokuson opened 3 days ago

dangokuson commented 3 days ago

Describe the bug

I tried to optimize the BERT model with bert_ptq_cpu.json, but the run produced 7 output models. Is there any way to change the config so that I get only one output model?

[2024-10-25 10:54:59,192] [INFO] [engine.py:816:_run_passes] Run model evaluation for the final model...
[2024-10-25 10:54:59,195] [INFO] [footprint.py:101:create_pareto_frontier] Output all 7 models
[2024-10-25 10:54:59,196] [INFO] [footprint.py:120:_create_pareto_frontier_from_nodes] pareto frontier points: 3_OrtSessionParamsTuning-2-231aed55-cpu-cpu 
{
  "accuracy-accuracy": 0.8529411764705882,
  "accuracy-f1": 0.8913043478260869,
  "latency-avg": 48.46022,
  "latency-max": 65.62145,
  "latency-min": 40.43884,
  "throughput-avg": 20.34093,
  "throughput-max": 23.00423,
  "throughput-min": 16.20369
}
[2024-10-25 10:54:59,206] [INFO] [engine.py:367:run_accelerator] Save footprint to /Users/ubuntu/workspace/projects/AI_Research/Olive/examples/bert/models/bert_ptq_cpu/footprints.json.
[2024-10-25 10:54:59,214] [INFO] [engine.py:294:run] Run history for cpu-cpu:
[2024-10-25 10:54:59,234] [INFO] [engine.py:550:dump_run_history] run history:
+--------------------------------------------------+--------------------------------------------------+-----------------------------+----------------+--------------------------------------------+
| model_id                                         | parent_model_id                                  | from_pass                   |   duration_sec | metrics                                    |
+==================================================+==================================================+=============================+================+============================================+
| 9785a767                                         |                                                  |                             |                | {                                          |
|                                                  |                                                  |                             |                |   "accuracy-accuracy": 0.8602941176470589, |
|                                                  |                                                  |                             |                |   "accuracy-f1": 0.9042016806722689,       |
|                                                  |                                                  |                             |                |   "latency-avg": 77.18956,                 |
|                                                  |                                                  |                             |                |   "latency-max": 104.18961,                |
|                                                  |                                                  |                             |                |   "latency-min": 66.36365,                 |
|                                                  |                                                  |                             |                |   "throughput-avg": 13.79494,              |
|                                                  |                                                  |                             |                |   "throughput-max": 16.03933,              |
|                                                  |                                                  |                             |                |   "throughput-min": 11.9764                |
|                                                  |                                                  |                             |                | }                                          |
+--------------------------------------------------+--------------------------------------------------+-----------------------------+----------------+--------------------------------------------+
| 0_OnnxConversion-9785a767-0b0c1267               | 9785a767                                         | OnnxConversion              |        26.7029 |                                            |
+--------------------------------------------------+--------------------------------------------------+-----------------------------+----------------+--------------------------------------------+
| 1_OrtTransformersOptimization-0-67b9c681-cpu-cpu | 0_OnnxConversion-9785a767-0b0c1267               | OrtTransformersOptimization |        12.8088 |                                            |
+--------------------------------------------------+--------------------------------------------------+-----------------------------+----------------+--------------------------------------------+
| 2_OnnxQuantization-1-133a6d82                    | 1_OrtTransformersOptimization-0-67b9c681-cpu-cpu | OnnxQuantization            |        37.2535 |                                            |
+--------------------------------------------------+--------------------------------------------------+-----------------------------+----------------+--------------------------------------------+
| 3_OrtSessionParamsTuning-2-231aed55-cpu-cpu      | 2_OnnxQuantization-1-133a6d82                    | OrtSessionParamsTuning      |       103.319  | {                                          |
|                                                  |                                                  |                             |                |   "accuracy-accuracy": 0.8529411764705882, |
|                                                  |                                                  |                             |                |   "accuracy-f1": 0.8913043478260869,       |
|                                                  |                                                  |                             |                |   "latency-avg": 48.46022,                 |
|                                                  |                                                  |                             |                |   "latency-max": 65.62145,                 |
|                                                  |                                                  |                             |                |   "latency-min": 40.43884,                 |
|                                                  |                                                  |                             |                |   "throughput-avg": 20.34093,              |
|                                                  |                                                  |                             |                |   "throughput-max": 23.00423,              |
|                                                  |                                                  |                             |                |   "throughput-min": 16.20369               |
|                                                  |                                                  |                             |                | }                                          |
+--------------------------------------------------+--------------------------------------------------+-----------------------------+----------------+--------------------------------------------+
| 4_OnnxQuantization-1-80ae4847                    | 1_OrtTransformersOptimization-0-67b9c681-cpu-cpu | OnnxQuantization            |        36.2477 |                                            |
+--------------------------------------------------+--------------------------------------------------+-----------------------------+----------------+--------------------------------------------+
| 5_OrtSessionParamsTuning-4-231aed55-cpu-cpu      | 4_OnnxQuantization-1-80ae4847                    | OrtSessionParamsTuning      |        86.8893 | {                                          |
|                                                  |                                                  |                             |                |   "accuracy-accuracy": 0.8406862745098039, |
|                                                  |                                                  |                             |                |   "accuracy-f1": 0.8811700182815356,       |
|                                                  |                                                  |                             |                |   "latency-avg": 65.15545,                 |
|                                                  |                                                  |                             |                |   "latency-max": 74.85309,                 |
|                                                  |                                                  |                             |                |   "latency-min": 51.80688,                 |
|                                                  |                                                  |                             |                |   "throughput-avg": 15.95106,              |
|                                                  |                                                  |                             |                |   "throughput-max": 17.65159,              |
|                                                  |                                                  |                             |                |   "throughput-min": 14.35173               |
|                                                  |                                                  |                             |                | }                                          |
+--------------------------------------------------+--------------------------------------------------+-----------------------------+----------------+--------------------------------------------+

To Reproduce

olive run --config bert_ptq_cpu.json

Expected behavior A clear and concise description of what you expected to happen.

Olive config

https://github.com/microsoft/Olive/blob/main/examples/bert/bert_ptq_cpu.json

Other information

Additional context

onnx                               1.17.0
onnx-tool                          0.9.0
onnxconverter-common               1.14.0
onnxexplorer                       0.2.7
onnxruntime                        1.19.2
onnxruntime_extensions             0.12.0
onnxruntime-tools                  1.7.0
onnxsim                            0.4.36
skl2onnx                           1.17.0
tf2onnx                            1.16.1

xiaoyu-work commented 7 hours ago

You can find the output model path in footprints.json. I have a PR open that copies the output model to output_dir: https://github.com/microsoft/Olive/pull/1430. Once the PR is merged, you can pull the main branch and run the optimization again.
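
For anyone who wants to look inside footprints.json in the meantime, here is a minimal sketch. It assumes the file is a JSON object keyed by model id, and that each entry may carry `from_pass`, `is_pareto_frontier` and `model_config` fields. Those field names are an assumption on my part, not something confirmed in this thread, so the code reads them with `.get()` and simply shows `None` where a field is absent. `summarize_footprints` is a hypothetical helper, not part of Olive.

```python
import json
from pathlib import Path


def summarize_footprints(path):
    """List the entries of a footprints file as plain dicts.

    Assumption: the file is a JSON object keyed by model id. The field
    names read below are assumed, so missing ones come back as None.
    """
    nodes = json.loads(Path(path).read_text())
    rows = []
    for model_id, node in nodes.items():
        rows.append(
            {
                "model_id": model_id,
                "from_pass": node.get("from_pass"),
                "is_pareto_frontier": node.get("is_pareto_frontier"),
                "model_config": node.get("model_config"),
            }
        )
    return rows


if __name__ == "__main__":
    # Path relative to examples/bert, taken from the "Save footprint to" log line.
    footprints = Path("models/bert_ptq_cpu/footprints.json")
    if footprints.exists():
        for row in summarize_footprints(footprints):
            print(row["model_id"], row["from_pass"], row["is_pareto_frontier"])
            print("  ", row["model_config"])
```

If the assumed fields are present, the entry flagged as being on the Pareto frontier is the one whose model config should point at the model file to use.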

dangokuson commented 7 hours ago

@xiaoyu-work So, do I have to compare their accuracy and pick the best model, or should I use the last one?
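
One way to make that comparison concrete is a Pareto check over the metrics in the run history. The sketch below is only an illustration and not necessarily how Olive builds its frontier: it takes the two evaluated output models, treats `accuracy-accuracy` as higher-is-better and `latency-avg` as lower-is-better (my choice of metrics and directions), and keeps every model that no other model dominates. `dominates` and `pareto_frontier` are hypothetical helper names.

```python
# Metrics copied from the run history above for the two evaluated output models.
candidates = {
    "3_OrtSessionParamsTuning-2-231aed55-cpu-cpu": {
        "accuracy-accuracy": 0.8529411764705882,
        "latency-avg": 48.46022,
    },
    "5_OrtSessionParamsTuning-4-231aed55-cpu-cpu": {
        "accuracy-accuracy": 0.8406862745098039,
        "latency-avg": 65.15545,
    },
}

# Assumed optimization direction for each metric used in the comparison.
HIGHER_IS_BETTER = {"accuracy-accuracy": True, "latency-avg": False}


def dominates(a, b):
    """True if a is at least as good as b on every metric and strictly better on one."""
    strictly_better = False
    for name, higher in HIGHER_IS_BETTER.items():
        # Flip the sign of lower-is-better metrics so larger always means better.
        x, y = (a[name], b[name]) if higher else (-a[name], -b[name])
        if x < y:
            return False
        if x > y:
            strictly_better = True
    return strictly_better


def pareto_frontier(models):
    """Return the ids of models that no other model dominates."""
    return [
        model_id
        for model_id, metrics in models.items()
        if not any(
            dominates(other, metrics)
            for other_id, other in models.items()
            if other_id != model_id
        )
    ]


print(pareto_frontier(candidates))
# ['3_OrtSessionParamsTuning-2-231aed55-cpu-cpu']
```

With these numbers, model 3 is both more accurate and faster than model 5, so it is the only point left, which matches the single Pareto frontier point reported in the log. The input model 9785a767 is left out here because it is the baseline rather than an output of a pass; if it were included, it would also sit on the frontier under this check, since it has the highest accuracy.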