microsoft / Olive

Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs.
https://microsoft.github.io/Olive/
MIT License

Getting KeyError: 'input_model' when trying to optimize whisper-tiny.en model #1283

Open MayuraRam opened 2 months ago

MayuraRam commented 2 months ago

**Describe the bug**
Unable to optimize a model with device `cpu` and precision `int8`. The run ends with a `KeyError: 'input_model'` error.

**To Reproduce**
Start with this example: https://github.com/microsoft/onnxruntime-inference-examples/tree/main/js/ort-whisper

The README says:

  1. Goto: https://github.com/microsoft/Olive/tree/main/examples/whisper and follow the instructions.

  2. Run the following commands

    python prepare_whisper_configs.py --model_name openai/whisper-tiny.en --no_audio_decoder
    python -m olive.workflows.run --config whisper_cpu_int8.json --setup
    python -m olive.workflows.run --config whisper_cpu_int8.json
  3. Move the resulting model from models/whisper_cpu_int8_0_model.onnx to the same directory as this code.

When I did the above with a pip install of olive-ai, I got a KeyError: 'config' error.

Then I tried installing from source, as described here: https://github.com/microsoft/Olive/blob/main/examples/README.md

    git clone https://github.com/microsoft/Olive.git
    cd Olive
    python -m pip install .

Then I tried to "Run the config to optimize the model" from here - https://github.com/microsoft/Olive/blob/main/examples/whisper/README.md

This script runs and creates \Olive-main\examples\whisper\models\conversion-transformers_optimization-onnx_dynamic_quantization-insert_beam_search-prepost\whisper_cpu_int8_cpu-cpu_model.onnx

(olive_env) \Olive-main\examples\whisper>python test_transcription.py --config \Olive-main\examples\whisper\models\conversion-transformers_optimization-onnx_dynamic_quantization-insert_beam_search-prepost\whisper_cpu_int8_cpu-cpu_model.json
Traceback (most recent call last):
  File "\Olive-main\examples\whisper\test_transcription.py", line 126, in <module>
    output_text = main()
                  ^^^^^^
  File "\Olive-main\examples\whisper\test_transcription.py", line 63, in main
    model_name = config["input_model"]["model_components"][0]["model_path"]
KeyError: 'input_model'
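
For reference, a minimal sketch of the lookup that raises here. It assumes the `--config` argument is meant to be the workflow config (`whisper_cpu_int8.json`) rather than the generated model JSON, which has no top-level `input_model` section:

    import json

    # Minimal reproduction of the failing lookup (test_transcription.py, line 63).
    # The script indexes a workflow config; the per-model JSON written under
    # models\ has no "input_model" key, hence the KeyError above.
    with open("whisper_cpu_int8.json") as f:  # assumed: workflow config, not the model JSON
        config = json.load(f)

    model_name = config["input_model"]["model_components"][0]["model_path"]
    print(model_name)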

I renamed this model to whisper_cpu_int8_0_model.onnx, went back to the sample at https://github.com/microsoft/onnxruntime-inference-examples/tree/main/js/ort-whisper, and tried to run the model in the browser, which failed with the following error:

Error: Error: invalid input 'attention_mask'
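
A quick way to check which inputs the exported graph actually declares is to inspect it with the onnx package; a sketch, using the output path from the run above:

    import onnx

    # Print the declared graph inputs of the exported model. On exports made
    # with ORT >= 1.16, 'attention_mask' should not appear in this list.
    model = onnx.load(
        r"models\conversion-transformers_optimization-onnx_dynamic_quantization-insert_beam_search-prepost"
        r"\whisper_cpu_int8_cpu-cpu_model.onnx"
    )
    for inp in model.graph.input:
        print(inp.name)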

**Expected behavior**
I should get a model that runs successfully with onnxruntime-web.

**Olive config**
whisper_cpu_int8.json, generated by prepare_whisper_configs.py (see the logs below).

**Olive logs**

(olive_env) <path>\Olive-main\examples\whisper>python prepare_whisper_configs.py --model_name openai/whisper-tiny.en
config.json: 100%|████████████████████████████████████████████████████████████████████████| 1.94k/1.94k [00:00<?, ?B/s]

(olive_env) <path>\Olive-main\examples\whisper>olive run --config whisper_cpu_int8.json --setup
[2024-08-06 15:01:08,786] [INFO] [run.py:90:get_required_packages] The following packages are required in the local environment: ['onnxruntime']
[2024-08-06 15:01:08,786] [INFO] [run.py:101:install_packages] installing packages: ['onnxruntime']
[2024-08-06 15:01:08,869] [INFO] [run.py:356:check_local_ort_installation] onnxruntime is already installed.

(olive_env) <path>\Olive-main\examples\whisper>olive run --config whisper_cpu_int8.json 2> NUL
[2024-08-06 15:01:41,553] [INFO] [run.py:140:run_engine] Running workflow default_workflow
[2024-08-06 15:01:41,560] [INFO] [cache.py:51:__init__] Using cache directory: <path>\Olive-main\examples\whisper\cache\default_workflow
[2024-08-06 15:01:41,570] [INFO] [engine.py:1020:save_olive_config] Saved Olive config to <path>\Olive-main\examples\whisper\cache\default_workflow\olive_config.json
[2024-08-06 15:01:41,570] [DEBUG] [run.py:179:run_engine] Registering pass onnxconversion
[2024-08-06 15:01:41,570] [DEBUG] [run.py:179:run_engine] Registering pass orttransformersoptimization
[2024-08-06 15:01:41,570] [DEBUG] [run.py:179:run_engine] Registering pass onnxdynamicquantization
[2024-08-06 15:01:41,570] [DEBUG] [run.py:179:run_engine] Registering pass insertbeamsearch
[2024-08-06 15:01:41,570] [DEBUG] [run.py:179:run_engine] Registering pass appendprepostprocessingops
[2024-08-06 15:01:41,583] [DEBUG] [accelerator_creator.py:130:_fill_accelerators] The accelerator device and execution providers are specified, skipping deduce.
[2024-08-06 15:01:41,583] [DEBUG] [accelerator_creator.py:169:_check_execution_providers] Supported execution providers for device cpu: ['CPUExecutionProvider']
[2024-08-06 15:01:41,586] [DEBUG] [accelerator_creator.py:199:create_accelerators] Initial accelerators and execution providers: {'cpu': ['CPUExecutionProvider']}
[2024-08-06 15:01:41,586] [INFO] [accelerator_creator.py:224:create_accelerators] Running workflow on accelerator specs: cpu-cpu
[2024-08-06 15:01:41,586] [DEBUG] [run.py:235:run_engine] Pass onnxconversion already registered
[2024-08-06 15:01:41,586] [DEBUG] [run.py:235:run_engine] Pass orttransformersoptimization already registered
[2024-08-06 15:01:41,586] [DEBUG] [run.py:235:run_engine] Pass onnxdynamicquantization already registered
[2024-08-06 15:01:41,586] [DEBUG] [run.py:235:run_engine] Pass insertbeamsearch already registered
[2024-08-06 15:01:41,586] [DEBUG] [run.py:235:run_engine] Pass appendprepostprocessingops already registered
[2024-08-06 15:01:41,586] [DEBUG] [cache.py:304:set_cache_env] Set OLIVE_CACHE_DIR: <path>\Olive-main\examples\whisper\cache\default_workflow
[2024-08-06 15:01:41,604] [INFO] [engine.py:277:run] Running Olive on accelerator: cpu-cpu
[2024-08-06 15:01:41,604] [INFO] [engine.py:1118:_create_system] Creating target system ...
[2024-08-06 15:01:41,604] [DEBUG] [engine.py:1114:create_system] create native OliveSystem SystemType.Local
[2024-08-06 15:01:41,614] [INFO] [engine.py:1121:_create_system] Target system created in 0.009509 seconds
[2024-08-06 15:01:41,614] [INFO] [engine.py:1130:_create_system] Creating host system ...
[2024-08-06 15:01:41,614] [DEBUG] [engine.py:1114:create_system] create native OliveSystem SystemType.Local
[2024-08-06 15:01:41,614] [INFO] [engine.py:1133:_create_system] Host system created in 0.000000 seconds
[2024-08-06 15:01:41,660] [DEBUG] [engine.py:717:_cache_model] Cached model 9139f706 to <path>\Olive-main\examples\whisper\cache\default_workflow\models\9139f706.json
[2024-08-06 15:01:41,662] [DEBUG] [engine.py:352:run_accelerator] Running Olive in no-search mode ...
[2024-08-06 15:01:41,662] [DEBUG] [engine.py:444:run_no_search] Running ['conversion', 'transformers_optimization', 'onnx_dynamic_quantization', 'insert_beam_search', 'prepost'] with no search ...
[2024-08-06 15:01:41,662] [INFO] [engine.py:886:_run_pass] Running pass conversion:OnnxConversion
[2024-08-06 15:01:48,789] [DEBUG] [pytorch.py:194:get_dummy_inputs] Using dummy_inputs_func to get dummy inputs
[2024-08-06 15:01:51,423] [DEBUG] [conversion.py:196:_export_pytorch_model] Converting model on device cpu with dtype None.
[2024-08-06 15:01:56,203] [DEBUG] [pytorch.py:194:get_dummy_inputs] Using dummy_inputs_func to get dummy inputs
[2024-08-06 15:01:56,558] [DEBUG] [conversion.py:196:_export_pytorch_model] Converting model on device cpu with dtype None.
[2024-08-06 15:01:59,113] [INFO] [engine.py:988:_run_pass] Pass conversion:OnnxConversion finished in 17.451246 seconds
[2024-08-06 15:01:59,117] [DEBUG] [engine.py:717:_cache_model] Cached model 0_OnnxConversion-9139f706-5fa0d4af to <path>\Olive-main\examples\whisper\cache\default_workflow\models\0_OnnxConversion-9139f706-5fa0d4af.json
[2024-08-06 15:01:59,120] [DEBUG] [engine.py:769:_cache_run] Cached run for 9139f706->0_OnnxConversion-9139f706-5fa0d4af into <path>\Olive-main\examples\whisper\cache\default_workflow\runs\OnnxConversion-9139f706-5fa0d4af.json
[2024-08-06 15:01:59,122] [INFO] [engine.py:886:_run_pass] Running pass transformers_optimization:OrtTransformersOptimization
[2024-08-06 15:01:59,232] [DEBUG] [transformer_optimization.py:248:_run_for_config] model_type is set to bart from model attributes
[2024-08-06 15:01:59,233] [DEBUG] [transformer_optimization.py:254:_run_for_config] num_heads is set to 6 from model attributes
[2024-08-06 15:01:59,234] [DEBUG] [transformer_optimization.py:260:_run_for_config] hidden_size is set to 384 from model attributes
[2024-08-06 15:02:04,419] [DEBUG] [transformer_optimization.py:248:_run_for_config] model_type is set to bart from model attributes
[2024-08-06 15:02:04,419] [DEBUG] [transformer_optimization.py:254:_run_for_config] num_heads is set to 6 from model attributes
[2024-08-06 15:02:04,419] [DEBUG] [transformer_optimization.py:260:_run_for_config] hidden_size is set to 384 from model attributes
[2024-08-06 15:02:07,900] [INFO] [engine.py:988:_run_pass] Pass transformers_optimization:OrtTransformersOptimization finished in 8.773139 seconds
[2024-08-06 15:02:07,905] [DEBUG] [engine.py:717:_cache_model] Cached model 1_OrtTransformersOptimization-0-5c93fa9e-cpu-cpu to <path>\Olive-main\examples\whisper\cache\default_workflow\models\1_OrtTransformersOptimization-0-5c93fa9e-cpu-cpu.json
[2024-08-06 15:02:07,905] [DEBUG] [engine.py:769:_cache_run] Cached run for 0_OnnxConversion-9139f706-5fa0d4af->1_OrtTransformersOptimization-0-5c93fa9e-cpu-cpu into <path>\Olive-main\examples\whisper\cache\default_workflow\runs\OrtTransformersOptimization-0-5c93fa9e-cpu-cpu.json
[2024-08-06 15:02:07,905] [INFO] [engine.py:886:_run_pass] Running pass onnx_dynamic_quantization:OnnxDynamicQuantization
[2024-08-06 15:02:07,986] [INFO] [quantization.py:391:_run_for_config] Preprocessing model for quantization
[2024-08-06 15:02:11,336] [INFO] [quantization.py:391:_run_for_config] Preprocessing model for quantization
[2024-08-06 15:02:13,823] [INFO] [engine.py:988:_run_pass] Pass onnx_dynamic_quantization:OnnxDynamicQuantization finished in 5.917982 seconds
[2024-08-06 15:02:13,823] [DEBUG] [engine.py:717:_cache_model] Cached model 2_OnnxDynamicQuantization-1-a1261e22 to <path>\Olive-main\examples\whisper\cache\default_workflow\models\2_OnnxDynamicQuantization-1-a1261e22.json
[2024-08-06 15:02:13,823] [DEBUG] [engine.py:769:_cache_run] Cached run for 1_OrtTransformersOptimization-0-5c93fa9e-cpu-cpu->2_OnnxDynamicQuantization-1-a1261e22 into <path>\Olive-main\examples\whisper\cache\default_workflow\runs\OnnxDynamicQuantization-1-a1261e22.json
[2024-08-06 15:02:13,823] [INFO] [engine.py:886:_run_pass] Running pass insert_beam_search:InsertBeamSearch
Removed 67 initializers with duplicated value
Removed 33 initializers with duplicated value
[2024-08-06 15:02:16,653] [DEBUG] [insert_beam_search.py:302:chain_model] Using IR version 8 for chained model
[2024-08-06 15:02:17,329] [INFO] [engine.py:988:_run_pass] Pass insert_beam_search:InsertBeamSearch finished in 3.505282 seconds
[2024-08-06 15:02:17,329] [DEBUG] [engine.py:717:_cache_model] Cached model 3_InsertBeamSearch-2-82bf64f8 to <path>\Olive-main\examples\whisper\cache\default_workflow\models\3_InsertBeamSearch-2-82bf64f8.json
[2024-08-06 15:02:17,329] [DEBUG] [engine.py:769:_cache_run] Cached run for 2_OnnxDynamicQuantization-1-a1261e22->3_InsertBeamSearch-2-82bf64f8 into <path>\Olive-main\examples\whisper\cache\default_workflow\runs\InsertBeamSearch-2-82bf64f8.json
[2024-08-06 15:02:17,336] [INFO] [engine.py:886:_run_pass] Running pass prepost:AppendPrePostProcessingOps
[2024-08-06 15:02:18,924] [INFO] [engine.py:988:_run_pass] Pass prepost:AppendPrePostProcessingOps finished in 1.587309 seconds
[2024-08-06 15:02:18,936] [DEBUG] [engine.py:717:_cache_model] Cached model 4_AppendPrePostProcessingOps-3-9e247843 to <path>\Olive-main\examples\whisper\cache\default_workflow\models\4_AppendPrePostProcessingOps-3-9e247843.json
[2024-08-06 15:02:18,939] [DEBUG] [engine.py:769:_cache_run] Cached run for 3_InsertBeamSearch-2-82bf64f8->4_AppendPrePostProcessingOps-3-9e247843 into <path>\Olive-main\examples\whisper\cache\default_workflow\runs\AppendPrePostProcessingOps-3-9e247843.json
[2024-08-06 15:02:18,939] [INFO] [engine.py:862:_run_passes] Run model evaluation for the final model...
[2024-08-06 15:02:18,939] [DEBUG] [engine.py:1059:_evaluate_model] Evaluating model ...
[2024-08-06 15:02:20,189] [DEBUG] [ort_inference.py:72:get_ort_inference_session] inference_settings: {'execution_provider': ['CPUExecutionProvider'], 'provider_options': None}
[2024-08-06 15:02:20,189] [DEBUG] [ort_inference.py:111:get_ort_inference_session] Normalized providers: ['CPUExecutionProvider'], provider_options: [{}]
[2024-08-06 15:03:18,633] [DEBUG] [footprint.py:234:_resolve_metrics] There is no goal set for metric: latency-avg.
[2024-08-06 15:03:18,636] [DEBUG] [engine.py:864:_run_passes] Signal: {
  "latency-avg": 1824.62912
}
[2024-08-06 15:03:19,964] [INFO] [engine.py:378:run_accelerator] Save footprint to models\whisper_cpu_int8_cpu-cpu_footprints.json.
[2024-08-06 15:03:19,970] [DEBUG] [engine.py:380:run_accelerator] run_accelerator done
[2024-08-06 15:03:19,970] [INFO] [engine.py:294:run] Run history for cpu-cpu:
[2024-08-06 15:03:21,520] [INFO] [engine.py:591:dump_run_history] run history:
+--------------------------------------------------+--------------------------------------------------+-----------------------------+----------------+-----------------------------+
| model_id                                         | parent_model_id                                  | from_pass                   |   duration_sec | metrics                     |
+==================================================+==================================================+=============================+================+=============================+
| 9139f706                                         |                                                  |                             |                |                             |
+--------------------------------------------------+--------------------------------------------------+-----------------------------+----------------+-----------------------------+
| 0_OnnxConversion-9139f706-5fa0d4af               | 9139f706                                         | OnnxConversion              |       17.4512  |                             |
+--------------------------------------------------+--------------------------------------------------+-----------------------------+----------------+-----------------------------+
| 1_OrtTransformersOptimization-0-5c93fa9e-cpu-cpu | 0_OnnxConversion-9139f706-5fa0d4af               | OrtTransformersOptimization |        8.77314 |                             |
+--------------------------------------------------+--------------------------------------------------+-----------------------------+----------------+-----------------------------+
| 2_OnnxDynamicQuantization-1-a1261e22             | 1_OrtTransformersOptimization-0-5c93fa9e-cpu-cpu | OnnxDynamicQuantization     |        5.91798 |                             |
+--------------------------------------------------+--------------------------------------------------+-----------------------------+----------------+-----------------------------+
| 3_InsertBeamSearch-2-82bf64f8                    | 2_OnnxDynamicQuantization-1-a1261e22             | InsertBeamSearch            |        3.50528 |                             |
+--------------------------------------------------+--------------------------------------------------+-----------------------------+----------------+-----------------------------+
| 4_AppendPrePostProcessingOps-3-9e247843          | 3_InsertBeamSearch-2-82bf64f8                    | AppendPrePostProcessingOps  |        1.58731 | {                           |
|                                                  |                                                  |                             |                |   "latency-avg": 1824.62912 |
|                                                  |                                                  |                             |                | }                           |
+--------------------------------------------------+--------------------------------------------------+-----------------------------+----------------+-----------------------------+
[2024-08-06 15:03:21,770] [INFO] [engine.py:309:run] No packaging config provided, skip packaging artifacts

**Other information**
 - OS: Windows
 - Olive version: main
 - ONNXRuntime package and version: onnxruntime

**Additional context**
Trying to run this sample: https://github.com/microsoft/onnxruntime-inference-examples/tree/main/js/ort-whisper
jambayk commented 2 months ago

Hi, attention_mask was removed from the whisper beam search inputs in ORT 1.16.0, so the inference example is outdated. Can you try again after removing it from https://github.com/microsoft/onnxruntime-inference-examples/blob/0de2e66e03981714e5308c457b72d785e98d0fe2/js/ort-whisper/main.js#L144?

Please refer here for more details on the model inputs: https://github.com/microsoft/Olive/blob/main/examples/whisper/code/whisper_dataset.py#L50
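
A minimal Python sketch of a working feed, assuming the model was exported with the prepost pass (so it consumes an `audio_stream` byte tensor) and onnxruntime >= 1.16; the input names follow whisper_dataset.py, but verify them against `session.get_inputs()` for your own export:

    import numpy as np
    import onnxruntime as ort

    # Sketch only: input names follow examples/whisper/code/whisper_dataset.py
    # and are assumptions for any given export; check get_inputs() first.
    sess = ort.InferenceSession("whisper_cpu_int8_cpu-cpu_model.onnx",
                                providers=["CPUExecutionProvider"])
    print([i.name for i in sess.get_inputs()])  # no 'attention_mask' on ORT >= 1.16

    with open("audio.mp3", "rb") as f:  # hypothetical audio file
        audio = np.expand_dims(np.frombuffer(f.read(), dtype=np.uint8), axis=0)

    feeds = {
        "audio_stream": audio,  # decoded/resampled by the prepended prepost ops
        "max_length": np.asarray([200], dtype=np.int32),
        "min_length": np.asarray([0], dtype=np.int32),
        "num_beams": np.asarray([2], dtype=np.int32),
        "num_return_sequences": np.asarray([1], dtype=np.int32),
        "length_penalty": np.asarray([1.0], dtype=np.float32),
        "repetition_penalty": np.asarray([1.0], dtype=np.float32),
        # Do NOT pass 'attention_mask': it was removed from the whisper beam
        # search inputs in onnxruntime 1.16.0 (same fix applies to main.js).
    }
    print(sess.run(None, feeds)[0])  # decoded transcription string(s)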