microsoft / Olive

Olive is an easy-to-use hardware-aware model optimization tool that composes industry-leading techniques across model compression, optimization, and compilation.
https://microsoft.github.io/Olive/
MIT License

Whisper does not convert using onnxruntime-directml #813

Open DimQ1 opened 7 months ago

DimQ1 commented 7 months ago

I prepared a configuration file for converting Whisper using DirectML, but the process fails with an error.

To Reproduce

Expected behavior: It would be great to use Whisper with DirectML.

Olive config: Use the following configuration file to convert the model: whisper_gpu_int8_dml.json
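For context, the whisper example workflow is normally driven by passing that JSON config to Olive. A minimal driver sketch, assuming the standard examples/whisper layout with the config file in the working directory:

```python
# Sketch of running the workflow from Python; equivalent to
# `python -m olive.workflows.run --config whisper_gpu_int8_dml.json`.
from olive.workflows import run as olive_run

# The config chains the passes visible in the logs below:
# OnnxConversion -> OnnxDynamicQuantization -> InsertBeamSearch
# -> AppendPrePostProcessingOps, targeting the gpu-dml accelerator.
olive_run("whisper_gpu_int8_dml.json")
```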

Olive logs

[2023-12-13 09:32:13,697] [DEBUG] [resource_path.py:156:create_resource_path] Resource path D:\Learnig\AI\Olive\examples\whisper\cache\models\0_OnnxConversion-386174a033bcd76f8941e56a22420503-0f2f01796d1fdfcd7c7058df3febec4e\output_model\decoder\model.onnx is inferred to be of type file.
[2023-12-13 09:32:13,816] [INFO] [quantization.py:354:_run_for_config] Preprocessing model for quantization
[2023-12-13 09:32:58,789] [INFO] [quantization.py:354:_run_for_config] Preprocessing model for quantization
[2023-12-13 09:33:24,424] [INFO] [engine.py:931:_run_pass] Running pass insert_beam_search:InsertBeamSearch
[2023-12-13 09:33:24,426] [DEBUG] [resource_path.py:156:create_resource_path] Resource path D:\Learnig\AI\Olive\examples\whisper\cache\models\1_OnnxDynamicQuantization-0-81443df774677d62399dbb62abc7a493\output_model\encoder_decoder_init\model.onnx is inferred to be of type file.
[2023-12-13 09:33:24,428] [DEBUG] [resource_path.py:156:create_resource_path] Resource path D:\Learnig\AI\Olive\examples\whisper\cache\models\1_OnnxDynamicQuantization-0-81443df774677d62399dbb62abc7a493\output_model\decoder\model.onnx is inferred to be of type file.
[2023-12-13 09:33:25,604] [WARNING] [insert_beam_search.py:171:chain_model] DecoderMaskedMultiHeadAttention could not be applied to whisper decoder subgraph
Removed 203 initializers with duplicated value
Removed 101 initializers with duplicated value
[2023-12-13 09:33:29,278] [DEBUG] [insert_beam_search.py:192:chain_model] Using IR version 8 for chained model
[2023-12-13 09:33:33,548] [INFO] [engine.py:931:_run_pass] Running pass prepost:AppendPrePostProcessingOps
[2023-12-13 09:33:33,550] [DEBUG] [resource_path.py:156:create_resource_path] Resource path D:\Learnig\AI\Olive\examples\whisper\cache\models\2_InsertBeamSearch-1-51b19e895c1591ef53a44fb74c8eac16\output_model\model_with_beam_search.onnx is inferred to be of type file.
[2023-12-13 09:33:33,551] [DEBUG] [resource_path.py:156:create_resource_path] Resource path D:\Learnig\AI\Olive\examples\whisper\cache\models\2_InsertBeamSearch-1-51b19e895c1591ef53a44fb74c8eac16\output_model\model_with_beam_search.onnx is inferred to be of type file.
[W shape_type_inference.cpp:1978] Warning: The shape inference of ai.onnx.contrib::StftNorm type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)
[2023-12-13 09:33:37,374] [DEBUG] [engine.py:1071:_evaluate_model] Evaluating model ...
[2023-12-13 09:33:37,374] [DEBUG] [resource_path.py:156:create_resource_path] Resource path D:\Learnig\AI\Olive\examples\whisper\cache\models\3_AppendPrePostProcessingOps-2-4d9a9990e2391432dff23d272724f7c8\output_model\model_with_beam_search.onnx is inferred to be of type file.
[2023-12-13 09:33:37,376] [DEBUG] [resource_path.py:156:create_resource_path] Resource path D:\Learnig\AI\Olive\examples\whisper\cache\models\3_AppendPrePostProcessingOps-2-4d9a9990e2391432dff23d272724f7c8\output_model\model_with_beam_search.onnx is inferred to be of type file.
[2023-12-13 09:33:37,976] [DEBUG] [olive_evaluator.py:244:generate_metric_user_config_with_model_io] Model input shapes are not static. Cannot use inferred input shapes for creating dummy data. This will cause an error when creating dummy data for tuning.
[2023-12-13 09:33:37,979] [DEBUG] [resource_path.py:156:create_resource_path] Resource path D:\Learnig\AI\Olive\examples\whisper\data is inferred to be of type folder.
2023-12-13 09:33:50.4631945 [E:onnxruntime:, sequential_executor.cc:514 onnxruntime::ExecuteKernel] Non-zero status code returned while running Conv node. Name:'/whisper_encoder/encoder/conv1/Conv' Status Message: D:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2538)\onnxruntime_pybind11_state.pyd!00007FFB2E960DA9: (caller: 00007FFB2F07EDDF) Exception(3) tid(3580) 80070057 The parameter is incorrect.

2023-12-13 09:33:50.4741269 [E:onnxruntime:, sequential_executor.cc:514 onnxruntime::ExecuteKernel] Non-zero status code returned while running BeamSearch node. Name:'BeamSearch_node' Status Message: Non-zero status code returned while running Conv node. Name:'/whisper_encoder/encoder/conv1/Conv' Status Message: D:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2538)\onnxruntime_pybind11_state.pyd!00007FFB2E960DA9: (caller: 00007FFB2F07EDDF) Exception(3) tid(3580) 80070057 The parameter is incorrect.

[2023-12-13 09:33:50,485] [WARNING] [engine.py:438:run_accelerator] Failed to run Olive on gpu-dml: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running BeamSearch node. Name:'BeamSearch_node' Status Message: Non-zero status code returned while running Conv node. Name:'/whisper_encoder/encoder/conv1/Conv' Status Message: D:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2538)\onnxruntime_pybind11_state.pyd!00007FFB2E960DA9: (caller: 00007FFB2F07EDDF) Exception(3) tid(3580) 80070057 The parameter is incorrect.
Traceback (most recent call last):
  File "C:\Program Files\Python311\Lib\site-packages\olive\engine\engine.py", line 418, in run_accelerator
    return self.run_no_search(
  File "C:\Program Files\Python311\Lib\site-packages\olive\engine\engine.py", line 489, in run_no_search
    should_prune, signal, model_ids = self._run_passes(
  File "C:\Program Files\Python311\Lib\site-packages\olive\engine\engine.py", line 910, in _run_passes
    signal = self._evaluate_model(model_config, model_id, data_root, evaluator_config, accelerator_spec)
  File "C:\Program Files\Python311\Lib\site-packages\olive\engine\engine.py", line 1097, in _evaluate_model
    signal = self.target.evaluate_model(model_config, data_root, metrics, accelerator_spec)
  File "C:\Program Files\Python311\Lib\site-packages\olive\systems\local.py", line 49, in evaluate_model
    return evaluator.evaluate(model, data_root, metrics, device=device, execution_providers=execution_providers)
  File "C:\Program Files\Python311\Lib\site-packages\olive\evaluator\olive_evaluator.py", line 215, in evaluate
    metrics_res[metric.name] = self._evaluate_latency(
  File "C:\Program Files\Python311\Lib\site-packages\olive\evaluator\olive_evaluator.py", line 132, in _evaluate_latency
    latencies = self._evaluate_raw_latency(
  File "C:\Program Files\Python311\Lib\site-packages\olive\evaluator\olive_evaluator.py", line 784, in _evaluate_raw_latency
    return self._evaluate_onnx_latency(model, metric, dataloader, post_func, device, execution_providers)
  File "C:\Program Files\Python311\Lib\site-packages\olive\evaluator\olive_evaluator.py", line 559, in _evaluate_onnx_latency
    session.run(input_feed=input_dict, output_names=None)
  File "C:\Program Files\Python311\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running BeamSearch node. Name:'BeamSearch_node' Status Message: Non-zero status code returned while running Conv node. Name:'/whisper_encoder/encoder/conv1/Conv' Status Message: D:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2538)\onnxruntime_pybind11_state.pyd!00007FFB2E960DA9: (caller: 00007FFB2F07EDDF) Exception(3) tid(3580) 80070057 The parameter is incorrect.
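The failure can be isolated from the Olive engine by loading the final chained model directly with the DirectML EP. A standalone repro sketch: the model path is taken from the run history below, and registering the onnxruntime-extensions custom ops library is assumed to be needed for the ai.onnx.contrib::StftNorm op added by the pre/post-processing pass:

```python
# Repro sketch: session creation succeeds; the RUNTIME_EXCEPTION surfaces at run().
import onnxruntime as ort
from onnxruntime_extensions import get_library_path

model_path = (
    r"D:\Learnig\AI\Olive\examples\whisper\cache\models"
    r"\3_AppendPrePostProcessingOps-2-4d9a9990e2391432dff23d272724f7c8"
    r"\output_model\model_with_beam_search.onnx"
)

so = ort.SessionOptions()
so.register_custom_ops_library(get_library_path())  # ai.onnx.contrib ops (StftNorm, ...)
so.log_severity_level = 0  # VERBOSE: logs which EP each node is assigned to

sess = ort.InferenceSession(
    model_path,
    sess_options=so,
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)
print([i.name for i in sess.get_inputs()])  # inspect expected inputs before run()
```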

[2023-12-13 09:33:50,810] [INFO] [engine.py:359:run] Run history for gpu-dml:
[2023-12-13 09:33:50,823] [INFO] [engine.py:636:dump_run_history] run history:

| model_id | parent_model_id | from_pass | duration_sec | metrics |
|---|---|---|---|---|
| 386174a033bcd76f8941e56a22420503 | | | | |
| 0_OnnxConversion-386174a033bcd76f8941e56a22420503-0f2f01796d1fdfcd7c7058df3febec4e | 386174a033bcd76f8941e56a22420503 | OnnxConversion | 55.8453 | |
| 1_OnnxDynamicQuantization-0-81443df774677d62399dbb62abc7a493 | 0_OnnxConversion-386174a033bcd76f8941e56a22420503-0f2f01796d1fdfcd7c7058df3febec4e | OnnxDynamicQuantization | 70.7202 | |
| 2_InsertBeamSearch-1-51b19e895c1591ef53a44fb74c8eac16 | 1_OnnxDynamicQuantization-0-81443df774677d62399dbb62abc7a493 | InsertBeamSearch | 9.11838 | |
| 3_AppendPrePostProcessingOps-2-4d9a9990e2391432dff23d272724f7c8 | 2_InsertBeamSearch-1-51b19e895c1591ef53a44fb74c8eac16 | AppendPrePostProcessingOps | 3.81862 | |

[2023-12-13 09:33:50,826] [INFO] [engine.py:374:run] No packaging config provided, skip packaging artifacts

Other information

trajepl commented 7 months ago

See https://github.com/microsoft/onnxruntime/issues/18805. It seems the BeamSearch node for Whisper is not available in the DML EP.
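Until DML support lands, one possible workaround is to keep the chained model but run it on the CPU EP, where the BeamSearch contrib op is implemented. A sketch, reusing the model path from the repro above:

```python
# Workaround sketch: evaluate the beam-search model on the CPU EP instead of gpu-dml.
import onnxruntime as ort
from onnxruntime_extensions import get_library_path

model_path = (
    r"D:\Learnig\AI\Olive\examples\whisper\cache\models"
    r"\3_AppendPrePostProcessingOps-2-4d9a9990e2391432dff23d272724f7c8"
    r"\output_model\model_with_beam_search.onnx"
)

so = ort.SessionOptions()
so.register_custom_ops_library(get_library_path())  # needed for ai.onnx.contrib ops
sess = ort.InferenceSession(model_path, sess_options=so, providers=["CPUExecutionProvider"])
```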

guotuofeng commented 6 months ago

@PatriceVignola, is there any plan to add BeamSearch op support for DirectML?

peterer0625 commented 4 weeks ago

Any update on this? We'd like to run Whisper with onnxruntime-directml as well.