Closed: david-sitsky closed this issue 1 month ago

I've built various models using Olive with Whisper from the examples. However, despite seeing lots of references to dynamic_axes and batch_size in user_script.py and friends, the resulting final model doesn't seem to support batches. Using Netron, it appears the internal encoder/decoder components do support batching, but the pre-processing code does not. Is there configuration to enable this, or is it not possible? For reference, when I pass batched inputs I get errors that make me think batching has not been enabled correctly in the model. Is this possible to enable with Olive? Thanks in advance.
Hi, as you mentioned, the audio decoder used in pre-processing only supports a 1-D graph (https://github.com/microsoft/Olive/issues/354#issuecomment-1601242379), so batching cannot be enabled there. You can see in the onnxruntime-extensions source code that there are no dynamic axes for batch: https://github.com/microsoft/onnxruntime-extensions/blob/be29e28dd76f5fb8f2fdc7d9d3880be27b680ede/onnxruntime_extensions/_torch_cvt.py#L180
Thanks @jambayk. Is there a way to configure Olive/Whisper to create a model that does not do any pre-processing (I'll do it outside the model), so that I get a model which can accept batched input?
I haven't tried it myself, but you can try removing the prepost component of the workflow when generating the workflow config: https://github.com/microsoft/Olive/blob/1ce4b5f84b8a7eb3b9036ebe940d0017d457ca9b/examples/whisper/prepare_whisper_configs.py#L15
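Something like this may work (a minimal sketch, assuming the generated config is a JSON file whose "passes" section contains a "prepost" entry; the file names here are illustrative):

```python
# Minimal sketch: drop the "prepost" pass from a generated Olive whisper
# config before running it. Assumes the config is JSON with a "passes" dict
# containing a "prepost" entry; file names are illustrative.
import json

with open("whisper_cpu_int8.json") as f:
    config = json.load(f)

# Remove the pre/post processing pass if present
config["passes"].pop("prepost", None)

with open("whisper_cpu_int8_no_prepost.json", "w") as f:
    json.dump(config, f, indent=4)
```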
That should give you a model without any of the pre/post processing graphs. There is currently no option to disable just the pre-processing part and keep the post-processor.
I tried that, but sadly something still doesn't seem right:
Traceback (most recent call last):
  File "/usr/local/bin/olive", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/olive/cli/launcher.py", line 44, in main
    service.run()
  File "/usr/local/lib/python3.10/dist-packages/olive/cli/run.py", line 42, in run
    olive_run(**var_args)
  File "/usr/local/lib/python3.10/dist-packages/olive/workflows/run/run.py", line 297, in run
    return run_engine(package_config, run_config, data_root)
  File "/usr/local/lib/python3.10/dist-packages/olive/workflows/run/run.py", line 261, in run_engine
    engine.run(
  File "/usr/local/lib/python3.10/dist-packages/olive/engine/engine.py", line 267, in run
    run_result = self.run_accelerator(
  File "/usr/local/lib/python3.10/dist-packages/olive/engine/engine.py", line 339, in run_accelerator
    output_footprint = self.run_no_search(
  File "/usr/local/lib/python3.10/dist-packages/olive/engine/engine.py", line 431, in run_no_search
    should_prune, signal, model_ids = self._run_passes(
  File "/usr/local/lib/python3.10/dist-packages/olive/engine/engine.py", line 846, in _run_passes
    signal = self._evaluate_model(model_config, model_id, data_root, evaluator_config, accelerator_spec)
  File "/usr/local/lib/python3.10/dist-packages/olive/engine/engine.py", line 1052, in _evaluate_model
    signal = self.target.evaluate_model(model_config, data_root, metrics, accelerator_spec)
  File "/usr/local/lib/python3.10/dist-packages/olive/systems/local.py", line 47, in evaluate_model
    return evaluator.evaluate(model, data_root, metrics, device=device, execution_providers=execution_providers)
  File "/usr/local/lib/python3.10/dist-packages/olive/evaluator/olive_evaluator.py", line 205, in evaluate
    metrics_res[metric.name] = self._evaluate_latency(
  File "/usr/local/lib/python3.10/dist-packages/olive/evaluator/olive_evaluator.py", line 123, in _evaluate_latency
    latencies = self._evaluate_raw_latency(
  File "/usr/local/lib/python3.10/dist-packages/olive/evaluator/olive_evaluator.py", line 762, in _evaluate_raw_latency
    return self._evaluate_onnx_latency(model, metric, dataloader, post_func, device, execution_providers)
  File "/usr/local/lib/python3.10/dist-packages/olive/evaluator/olive_evaluator.py", line 543, in _evaluate_onnx_latency
    latencies = session.time_run(
  File "/usr/local/lib/python3.10/dist-packages/olive/common/ort_inference.py", line 334, in time_run
    self.session.run(input_feed=input_feed, output_names=None)
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 216, in run
    self._validate_input(list(input_feed.keys()))
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 198, in _validate_input
    raise ValueError(
ValueError: Required inputs (['input_features']) are missing from input feed (['max_length', 'min_length', 'num_beams', 'num_return_sequences', 'length_penalty', 'repetition_penalty', 'decoder_input_ids']).
OK, so I had to update whisper_dataset.py so that input_features was passed in appropriately, and I removed the existing pre-processed audio input.
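For reference, the shape of the fix (a minimal sketch, not the actual whisper_dataset.py code; make_inputs, the model name, and the decoder token ids are illustrative). The input names and dtypes follow the inputs listed in the ValueError above:

```python
# Sketch of producing the model's expected inputs once the pre-processing
# graph is removed: compute input_features with the Hugging Face
# WhisperProcessor instead of feeding raw audio bytes.
import numpy as np
from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny.en")

def make_inputs(audio_array, sampling_rate=16000):
    # Log-mel spectrogram features, shape (1, 80, 3000) for whisper-tiny
    features = processor(
        audio_array, sampling_rate=sampling_rate, return_tensors="np"
    ).input_features.astype(np.float32)
    return {
        "input_features": features,
        "max_length": np.array([200], dtype=np.int32),
        "min_length": np.array([0], dtype=np.int32),
        "num_beams": np.array([2], dtype=np.int32),
        "num_return_sequences": np.array([1], dtype=np.int32),
        "length_penalty": np.array([1.0], dtype=np.float32),
        "repetition_penalty": np.array([1.0], dtype=np.float32),
        # Forced decoder prompt tokens; these ids are illustrative, use the
        # ones appropriate for your model/language.
        "decoder_input_ids": np.array([[50258, 50259, 50359, 50363]], dtype=np.int32),
    }
```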
@jambayk - I was able to create a model without pre/post processing, which suits me fine as I have code that can handle that. However, while input_features and decoder_input_ids are "batched", the beam search parameters are not, and this seems to prevent me from performing "batched inferencing". Is there a way to make all the parameters batched?
Any ideas or have I misunderstood something?
I managed to get batching working now with the above model.
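In case it helps others, batched inference against the no-prepost model looks roughly like this (a sketch; the model path and token ids are illustrative). The beam search parameters stay as shape-(1,) tensors that apply to the whole batch, so only input_features and decoder_input_ids carry the batch dimension:

```python
# Sketch of batched inference with the no-prepost whisper model.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("whisper_no_prepost.onnx")  # illustrative path

batch = 4
inputs = {
    # Batched: (batch, 80, 3000) log-mel features
    "input_features": np.random.randn(batch, 80, 3000).astype(np.float32),
    # Batched: one row of forced decoder tokens per item (ids illustrative)
    "decoder_input_ids": np.tile(
        np.array([[50258, 50259, 50359, 50363]], dtype=np.int32), (batch, 1)
    ),
    # Not batched: scalar-style parameters shared across the batch
    "max_length": np.array([200], dtype=np.int32),
    "min_length": np.array([0], dtype=np.int32),
    "num_beams": np.array([2], dtype=np.int32),
    "num_return_sequences": np.array([1], dtype=np.int32),
    "length_penalty": np.array([1.0], dtype=np.float32),
    "repetition_penalty": np.array([1.0], dtype=np.float32),
}

# Output sequences should have shape (batch, num_return_sequences, max_length)
sequences = session.run(None, inputs)[0]
```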