microsoft / onnxruntime-extensions

onnxruntime-extensions: A specialized pre- and post- processing library for ONNX Runtime
MIT License

Whisper Preprocessing Pipeline is incompatible with whisper-large-v3 #651

Open jambayk opened 7 months ago

jambayk commented 7 months ago

The Whisper params are hardcoded in the _WhisperHParams class: https://github.com/microsoft/onnxruntime-extensions/blob/307e712f20796e2a05ca4ddd078e81b89562e1df/onnxruntime_extensions/_torch_cvt.py#L26

However, for whisper-large-v3, the value of N_MELS is 128, not 80: https://huggingface.co/openai/whisper-large-v3/blob/main/preprocessor_config.json#L4

In Olive, we work around this by updating the class attribute, but it would be helpful if ort-extensions supported this too. Since it already has access to the Whisper preprocessor, perhaps it would be better to read the Whisper params directly from it instead of storing them in a class?
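To make the suggestion concrete, here is a rough sketch of both ideas: patching the class attribute (the workaround) and deriving the params from the preprocessor (the proposal). Everything in it is an assumption for illustration. `WhisperHParams` is a stand-in class, not the real `_WhisperHParams`, and the helper names are hypothetical. The preprocessor is faked with a `SimpleNamespace` that uses the attribute names I would expect on a Hugging Face `WhisperFeatureExtractor` (`feature_size`, `sampling_rate`, `n_fft`, `hop_length`, `chunk_length`).

```python
from types import SimpleNamespace


class WhisperHParams:
    """Stand-in for a hardcoded params class (assumed shape, not library code)."""
    SAMPLE_RATE = 16000
    N_FFT = 400
    N_MELS = 80
    HOP_LENGTH = 160
    CHUNK_LENGTH = 30


def hparams_from_feature_extractor(feature_extractor):
    """Map feature-extractor attributes to the class-attribute names (hypothetical helper)."""
    return {
        "SAMPLE_RATE": feature_extractor.sampling_rate,
        "N_FFT": feature_extractor.n_fft,
        "N_MELS": feature_extractor.feature_size,
        "HOP_LENGTH": feature_extractor.hop_length,
        "CHUNK_LENGTH": feature_extractor.chunk_length,
    }


def patch_hparams(cls, overrides):
    """Overwrite class attributes in place, rejecting names the class does not define."""
    for name, value in overrides.items():
        if not hasattr(cls, name):
            raise AttributeError(f"{cls.__name__} has no attribute {name!r}")
        setattr(cls, name, value)


# Fake preprocessor with whisper-large-v3 style values (assumed for illustration).
large_v3_extractor = SimpleNamespace(
    feature_size=128,
    sampling_rate=16000,
    n_fft=400,
    hop_length=160,
    chunk_length=30,
)

patch_hparams(WhisperHParams, hparams_from_feature_extractor(large_v3_extractor))
print(WhisperHParams.N_MELS)
```

Patching a class attribute is global state, so any pipeline built afterwards in the same process sees the new values. Reading the params from the preprocessor at conversion time would avoid that side effect.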