[Feature Request] Whisper Prompting feature

cfasana commented 3 months ago

Is your feature request related to a problem? Please describe.

In the application I am considering, recognizing technical terms correctly is essential to delivering a successful solution. However, ASR models generally struggle to recognize very specific terms.

OpenAI Whisper allows a prompt to be fed to the decoder, which makes use of a simple language model. The prompt can be used to help stitch together multiple audio segments or as a spelling guide to improve the recognition of specific terms, and it has proved very useful.

Reference: https://cookbook.openai.com/examples/whisper_prompting_guide

Describe the solution you'd like

It would be very useful to update the Whisper model to accept an additional input: a set of decoder_input_ids representing the prompt. A diagram illustrating the idea can be found at the following link: https://github.com/openai/whisper/discussions/117
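To make the request more concrete, here is a minimal sketch of how the prompt could be combined with the usual start-of-transcript tokens to form the decoder input. It is not the ai-hub-models or OpenAI implementation: the function name `build_decoder_input_ids`, the special-token IDs, and the 223-token prompt limit are assumptions for illustration and should be checked against the tokenizer and model actually used.

```python
from typing import List, Sequence

# Assumed, illustrative special-token IDs; verify them against your tokenizer.
SOT_PREV = 50361  # <|startofprev|>, marks the start of the prompt
# <|startoftranscript|>, <|en|>, <|transcribe|>, <|notimestamps|>
SOT_SEQUENCE = (50258, 50259, 50359, 50363)


def build_decoder_input_ids(
    prompt_ids: Sequence[int],
    sot_sequence: Sequence[int] = SOT_SEQUENCE,
    sot_prev: int = SOT_PREV,
    max_prompt_tokens: int = 223,  # assumed limit: about half the decoder context
) -> List[int]:
    """Prepend a tokenized prompt to the start-of-transcript sequence."""
    if not prompt_ids or max_prompt_tokens <= 0:
        # No prompt: the decoder starts from the usual sequence only.
        return list(sot_sequence)
    # Keep only the most recent prompt tokens so the decoder context
    # still has room for the transcription itself.
    kept = list(prompt_ids)[-max_prompt_tokens:]
    return [sot_prev, *kept, *sot_sequence]


# Hypothetical token IDs standing in for a tokenized spelling guide.
example_prompt = [101, 102, 103]
decoder_input_ids = build_decoder_input_ids(example_prompt)
print(decoder_input_ids)
```

With an input like this, the exported model would only need to accept a variable-length (or padded) decoder_input_ids tensor instead of a fixed start sequence.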

Describe alternatives you've considered

Recognition of specific terms can also be improved with other strategies such as fine-tuning, but prompting is a much easier and faster alternative in many cases.

Additional context

This feature is already supported in HuggingFace Transformers (https://github.com/huggingface/transformers/issues/22395).

Repository: quic / ai-hub-models, issue #24