quic / ai-hub-models

The Qualcomm® AI Hub Models are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) and ready to deploy on Qualcomm® devices.
https://aihub.qualcomm.com
BSD 3-Clause "New" or "Revised" License

[Feature Request] Whisper Prompting feature #24

Open cfasana opened 3 months ago

cfasana commented 3 months ago

Is your feature request related to a problem? Please describe.

In the application I am considering, accurate recognition of technical terms is essential to a successful solution. However, ASR models generally have difficulty recognizing very specific terms.

OpenAI Whisper allows a prompt to be fed to the decoder, which makes use of a simple language model. The prompt can help stitch together multiple audio segments or serve as a spelling guide that improves the recognition of specific terms, and it has proved very useful.

Reference: https://cookbook.openai.com/examples/whisper_prompting_guide

Describe the solution you'd like

It would be very useful to update the Whisper model to accept an additional input: a sequence of decoder_input_ids representing the prompt. An image illustrating the idea can be found at the following link: https://github.com/openai/whisper/discussions/117
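To make the request concrete, here is a minimal sketch of how the prompt could be folded into the decoder input, modeled on what the reference openai/whisper decoder appears to do: the prompt tokens are placed after a "start of previous" token and before the usual start-of-transcript sequence, and the prompt is truncated to its most recent tokens so it fits in half of the text context. The function name `build_decoder_input_ids` and the token ID constants are illustrative assumptions, not part of the AI Hub Models API; real IDs should be read from the tokenizer in use.

```python
from typing import List, Sequence

# Illustrative placeholder IDs (assumptions). Read the real values from the
# tokenizer of the Whisper checkpoint you deploy; they differ across versions.
SOT_PREV = 50361                      # "<|startofprev|>"
SOT_SEQUENCE = (50258, 50259, 50359)  # "<|startoftranscript|>", language, task
N_TEXT_CTX = 448                      # decoder text context length


def build_decoder_input_ids(
    prompt_ids: Sequence[int],
    sot_sequence: Sequence[int] = SOT_SEQUENCE,
    sot_prev: int = SOT_PREV,
    n_text_ctx: int = N_TEXT_CTX,
) -> List[int]:
    """Hypothetical helper: prepend a tokenized prompt to the decoder input."""
    if not prompt_ids:
        # No prompt: the decoder starts from the start-of-transcript sequence.
        return list(sot_sequence)
    # Keep only the most recent prompt tokens so that the prompt uses at most
    # half of the text context, leaving room for the transcription itself.
    max_prompt_len = n_text_ctx // 2 - 1
    kept = list(prompt_ids)[-max_prompt_len:]
    return [sot_prev, *kept, *sot_sequence]


if __name__ == "__main__":
    # Token IDs 1, 2, 3 stand in for a tokenized spelling guide.
    print(build_decoder_input_ids([1, 2, 3]))
```

With an exported model, these IDs would be the extra decoder input requested above. The decoder would consume them before generating the first transcription token.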

Describe alternatives you've considered

Recognition of specific terms can also be improved with other strategies such as fine-tuning, but prompting is a much easier and faster alternative in many cases.

Additional context

This feature is already supported in HuggingFace Transformers (https://github.com/huggingface/transformers/issues/22395).

Other useful links:

mestrona-3 commented 3 months ago

Hi @cfasana, thank you for the feature request! We appreciate it greatly. We'll file this internally and update this issue when we have news.

cfasana commented 3 months ago

Thanks for the feedback! Looking forward to getting updates.