The Qualcomm® AI Hub Models are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) and ready to deploy on Qualcomm® devices.
Is your feature request related to a problem? Please describe.
In the application I am considering, accurate recognition of technical terms is essential to releasing a successful solution.
However, ASR models generally have difficulty recognizing very specific terms.
OpenAI Whisper allows a prompt to be fed to the decoder, which makes use of a simple language model. The prompt can be used to help stitch together multiple audio segments or as a spelling guide to improve the recognition of specific terms, and it has proved very useful.
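For reference, this is roughly how the prompt is used as a spelling guide with the reference `openai-whisper` Python package, through the `initial_prompt` argument of `transcribe`. It is only a minimal sketch: the function name, the default model size, and the audio path are placeholders I chose for illustration.

```python
def transcribe_with_initial_prompt(audio_path, prompt, model_name="base"):
    """Transcribe an audio file, biasing the decoder with a text prompt.

    Requires the `openai-whisper` package; it is imported lazily so that
    defining this helper does not need the dependency installed.
    """
    import whisper

    model = whisper.load_model(model_name)
    # `initial_prompt` is tokenized and placed in front of the transcript
    # tokens, so the decoder sees the listed terms as preceding context.
    result = model.transcribe(audio_path, initial_prompt=prompt)
    return result["text"]
```

A call could look like `transcribe_with_initial_prompt("meeting.wav", "Glossary: Snapdragon, QNN, Hexagon NPU")`, where both the file name and the glossary are made up.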
Describe the solution you'd like
It would be very useful to update the Whisper model to accept an additional input: a sequence of decoder_input_ids representing the prompt.
A diagram illustrating the idea can be found at the following link: https://github.com/openai/whisper/discussions/117
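To make the request concrete, here is a small sketch of how I understand the decoder input to be laid out when a prompt is present: a start-of-previous marker, the prompt tokens, then the usual start-of-transcript sequence. The special token IDs below are illustrative values and the 223-token cap mirrors what I believe the reference implementation does (half of the 448-token decoder context, minus one); both are assumptions and should be read from the actual tokenizer and model config.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class SpecialTokens:
    # Illustrative IDs (assumption): take the real values from the
    # tokenizer that ships with the checkpoint being deployed.
    start_of_prev: int = 50361        # <|startofprev|>
    start_of_transcript: int = 50258  # <|startoftranscript|>
    language: int = 50259             # e.g. <|en|>
    task: int = 50359                 # <|transcribe|>
    no_timestamps: int = 50363        # <|notimestamps|>


def build_decoder_input_ids(
    prompt_ids: Sequence[int],
    special: SpecialTokens = SpecialTokens(),
    max_prompt_len: int = 223,  # assumption: 448 // 2 - 1
) -> List[int]:
    """Return the initial decoder tokens, optionally prefixed by a prompt."""
    if max_prompt_len <= 0:
        raise ValueError("max_prompt_len must be positive")

    sot_sequence = [
        special.start_of_transcript,
        special.language,
        special.task,
        special.no_timestamps,
    ]
    if not prompt_ids:
        # No prompt: the decoder starts from the plain SOT sequence.
        return sot_sequence

    # Keep only the most recent prompt tokens so the prompt cannot
    # crowd out the space left for the transcript itself.
    trimmed = list(prompt_ids)[-max_prompt_len:]
    return [special.start_of_prev] + trimmed + sot_sequence
```

With a layout like this, the exported decoder would only need to accept a variable-length (or padded) decoder_input_ids tensor instead of a fixed start-of-transcript sequence.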
Describe alternatives you've considered
The recognition of specific terms can be improved using other strategies such as fine-tuning, but prompting is a much easier and faster alternative in many cases.
Reference: https://cookbook.openai.com/examples/whisper_prompting_guide
Additional context
This feature is already supported in Hugging Face Transformers (https://github.com/huggingface/transformers/issues/22395).
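As a point of comparison, the Transformers API exposes this through `WhisperProcessor.get_prompt_ids` and the `prompt_ids` argument of `generate`. The sketch below shows the flow as I understand it; the helper name and the default checkpoint are my own choices, and `audio` is assumed to be a 1-D float waveform already resampled to the given rate.

```python
def transcribe_with_prompt_ids(audio, sampling_rate, prompt,
                               model_name="openai/whisper-tiny"):
    """Transcribe a waveform with Hugging Face Whisper, using a text prompt.

    Requires `transformers` and `torch`; imported lazily so that defining
    this helper does not need them installed.
    """
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    processor = WhisperProcessor.from_pretrained(model_name)
    model = WhisperForConditionalGeneration.from_pretrained(model_name)

    # Convert the raw waveform into log-mel input features.
    input_features = processor(
        audio, sampling_rate=sampling_rate, return_tensors="pt"
    ).input_features

    # Tokenize the prompt; the result is a rank-1 tensor of token IDs.
    prompt_ids = processor.get_prompt_ids(prompt, return_tensors="pt")

    generated = model.generate(input_features, prompt_ids=prompt_ids)
    # Skipping special tokens also drops the prompt from the decoded text.
    return processor.batch_decode(generated, skip_special_tokens=True)[0]
```

An equivalent input on the AI Hub version of the model would make it possible to reuse prompts prepared this way on device.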
Other useful links: