How to add hot words in whisper prompt

k2-fsa / sherpa

Speech-to-text server framework with next-gen Kaldi

Apache License 2.0

473 stars 97 forks source link

I completed the concurrency test based on tensor RT + triton server deployment, and the concurrency was about doubled compared to faster-whisper.

I am testing its accuracy, but the Chinese transcription always shows 繁体字. I want to solve this problem by adding hot words in the prompt, but I encounter some problems.

I tried both of these methods with no success： <|startofpref|>{prompt}<|startoftranscript|><|zh|><|transcribe|><|notimestamps|> <|startoftranscript|><|zh|><|transcribe|><|notimestamps|>{prompt}<|endoftext|>

Here is some information I referenced：

The whisper documentation only gives this way of writing： <|startoftranscript|> <|en|> <|transcribe|> <|notimestamps|>
The readme document of triton/whisper shows：

k2-fsa / sherpa

How to add hot words in whisper prompt #597