k2-fsa / sherpa

Speech-to-text server framework with next-gen Kaldi
https://k2-fsa.github.io/sherpa
Apache License 2.0

How to add hot words in whisper prompt #597

Open taorui-plus opened 1 month ago

taorui-plus commented 1 month ago

I completed a concurrency test of the TensorRT + Triton server deployment, and concurrency was roughly doubled compared to faster-whisper.

I am now testing its accuracy, but the Chinese transcription always comes out in traditional Chinese characters (繁体字). I want to fix this by adding hot words to the prompt, but I ran into some problems.

I tried both of these prompt layouts with no success:

`<|startofprev|>{prompt}<|startoftranscript|><|zh|><|transcribe|><|notimestamps|>`
`<|startoftranscript|><|zh|><|transcribe|><|notimestamps|>{prompt}<|endoftext|>`
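For reference, here is a minimal sketch of how the first layout could be assembled with the openai-whisper tokenizer. The hot-word string is just an illustration, and this only builds the token ids; how they are fed to the TensorRT/Triton engine is a separate question:

```python
# Sketch: assemble the <|startofprev|> {prompt} <|startoftranscript|>... layout
# using the openai-whisper tokenizer. The hot-word string below is illustrative.
from whisper.tokenizer import get_tokenizer

tokenizer = get_tokenizer(multilingual=True, language="zh", task="transcribe")

# Illustrative prompt biasing the decoder toward simplified Chinese / hot words
hot_words = "以下是普通话的简体中文句子。"
prompt_ids = tokenizer.encode(" " + hot_words.strip())

# <|startofprev|> {prompt} <|startoftranscript|> <|zh|> <|transcribe|> <|notimestamps|>
decoder_input_ids = (
    [tokenizer.sot_prev]
    + prompt_ids
    + list(tokenizer.sot_sequence_including_notimestamps)
)
print(decoder_input_ids)
```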

Here is some information I referenced:

  1. The Whisper documentation only gives this format: `<|startoftranscript|><|en|><|transcribe|><|notimestamps|>`

  2. The README of triton/whisper shows: (screenshot of its prompt format, not included here)

taorui-plus commented 4 weeks ago

@yuekaizhang

I found that the difference between the faster-whisper and Triton whisper prompts is:

faster-whisper: `<|transcribe|>{prompt_text}<|startoftranscript|><|startofprev|><|startoflm|>`
Triton whisper: `<|startoftranscript|><|{language}|><|transcribe|><|notimestamps|>`

faster-whisper uses the `<|startofprev|>` and `<|startoflm|>` tokens instead of the `<|notimestamps|>` token. This OpenAI discussion explains that the `<|startoflm|>` token is not used in OpenAI's API. So Triton whisper matches OpenAI's API, and this layout also gives better recognition performance.
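A small sketch, again with the openai-whisper tokenizer, to make the token difference concrete (it only inspects the ids and is not the Triton client API):

```python
# Sketch: print the special-token ids behind the two layouts above
# (openai-whisper tokenizer; illustrative only, not the Triton client API).
from whisper.tokenizer import get_tokenizer

tokenizer = get_tokenizer(multilingual=True, language="zh", task="transcribe")

# Tokens that faster-whisper's layout relies on
print("<|startofprev|> ->", tokenizer.sot_prev)
print("<|startoflm|>   ->", tokenizer.sot_lm)

# The OpenAI-API-style layout used by Triton whisper
print("<|startoftranscript|><|zh|><|transcribe|><|notimestamps|> ->",
      list(tokenizer.sot_sequence_including_notimestamps))
```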