jianfch / stable-ts

Transcription, forced alignment, and audio indexing with OpenAI's Whisper
MIT License

--only_voice_freq equivalent in whisper #299

Closed ls-milkyway closed 10 months ago

ls-milkyway commented 10 months ago

--only_voice_freq works well with the medium model, but I can't find it in the Whisper documentation. Is it equivalent to a "No Speech Threshold" of 0.6 or 0.7, or to some other Whisper option?

jianfch commented 10 months ago

They do very different things: only_voice_freq alters the audio itself before transcription, whereas no_speech_threshold acts as a filter on the transcription results. As far as I'm aware, Whisper does not have an equivalent option.

https://github.com/jianfch/stable-ts/blob/f6d61c228d5a00f89637422537d36cd358e5b90d/stable_whisper/whisper_word_level.py#L165-L166
https://github.com/jianfch/stable-ts/blob/f6d61c228d5a00f89637422537d36cd358e5b90d/stable_whisper/whisper_word_level.py#L103-L105
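
To make the distinction concrete, here is a rough, standard-library-only sketch of the two kinds of operation. It is not the stable-ts or Whisper source code. The function names (biquad, voice_band, drop_silent_segments) are hypothetical, and the 200 to 5000 Hz band is an assumption about what "voice frequencies" means here. The skip rule in the second half follows my reading of the no_speech_threshold and logprob_threshold descriptions in Whisper's transcribe options: a segment is treated as silence only when its no-speech probability is above the threshold and its average log-probability is not above logprob_threshold.

```python
import math


def biquad(samples, sample_rate, cutoff_hz, kind, q=0.707):
    """Second-order low-pass or high-pass filter (Audio EQ Cookbook form)."""
    w0 = 2 * math.pi * cutoff_hz / sample_rate
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2 * q)
    if kind == "highpass":
        b0, b1, b2 = (1 + cos_w0) / 2, -(1 + cos_w0), (1 + cos_w0) / 2
    elif kind == "lowpass":
        b0, b1, b2 = (1 - cos_w0) / 2, 1 - cos_w0, (1 - cos_w0) / 2
    else:
        raise ValueError("kind must be 'highpass' or 'lowpass'")
    a0, a1, a2 = 1 + alpha, -2 * cos_w0, 1 - alpha

    out = []
    x1 = x2 = y1 = y2 = 0.0  # previous inputs and outputs
    for x in samples:
        y = (b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def voice_band(samples, sample_rate=16000, low_hz=200, high_hz=5000):
    """Pre-processing: change the AUDIO before any transcription happens.

    The 200-5000 Hz band is an assumed stand-in for "voice frequencies".
    """
    highpassed = biquad(samples, sample_rate, low_hz, "highpass")
    return biquad(highpassed, sample_rate, high_hz, "lowpass")


def drop_silent_segments(segments, no_speech_threshold=0.6,
                         logprob_threshold=-1.0):
    """Post-processing: filter the RESULTS after transcription.

    Each segment is a dict with 'no_speech_prob' and 'avg_logprob' keys.
    """
    kept = []
    for seg in segments:
        skip = seg["no_speech_prob"] > no_speech_threshold
        if seg["avg_logprob"] > logprob_threshold:
            skip = False  # decoding looks confident, so keep the segment
        if not skip:
            kept.append(seg)
    return kept
```

The first function never sees any text and the second never touches the waveform, which is why neither option can substitute for the other.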