sthita-pujari closed this issue 10 months ago
Is vad=True different in faster-whisper and stable_whisper?
vad=True does the same thing across all the transcription methods: it adjusts the timestamps after transcription is completed, so it is unlikely to affect the transcript. The transcription is performed in 30-second chunks. The default transcribe method of stable-ts will skip any chunk in which the VAD fails to detect speech. And if suppress_ts_tokens=True, it will only allow the decoder to return segment timestamps within the time ranges where the VAD detects speech. These are the differences.
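To make the two behaviors described above concrete, here is a minimal sketch in plain Python. It is not the stable-ts implementation: the function names, the fixed 30-second grid, and the example speech ranges are all assumptions chosen for illustration (the real decoder advances through the audio based on predicted timestamps rather than a fixed grid).

```python
from typing import List, Tuple

Range = Tuple[float, float]  # (start_seconds, end_seconds)


def overlaps(a: Range, b: Range) -> bool:
    """Return True if the two half-open time ranges share any audio."""
    return a[0] < b[1] and b[0] < a[1]


def chunks_to_transcribe(duration: float,
                         speech_ranges: List[Range],
                         chunk_len: float = 30.0) -> List[Range]:
    """Hypothetical helper: split the audio into fixed-length chunks and
    keep only the chunks in which the VAD detected some speech."""
    chunks = []
    start = 0.0
    while start < duration:
        end = min(start + chunk_len, duration)
        # Skip the chunk entirely when no VAD speech range overlaps it.
        if any(overlaps((start, end), r) for r in speech_ranges):
            chunks.append((start, end))
        start = end
    return chunks


def timestamp_allowed(ts: float, speech_ranges: List[Range]) -> bool:
    """Hypothetical helper: a segment timestamp is allowed only if it
    falls inside a range where the VAD detected speech."""
    return any(lo <= ts <= hi for lo, hi in speech_ranges)


# Made-up VAD output for a 90-second file: speech at 5-12 s and 65-70 s.
speech = [(5.0, 12.0), (65.0, 70.0)]
print(chunks_to_transcribe(90.0, speech))
print(timestamp_allowed(8.0, speech), timestamp_allowed(40.0, speech))
```

In this toy example the middle chunk (30 s to 60 s) contains no detected speech, so it would never reach the decoder, which is one way silence-induced hallucinations can be avoided.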
A few errors I see with faster_whisper: the model hallucinates "You You You" at the start and end.
Faster-whisper uses a different implementation of the model, so there are bound to be differences in the transcription result.
Also, I noticed the output for stable-ts==2.5.0 was much better. Now a few parts of the transcription are missed by both of the new methods.
Many additional postprocessing steps have been added since 2.5 to reduce or remove hallucinations. One of those can be adjusted with max_instant_words. https://github.com/jianfch/stable-ts/blob/eb00d291e54d82d381a967c30385002db0c8b1ae/stable_whisper/whisper_word_level.py#L178-L179
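As a rough illustration of how such a filter could drop real speech along with hallucinations, here is a sketch that assumes max_instant_words is a threshold on the fraction of zero-duration ("instant") words in a segment, with segments above the threshold removed. That reading should be checked against the linked docstring; the helper names and the dict layout below are hypothetical and are not taken from stable-ts.

```python
from typing import Dict, List


def instant_word_ratio(words: List[Dict]) -> float:
    """Fraction of words whose end time is not later than their start time."""
    if not words:
        return 0.0
    instant = sum(1 for w in words if w["end"] <= w["start"])
    return instant / len(words)


def drop_suspect_segments(segments: List[Dict],
                          max_instant_words: float = 0.5) -> List[Dict]:
    """Keep a segment only if its share of instant words does not exceed
    the threshold (assumed semantics, see the note above)."""
    return [s for s in segments
            if instant_word_ratio(s["words"]) <= max_instant_words]


# Made-up segments: a "You You You" run with zero-length words, then speech.
segments = [
    {"text": "You You You",
     "words": [{"start": 0.0, "end": 0.0},
               {"start": 0.0, "end": 0.0},
               {"start": 0.0, "end": 0.0}]},
    {"text": "Hello there",
     "words": [{"start": 1.0, "end": 1.4},
               {"start": 1.4, "end": 1.9}]},
]
print([s["text"] for s in drop_suspect_segments(segments)])
print([s["text"] for s in drop_suspect_segments(segments, max_instant_words=1.0)])
```

Under this assumption, raising the threshold toward 1.0 keeps more segments, which may recover missed parts of the transcript at the cost of letting more hallucinated segments through.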
For both settings the output is different. Also, I noticed the output for stable-ts==2.5.0 was much better. Now a few parts of the transcription are missed by both of the new methods.
A few errors I see with faster_whisper: the model hallucinates "You You You" at the start and end. Is vad=True different in faster-whisper and stable_whisper?