huggingface / distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
MIT License
3.52k stars 280 forks

Question about when to apply the WER threshold filtering strategy with concatenated audio #127

Open lq0104 opened 5 months ago

lq0104 commented 5 months ago

Hi @sanchit-gandhi, I think the concatenation strategy is excellent, but I have a question. When concatenate_audio=True, I believe wer_threshold filtering needs to be applied during the pseudo-labelling phase rather than deferred to the training phase. Many short audio segments may be very noisy individually, but once they are concatenated with cleaner segments, the combined sample might no longer be filtered out.
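To make the concern concrete, here is a minimal sketch in plain Python. It is not taken from the distil-whisper code base: the `wer` helper, the example transcripts, and the threshold value of 10 are all hypothetical, and WER is assumed to be expressed as a percentage. It compares filtering each segment on its own against filtering the concatenated sample.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent (hypothetical helper, not the library's)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# Hypothetical (ground truth, pseudo-label) pairs: one short noisy segment
# sits between two longer clean ones.
segments = [
    ("the quick brown fox jumps over the lazy dog near the quiet river bank today",
     "the quick brown fox jumps over the lazy dog near the quiet river bank today"),
    ("thank you", "bang shoe"),
    ("we will meet again tomorrow morning at the station before the first train leaves",
     "we will meet again tomorrow morning at the station before the first train leaves"),
]

WER_THRESHOLD = 10.0  # assumed value, in percent

# Filtering before concatenation: each segment is judged individually.
segment_wers = [wer(ref, hyp) for ref, hyp in segments]
kept_segments = [s for s, w in zip(segments, segment_wers) if w <= WER_THRESHOLD]

# Filtering after concatenation: errors are averaged over all the words.
concat_ref = " ".join(ref for ref, _ in segments)
concat_hyp = " ".join(hyp for _, hyp in segments)
concat_wer = wer(concat_ref, concat_hyp)
concat_kept = concat_wer <= WER_THRESHOLD
```

In this toy case the short segment has a WER of 100% and is dropped when filtered individually, while the concatenated sample has a WER of roughly 6.5% (2 errors over 31 words) and passes the same threshold, so the noisy segment would reach training.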