linto-ai / whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence
GNU Affero General Public License v3.0

generate duplicated phrases #94

Open x180380 opened 1 year ago

x180380 commented 1 year ago

Whisper-timestamped generates duplicated phrases for some audio, such as https://flex2.acast.com/s/pbs-newshour-segments/u/d3i6fh83elv35t.cloudfront.net/static/2023/05/newswrap-15.mp3. I use the small and medium models.

passerbya commented 1 year ago

[Image] I have also encountered the same issue.

blundercode commented 1 year ago

I have seen this happen outside of whisper-timestamped, with other Whisper implementations as well. I am curious whether it is caused by hallucination or by not using VAD.

pinballelectronica commented 1 year ago

Also seeing this, mostly during quiet parts, if that helps at all. Otherwise the transcription is spot on, even with the hardest content.

misutoneko commented 1 year ago

For this particular sample, --accurate will get rid of the duplicates. The problem is, there is no single set of parameters that works best for everything. Sometimes I've even had to switch to a smaller model to get the timings right.
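For anyone calling the library from Python instead of the command line, here is a minimal sketch of what --accurate corresponds to, as far as I understand the project README: beam_size=5, best_of=5, and a temperature fallback schedule in steps of 0.2. The helper name `transcribe_accurate` and its defaults are placeholders for illustration, not part of the library.

```python
# Keyword arguments that, per the whisper-timestamped README, match --accurate.
# Treat the exact values as an assumption and check them against your version.
ACCURATE_KWARGS = {
    "beam_size": 5,
    "best_of": 5,
    "temperature": (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
}


def transcribe_accurate(audio_path, model_name="small", device="cpu"):
    """Hypothetical helper: transcribe one file with the --accurate settings."""
    # Imported lazily so this module loads even without the package installed.
    import whisper_timestamped as whisper

    audio = whisper.load_audio(audio_path)
    model = whisper.load_model(model_name, device=device)
    return whisper.transcribe(model, audio, **ACCURATE_KWARGS)
```

Calling `transcribe_accurate("newswrap-15.mp3")` would return the usual result dictionary with segments and word timestamps. Beam search plus temperature fallback is slower than the default greedy decoding, which is part of the trade-off mentioned above.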

Jeronymous commented 1 year ago

Yes, exactly, @misutoneko. No free lunch...

x180380 commented 1 year ago

When using the small or tiny model, the duplicated phrases decrease. WhisperX also has this issue.

Jeronymous commented 1 year ago

Some people reported that using a lower value for compression_ratio_threshold than the default (2.4) improves this issue, typically --compression_ratio_threshold 1.
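To make the threshold less abstract: Whisper compares it against the zlib compression ratio of the decoded text, and a segment whose ratio exceeds the threshold is treated as a failed (too repetitive) decode. The sketch below reproduces that check in isolation, assuming it mirrors the helper in openai-whisper. The function `looks_repetitive` and the sample strings are made up for illustration.

```python
import zlib

DEFAULT_THRESHOLD = 2.4  # Whisper's default compression_ratio_threshold


def compression_ratio(text: str) -> float:
    """Raw UTF-8 size divided by zlib-compressed size; repetition raises it."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def looks_repetitive(text: str, threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Illustrative check: True when the text compresses better than the threshold allows."""
    return compression_ratio(text) > threshold


repeated = "thank you for watching. " * 20
normal = "The quick brown fox jumps over the lazy dog near the river bank."

print(looks_repetitive(repeated))  # highly repetitive text compresses very well
print(looks_repetitive(normal))    # ordinary short text barely compresses
```

A threshold of 1 is much stricter than 2.4, so far more segments are flagged. Note that flagging only leads to a retry when temperature fallback is enabled, which may be why the setting is usually paired with --accurate.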

mattdl-radix commented 11 months ago

Had the same problem, with more than 10 repetitions for several .mp3 files. The solution that worked for me was adding --compression_ratio_threshold 1 --accurate
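For batch runs, a small sketch of assembling that command line from Python. It only builds and prints the command and does not run anything. The helper `build_command`, the default model, and the file name are assumptions for illustration; the flags are the ones mentioned in this thread.

```python
import shlex


def build_command(audio_path, model="medium", threshold=1, accurate=True):
    """Hypothetical helper: build the whisper_timestamped CLI call as an argument list."""
    cmd = [
        "whisper_timestamped",
        audio_path,
        "--model", model,
        "--compression_ratio_threshold", str(threshold),
    ]
    if accurate:
        cmd.append("--accurate")
    return cmd


# Print a shell-safe version of the command for copy and paste.
print(shlex.join(build_command("newswrap-15.mp3")))
```

The returned list can be passed to `subprocess.run` directly if you want to launch the transcription from a script.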