Open x180380 opened 1 year ago
I have also encountered the same issue.
I have seen this happen outside of whisper-timestamped, with other Whisper implementations as well. I am curious whether it is caused by hallucination or by not using VAD.
Also seeing this, mostly during quiet parts, if that helps at all. Otherwise the transcription is spot on, even with the hardest content.
For this particular sample, --accurate will get rid of the duplicates. The problem is, there is no single set of parameters that works best for everything. Sometimes I've even had to switch to a smaller model to get the timings right.
Yes, exactly, @misutoneko. No free lunch...
When using the small or tiny model, the duplicated phrases decrease. WhisperX also has this issue.
Some people reported that using a lower value for compression_ratio_threshold than the default (2.4) improves this issue, typically --compression_ratio_threshold 1
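For context on why this threshold matters: as far as I know, Whisper scores each decoded segment by how well its text compresses with zlib, and a segment whose ratio exceeds compression_ratio_threshold is treated as a failed decode and retried at a higher temperature. A looped phrase compresses extremely well, so it gets a high ratio. Here is a minimal sketch of that measure. The sample strings are made up for illustration, and 2.4 is what I believe the default threshold to be.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Raw UTF-8 size divided by zlib-compressed size; higher means more repetitive."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


# Made-up sample strings, not taken from the audio in this issue.
normal = "In our news wrap tonight, officials met to discuss the budget ahead of the deadline."
looped = "Thank you for watching. " * 12

DEFAULT_THRESHOLD = 2.4  # assumed default value of compression_ratio_threshold

for name, text in (("normal", normal), ("looped", looped)):
    ratio = compression_ratio(text)
    # A segment is flagged (and re-decoded) when its ratio is above the threshold.
    print(f"{name}: ratio={ratio:.2f}, flagged={ratio > DEFAULT_THRESHOLD}")
```

Lowering the threshold makes this check stricter, so more segments get re-decoded at a higher temperature. That may be why it helps with repetitions, at the cost of extra decoding passes.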
Had the same problem, with >10 repetitions for several .mp3 files.
The solution that worked for me was adding --compression_ratio_threshold 1 --accurate
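For anyone calling the library from Python instead of the command line, here is a rough equivalent of those two flags. It assumes, based on my reading of the README, that --accurate stands for beam_size=5, best_of=5 and a temperature fallback schedule from 0.0 to 1.0 in steps of 0.2. The helper names accurate_options and transcribe_file are made up for this sketch.

```python
def accurate_options(compression_ratio_threshold: float = 1.0) -> dict:
    """Keyword arguments assumed to mirror `--accurate --compression_ratio_threshold 1`."""
    return {
        "beam_size": 5,
        "best_of": 5,
        # Temperatures tried in order whenever a segment fails the quality checks.
        "temperature": (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
        "compression_ratio_threshold": compression_ratio_threshold,
    }


def transcribe_file(path: str, model_name: str = "small") -> dict:
    """Transcribe one audio file with the options above (hypothetical helper)."""
    # Imported lazily so the options helper can be used without the package installed.
    import whisper_timestamped as whisper

    audio = whisper.load_audio(path)
    model = whisper.load_model(model_name, device="cpu")
    return whisper.transcribe(model, audio, **accurate_options())
```

Usage would be something like result = transcribe_file("newswrap-15.mp3"), then checking the segments in the result for repeated text. As noted earlier in the thread, these settings are not guaranteed to work for every file.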
Whisper-timestamped generates duplicated phrases for some audio, such as https://flex2.acast.com/s/pbs-newshour-segments/u/d3i6fh83elv35t.cloudfront.net/static/2023/05/newswrap-15.mp3. I use the small and medium models.