jianfch / stable-ts

Transcription, forced alignment, and audio indexing with OpenAI's Whisper
MIT License

Segment and word durations too long #174

Closed: michaelrubinfeldgg closed this issue 1 year ago

michaelrubinfeldgg commented 1 year ago

Hello there, I am using this repo to modify a Whisper model and produce word-level timestamps, and the results have mostly been great.

The main problem I am running into occurs when the audio has a lot of noise and quiet talking in the background. In such cases, the word-level segmentation sometimes "catches" the background noise as part of a word, which produces excessively long segments. For instance:

[Segment(start=1.74, end=16.14, text=' This is not comment', [...], words=[WordTiming(word=' This', start=1.74, end=11.1, [...])]]
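To put a number on it: the one word timing shown above spans 11.1 - 1.74 = 9.36 s for the single word ' This'. A quick check like the following can flag such words. It is a hypothetical helper, not part of stable-ts; it takes plain (word, start, end) tuples, and the 2-second threshold is an arbitrary illustrative value.

```python
def flag_long_words(words, max_dur=2.0):
    """Return (word, duration) for every word longer than max_dur seconds.

    words: iterable of (word, start, end) tuples (hypothetical format).
    max_dur: arbitrary illustrative threshold, not a stable-ts default.
    """
    flagged = []
    for word, start, end in words:
        duration = end - start
        if duration > max_dur:
            # Round only for readability of the report.
            flagged.append((word, round(duration, 2)))
    return flagged


# The single word timing quoted in the issue.
print(flag_long_words([(" This", 1.74, 11.1)]))  # [(' This', 9.36)]
```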

Is there an "in-house" method to address this without manually shortening segments? I also tried denoising the audio with the "noisereduce" package, but it did not help.

jianfch commented 1 year ago

You can try demucs=True and/or vad=True. There is also clamp_max(), which was added in a recent commit.
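The general idea of capping word durations can be sketched in plain Python. This is not stable-ts's implementation of clamp_max(). The function name, the [word, start, end] list format, and the cap rule (an explicit max_dur, otherwise the median word duration times a factor) are assumptions made for illustration. The first word of a segment has its start moved later, because leading noise is what stretched ' This' in the example above; every other word has its end pulled in.

```python
from statistics import median


def clamp_word_durations(words, factor=2.5, max_dur=None):
    """Hypothetical sketch: cap each word's duration within one segment.

    words: list of [word, start, end] lists, modified in place.
    The cap is max_dur if given, otherwise median word duration * factor
    (an assumed rule for illustration).
    """
    if not words:
        return words
    if max_dur is not None:
        cap = max_dur
    else:
        cap = median(end - start for _, start, end in words) * factor
    for i, w in enumerate(words):
        _, start, end = w
        if end - start <= cap:
            continue
        if i == 0:
            # First word: keep its end, move its start later (trims leading noise).
            w[1] = end - cap
        else:
            # Other words: keep the start, pull the end earlier (trims trailing noise).
            w[2] = start + cap
    return words


words = [[" This", 1.74, 11.1]]
clamp_word_durations(words, max_dur=1.0)
print(words)  # start moved to ~10.1, end unchanged at 11.1
```

In stable-ts itself, clamp_max() is called on the transcription result rather than on raw lists; its docstring is the place to check the real parameters and how it decides which edge of a word to trim.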

michaelrubinfeldgg commented 1 year ago

Thank you so much for the quick response! Unfortunately, neither demucs=True nor vad=True helped, either separately or together, but I will try clamp_max().

hhursev commented 1 year ago

Just wanted to say that I had the same problem with segments running too long and packing in too many words.

clamp_max() worked like a charm!

Thank you for the lib! The update_seg_with_words() that runs automatically when a WhisperResult is initialized also fixed issues such as segments showing up too early. Clear, readable code. Thanks!
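Re-deriving segment boundaries from word timings is one way a segment that "shows up early" gets corrected. The sketch below assumes, without confirmation from this thread, that update_seg_with_words sets each segment's start and end from its first and last word. The Segment class and sync_segment_to_words() are hypothetical and do not mirror stable-ts's actual classes.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    # Hypothetical minimal segment; words are (text, start, end) tuples.
    start: float
    end: float
    words: list = field(default_factory=list)


def sync_segment_to_words(seg: Segment) -> Segment:
    """Set the segment's start/end to its first word's start and last word's end.

    Segments without words are left unchanged.
    """
    if seg.words:
        seg.start = seg.words[0][1]
        seg.end = seg.words[-1][2]
    return seg


seg = Segment(0.5, 5.0, [("hi", 1.2, 1.6), ("there", 1.6, 2.1)])
print(sync_segment_to_words(seg))  # start becomes 1.2, end becomes 2.1
```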

forestsheart commented 1 year ago

Hi @hhursev, can you explain in more detail how you solved the problem using clamp_max()?