Hello there, I am currently using the repo to modify a Whisper model and create word-level timestamps, which has yielded mostly great results.
The main challenge I am currently encountering arises when an audio file has a lot of noise and subtle talking in the background. In such cases, the word-level segmentations sometimes "catch" the background noise as part of the word, resulting in excessively long segments. For instance:
[Segment(start=1.74, end=16.14, text=' This is not comment', [...], words=[WordTiming(word=' This', start=1.74, end=11.1, [...])])]
Is there an "in-house" method to address this issue without manually trying to shorten the segments? I have also tried denoising the audio with the "noisereduce" package, but it didn't help.
You can try demucs=True and/or vad=True. There's also clamp_max(), which was added in a recent commit.
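For reference, a minimal sketch of how these options can be combined in stable-ts (the model size and file names here are placeholders, not from the original thread):

```python
import stable_whisper

# Load a Whisper model through stable-ts.
model = stable_whisper.load_model('base')

# demucs=True isolates vocals from background noise/music before
# transcription (requires the demucs package); vad=True uses voice
# activity detection to suppress timestamps in non-speech regions.
result = model.transcribe('audio.mp3', demucs=True, vad=True)

# Clamp word durations that run abnormally long past the spoken word.
result.clamp_max()

# Export the adjusted result as subtitles.
result.to_srt_vtt('audio.srt')
```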
Thank you so much for the quick response! Unfortunately, neither demucs=True nor vad=True helped (separately or together), but I will try clamp_max().
Want to drop a line to say that I had the same problem with segment durations being too long and fitting too many words in.
clamp_max() worked like a charm!
Thank you for the lib! The update_seg_with_words that's automatically run on WhisperResult init took care of issues like segments showing too early as well. Clear and readable code. Thanks!
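For anyone hitting the same issue, a minimal sketch of applying clamp_max() to an already-saved result (assuming the result was stored as JSON; the file names are placeholders, and this is not necessarily the exact call used above):

```python
import stable_whisper

# Reloading a saved result reconstructs a WhisperResult, and
# update_seg_with_words runs automatically on init, realigning
# segment boundaries with their word timestamps.
result = stable_whisper.WhisperResult('result.json')

# Clamp abnormally long word/segment durations.
result.clamp_max()

result.save_as_json('result_clamped.json')
```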
Hi @hhursev, can you explain in more detail how you solved the problem using clamp_max()?