m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License

Readability trashed after putting length limits. #818

Open ankitgurua opened 1 month ago

ankitgurua commented 1 month ago

By default, both Whisper and WhisperX split subtitles along punctuation: each sub ends at a punctuation mark and only one sentence appears at a time, which makes reading much easier and more natural.

Example:

I think he's the love of my life. Are you sure about it? Yes, I love him deeper than I've ever loved anyone.

But obviously this also leaves me with sentences so long that they take up 4 lines. And for captions that's just bad.

So I apply the character limit, max lines, and word count parameters in the script. Though it does a good job of limiting the length, it kills the readability of the subs. Sentences break at random in the middle of a sub. Subs no longer end with punctuation, and they no longer begin with a capital letter, because the actual sentence started in the middle of the previous sub.

Example:

I think he's the love of my life. Are you sure about it? Yes, I love him deeper than I've ever loved anyone.

(The example uses just 2 speakers, but I have the same problem with single-speaker narration as well.)

What I want is for it to respect the ends of sentences AFTER it reduces the length of the sub.

Here's what I want:

Example:

I think he's the love of my life. Are you sure about it? Yes, I love him deeper than I've ever loved anyone.

jim60105 commented 3 weeks ago

Try --chunk_size. The default value is 30; use a smaller value for captions.
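
The behaviour requested in the original post (break at sentence ends first, and only then enforce the length limit) could also be approximated as a post-processing step on the transcript text. Below is a minimal sketch under stated assumptions: `split_caption` and `max_chars` are hypothetical names, not part of WhisperX, and the code works on plain text only, so it does not redistribute word-level timestamps across the new subs.

```python
import re


def split_caption(text, max_chars=42):
    """Hypothetical helper (not a WhisperX API): split text into captions.

    Sentences are kept whole when they fit in max_chars; a sentence that
    is too long is wrapped on word boundaries as a fallback.
    """
    # Split after '.', '?' or '!' when followed by whitespace.
    sentences = re.split(r"(?<=[.?!])\s+", text.strip())
    captions = []
    for sentence in sentences:
        if len(sentence) <= max_chars:
            captions.append(sentence)
            continue
        # Fallback for an over-long sentence: greedy word wrap.
        current = ""
        for word in sentence.split():
            candidate = f"{current} {word}".strip()
            if len(candidate) > max_chars and current:
                captions.append(current)
                current = word
            else:
                current = candidate
        if current:
            captions.append(current)
    return captions


if __name__ == "__main__":
    text = (
        "I think he's the love of my life. Are you sure about it? "
        "Yes, I love him deeper than I've ever loved anyone."
    )
    for line in split_caption(text, max_chars=42):
        print(line)
    # I think he's the love of my life.
    # Are you sure about it?
    # Yes, I love him deeper than I've ever
    # loved anyone.
```

With this approach every sub starts at a sentence boundary unless a single sentence exceeds the limit, in which case only that sentence is wrapped. Mapping the resulting pieces back onto timestamps would still need the word-level timings from the alignment output.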