Diarization for better splitting of subtitle segments

m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

BSD 2-Clause "Simplified" License


Diarization for better splitting of subtitle segments #452

Open ayancey opened 1 year ago

ayancey commented 1 year ago

Hi,

I was wondering whether speaker diarization (distinguishing between different speakers) can be used to improve how subtitles are segmented. For example, I am trying to translate and subtitle a Japanese TV show, and sentences from multiple speakers end up in a single segment.

Could diarizing improve this so each speaker has their own sentence segment?

Thanks, Alex
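For readers with the same question, here is a minimal sketch of the idea: once every word carries a speaker label, a segment can be cut wherever the label changes. It assumes the segments are plain dicts with a "words" list whose entries have "word", "start", "end" and "speaker" keys, which is meant to mirror what whisperX produces after alignment and speaker assignment; that shape is an assumption here, and split_by_speaker and _finalize are hypothetical helpers, not part of whisperX.

```python
from typing import Any, Optional

Segment = dict[str, Any]


def _finalize(speaker: Optional[str], words: list[dict], joiner: str) -> Segment:
    """Build one output segment from a run of words by the same speaker."""
    # Some words may lack timestamps (assumed possible), so collect only those present.
    starts = [w["start"] for w in words if "start" in w]
    ends = [w["end"] for w in words if "end" in w]
    return {
        "speaker": speaker,
        "start": min(starts) if starts else None,
        "end": max(ends) if ends else None,
        "text": joiner.join(w["word"] for w in words),
        "words": words,
    }


def split_by_speaker(segments: list[Segment], joiner: str = " ") -> list[Segment]:
    """Split each segment at every point where the word-level speaker changes.

    Hypothetical helper. Words without a "speaker" key inherit the label of
    the previous word (or of the segment, if it has one).
    """
    result: list[Segment] = []
    for seg in segments:
        speaker = seg.get("speaker")
        group: list[dict] = []
        group_speaker: Optional[str] = None
        for word in seg.get("words", []):
            speaker = word.get("speaker", speaker)
            if group and speaker != group_speaker:
                # Speaker changed: close the current run and start a new one.
                result.append(_finalize(group_speaker, group, joiner))
                group = []
            if not group:
                group_speaker = speaker
            group.append(word)
        if group:
            result.append(_finalize(group_speaker, group, joiner))
    return result
```

For languages written without spaces, such as Japanese, passing joiner="" is probably the better choice. Each returned segment keeps its own start and end time, so it can be written out as a separate subtitle line.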

ayancey commented 1 year ago

Also, in general I would love some guidance on how to shorten line width. It seems this is not possible when translating, but please correct me if I'm wrong.

cleverestx commented 9 months ago

--chunk_size helps a lot when subtitles are too long.

Values from --chunk_size 6 to --chunk_size 10 are supposed to be optimal for Japanese and Chinese, at least (the default is 30).
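As an illustration of the suggestion above, the sketch below assembles a whisperx command line with a smaller chunk size. The helper build_whisperx_command, the file name episode01.wav and the value 8 (picked from the 6 to 10 range mentioned above) are all hypothetical; the --language and --task flags are used on the assumption that the installed whisperX version accepts them alongside --chunk_size.

```python
import shlex


def build_whisperx_command(
    audio_path: str,
    chunk_size: int = 8,
    language: str = "ja",
    translate: bool = False,
) -> list[str]:
    """Return an argument list for the whisperx CLI (hypothetical helper).

    A smaller chunk_size than the default of 30 is assumed to yield
    shorter subtitle segments, as suggested in the comment above.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    cmd = [
        "whisperx",
        audio_path,
        "--language", language,
        "--chunk_size", str(chunk_size),
    ]
    if translate:
        # Assumed flag: ask for translated output instead of a transcript.
        cmd += ["--task", "translate"]
    return cmd


if __name__ == "__main__":
    # Print the command for inspection; pass the list to subprocess.run to execute it.
    print(shlex.join(build_whisperx_command("episode01.wav", chunk_size=8)))
```

Building the command as a list rather than a single string avoids quoting problems with file names that contain spaces.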