m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License

Timings overlap problem with --max_line_width and --max_line_count options #608

Open carolinaxxxxx opened 9 months ago

carolinaxxxxx commented 9 months ago

Hi,

It seems that when using the --max_line_width and --max_line_count options, the output SRT file contains incorrect timings:

Example:

1
00:00:15,912 --> 00:00:17,293
Good morning, loves! We're going to get 
breakfast! We're going to get breakfast at a new

2
00:00:19,935 --> 00:00:21,456
spot. It's called Spoon. Can't wait for you guys to 
see it. Enjoy! Please enjoy this Shed Eggs. Enjoy!

3
**00:01:02,962 --> 00:01:23,152**
hi guys oh hi sweethearts and welcome back to my 
channel welcome back to another video this is a

4
**00:01:02,962 --> 00:01:23,152**
highly anticipated video this is my birthday vlog 
thanks per welcome to my birthday vlog and it has

5
**00:01:02,962 --> 00:01:23,152**
been very eventful i'm not gonna lie am i too bright 
for you guys i probably am too bright let me

6
**00:01:02,962 --> 00:01:23,152**
probably like stand here welcome welcome welcome 
abroad welcome to thy sweet family welcome if

7
00:01:24,733 --> 00:01:46,482
you're new um if you're new to the channel i hope you 
like the video enough to give it a subscribe like

8
00:01:24,733 --> 00:01:46,482
down below and please comment guys i'm so happy i'm 
so sorry i need to calm down okay if you're new to the

9
00:01:47,842 --> 00:02:07,296
channel i hope you like the channel enough to give 
it a thumbs up and comment down below and leave a

10
00:01:47,842 --> 00:02:07,296
subscription to become part of this sweet family 
and become a sweetheart i hope that's more calm but

11
00:01:47,842 --> 00:02:07,296
yeah guys welcome to my birthday vlog we are about 
to okay wait i don't know what i'm saying about my

The timings marked with ** overlap, so the output file is poorly constructed and, as a result, hard to read.

Full command:

whisperx --batch_size 8 --model large-v3 --language en --device cuda --max_line_width 42 --max_line_count 2 --verbose False --output_format srt

It appears that using these options causes incorrect timestamps to be generated in the output files regardless of the model used. Without them, the timestamps are OK.
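
For anyone who wants to check their own output for the same problem, here is a minimal sketch in Python that flags cues starting before the previous cue has ended. It assumes standard SRT timing lines (HH:MM:SS,mmm --> HH:MM:SS,mmm); the function names `parse_timings` and `find_overlaps` are made up for this illustration and are not part of whisperX.

```python
import re
from datetime import timedelta

# Matches an SRT timing line. Optional leading/trailing asterisks are
# tolerated so the ** markers used in the example above do not break parsing.
TIMING = re.compile(
    r"\**(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})\**"
)

def parse_timings(srt_text):
    """Return a list of (start, end) timedeltas, one per cue, in file order."""
    cues = []
    for line in srt_text.splitlines():
        m = TIMING.fullmatch(line.strip())
        if m:
            h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, m.groups())
            start = timedelta(hours=h1, minutes=m1, seconds=s1, milliseconds=ms1)
            end = timedelta(hours=h2, minutes=m2, seconds=s2, milliseconds=ms2)
            cues.append((start, end))
    return cues

def find_overlaps(cues):
    """Return 1-based positions of cues that start before the previous cue ends."""
    return [i + 1 for i in range(1, len(cues)) if cues[i][0] < cues[i - 1][1]]
```

Applied to the eleven cues in the example above, `find_overlaps(parse_timings(text))` would report cues 4, 5, 6, 8, 10 and 11, i.e. every cue that repeats the timing of the one before it.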

carolinaxxxxx commented 9 months ago

Can anyone confirm this issue? Thx.

antonfp commented 9 months ago

I have the same problem

schemesmith commented 8 months ago

same issue

schemesmith commented 8 months ago

Oh, actually, I was looking at the other issues and found a workaround: set --highlight_words to True and then use a Python script to get rid of the <u> </u> tags.
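
The tag-stripping step mentioned above could look something like the sketch below. It assumes the highlighted words are wrapped in literal <u> and </u> tags, as described in the comment; the function names and any file paths passed in are hypothetical, and the thread does not confirm how the cleaned file compares to the intended --max_line_width / --max_line_count layout.

```python
import re
from pathlib import Path

# Matches opening and closing underline tags, i.e. <u> and </u>.
UNDERLINE_TAG = re.compile(r"</?u>")

def strip_underline_tags(srt_text):
    """Remove <u> and </u> tags, leaving the subtitle text and timings untouched."""
    return UNDERLINE_TAG.sub("", srt_text)

def clean_srt_file(src, dst):
    """Read an SRT file, strip underline tags, and write the result to dst."""
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(strip_underline_tags(text), encoding="utf-8")
```

For example, `clean_srt_file("input.srt", "cleaned.srt")` would write a tag-free copy next to the original; both file names here are placeholders.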

rubentorresbonet commented 8 months ago

Same problem.