OG whisper word level timestamps support

Saccarab commented 1 year ago

Is it possible to use OG whisper word-level timestamps and skip forced alignment?

m-bain commented 1 year ago

You will likely need v2 for that: https://github.com/m-bain/whisperX/issues/232#issuecomment-1546460436

Batched inference does not currently support word-level timestamps.

Saccarab commented 1 year ago

Would it be possible in theory to enable word_level timestamps through faster_whisper and patch them into the segments?

stri8ed commented 7 months ago

> would it be possible in theory to enable word_level timestamps through faster_whisper and patch them into the segments?

Yes. The timestamp tokens are being filtered out during decoding. You can remove the filtering, and then process them as needed.

E.g.

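# Fragment from inside the batched decoding loop: `tokens` are the decoded ids
# for VAD chunk `idx`, with the timestamp tokens no longer filtered out.
# `segments` is assumed to be a list created earlier.
token_buffer = []
start_time = None
for token in tokens:
    if token >= self.tokenizer.timestamp_begin:
        # Timestamp tokens count 0.02 s steps from the start of the chunk.
        timestamp_position = token - self.tokenizer.timestamp_begin
        ts_time = round(vad_segments[idx]['start'], 3) + timestamp_position * 0.02
        if start_time is None:
            # Opening timestamp of a new segment.
            start_time = ts_time
        else:
            # Closing timestamp: emit the buffered text as one segment.
            segments.append(
                {
                    "text": self.tokenizer.decode(token_buffer),
                    "start": start_time,
                    "end": ts_time,
                }
            )
            token_buffer = []
            start_time = None
    else:
        # Ordinary text token: keep it until the closing timestamp arrives.
        token_buffer.append(token)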
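
For anyone who wants to try the idea outside the pipeline, here is a minimal standalone sketch of the same loop. The function name `split_on_timestamp_tokens` and its parameters are hypothetical (they are not part of whisperX or faster_whisper); `timestamp_begin` and `decode` stand in for `self.tokenizer.timestamp_begin` and `self.tokenizer.decode`, and the 0.02 s step is taken from the snippet above. One small difference, which is my own assumption about the desired behavior: if two timestamp tokens arrive with no text between them, the later one is treated as a new start instead of emitting an empty segment.

```python
from typing import Callable, Dict, List, Optional, Sequence


def split_on_timestamp_tokens(
    tokens: Sequence[int],
    timestamp_begin: int,
    decode: Callable[[List[int]], str],
    chunk_start: float = 0.0,
    time_precision: float = 0.02,
) -> List[Dict]:
    """Group text tokens into segments delimited by timestamp tokens.

    Hypothetical helper: token ids >= timestamp_begin are treated as
    timestamps, everything else as text.
    """
    segments: List[Dict] = []
    buffer: List[int] = []
    start: Optional[float] = None

    for token in tokens:
        if token >= timestamp_begin:
            # Convert the token to absolute time within the full audio.
            offset = (token - timestamp_begin) * time_precision
            ts = round(chunk_start + offset, 3)
            if start is None:
                start = ts  # opening timestamp
            elif buffer:
                # Closing timestamp: emit the buffered text.
                segments.append(
                    {"text": decode(buffer), "start": start, "end": ts}
                )
                buffer = []
                start = None
            else:
                # No text since the last timestamp: restart from here.
                start = ts
        else:
            buffer.append(token)

    # Text left in `buffer` without a closing timestamp is dropped here;
    # a real integration would need to decide how to handle it.
    return segments


if __name__ == "__main__":
    # Toy ids: anything >= 100 is a timestamp; decode just joins the ids.
    toy_decode = lambda ids: " ".join(str(i) for i in ids)
    for seg in split_on_timestamp_tokens(
        [100, 1, 2, 150, 150, 3, 200], 100, toy_decode, chunk_start=10.0
    ):
        print(seg)
```

Note that this splits on Whisper's own timestamp tokens, so the granularity is whatever the model emits between timestamps, which may be phrases and not individual words.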

m-bain / whisperX

OG whisper word level timestamps support #286