Saccarab opened this issue 1 year ago
You will likely need v2 for that: https://github.com/m-bain/whisperX/issues/232#issuecomment-1546460436
Batched inference does not currently support word-level timestamps.
Would it be possible, in theory, to enable word-level timestamps through faster_whisper and patch them into the segments?
> Would it be possible, in theory, to enable word-level timestamps through faster_whisper and patch them into the segments?
Yes. The timestamp tokens are being filtered out during decoding. You can remove the filtering and then process them as needed. E.g.:
```python
# Assumes this runs inside the batched transcription loop, where `tokens` holds
# the decoder output for the current VAD chunk `idx` and `self.tokenizer` is the
# Whisper tokenizer (with timestamp filtering removed during decoding).
segments = []
token_buffer = []  # text tokens accumulated between a pair of timestamp tokens
start_time = None

for token in tokens:
    if token >= self.tokenizer.timestamp_begin:
        # Timestamp tokens encode time in 20 ms steps past `timestamp_begin`,
        # relative to the start of the current VAD chunk.
        timestamp_position = token - self.tokenizer.timestamp_begin
        ts_time = round(vad_segments[idx]["start"], 3) + timestamp_position * 0.02
        if start_time is None:
            # Opening timestamp of a segment.
            start_time = ts_time
        else:
            # Closing timestamp: decode the buffered text and flush the segment.
            end_time = ts_time
            text = self.tokenizer.decode(token_buffer)
            segments.append({"text": text, "start": start_time, "end": end_time})
            token_buffer = []
            start_time = None
    else:
        token_buffer.append(token)
```
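For comparison, faster_whisper itself can already emit word-level timestamps through the `word_timestamps=True` option on `transcribe`, which you could then patch into your segments. A minimal sketch (the model size, device settings, and audio path below are placeholders):

```python
from faster_whisper import WhisperModel

# Placeholder model/device settings; adjust for your hardware.
model = WhisperModel("small", device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.wav", word_timestamps=True)
for segment in segments:
    for word in segment.words:
        print(f"[{word.start:.2f} -> {word.end:.2f}] {word.word}")
```

Note that this goes through faster_whisper's own sequential pipeline, so it does not lift the batched-inference limitation mentioned above.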
Is it possible to use the OG Whisper word-level timestamps and skip forced alignment?
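For reference, vanilla openai/whisper also exposes word-level timestamps via the same `word_timestamps=True` flag on `transcribe`, so in principle those could be used instead of forced alignment, though whisperX's phoneme-based alignment is generally claimed to give more accurate word boundaries. A minimal sketch with an illustrative model name and file path:

```python
import whisper

model = whisper.load_model("base")  # illustrative model size
result = model.transcribe("audio.wav", word_timestamps=True)

for segment in result["segments"]:
    for word in segment["words"]:
        print(f"[{word['start']:.2f} -> {word['end']:.2f}] {word['word']}")
```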