m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License
11.7k stars 1.24k forks

Wav2vec doesn't align numerical characters #869

Open pr-data-port opened 1 month ago

pr-data-port commented 1 month ago

Hi, I have audio that includes numbers (e.g. 16, 29, 32). WhisperX loads it and transcribes it perfectly, but when I run word alignment I run into an issue: the numbers come out as separate words, and because of that they have empty start and end time values. For the wav2vec models I tried, the metadata dictionary only includes non-numerical characters [a-z].
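A minimal Python sketch of why the numbers come back without timestamps. It assumes the alignment model's character dictionary is lowercase a-z plus the apostrophe, which is typical of English wav2vec2 CTC models (check your own model's metadata). The helper names alignable_chars and unalignable_words are hypothetical, and this is not WhisperX's actual alignment code. It only shows that a word like "16" contains no character the model can emit, so there is nothing to align it against.

```python
import string

# Assumption: an English wav2vec2 CTC dictionary of lowercase letters plus "'".
ALIGN_VOCAB = set(string.ascii_lowercase) | {"'"}


def alignable_chars(word: str, vocab: set[str]) -> str:
    """Keep only the characters of `word` that the alignment model can emit."""
    return "".join(c for c in word.lower() if c in vocab)


def unalignable_words(transcript: str, vocab: set[str] = ALIGN_VOCAB) -> list[str]:
    """Return words with no alignable characters.

    An aligner that skips characters missing from its dictionary has nothing
    to place for these words, so they end up without start/end times.
    """
    return [w for w in transcript.split() if not alignable_chars(w, vocab)]


if __name__ == "__main__":
    print(unalignable_words("the scores were 16 29 and 32"))  # ['16', '29', '32']
    print(unalignable_words("the scores were sixteen twenty-nine and thirty-two"))  # []
```

Spelled-out numbers survive because most of their characters are in the dictionary; only the hyphen is dropped.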

Has anyone run into a similar issue, and does anyone know of an English wav2vec model (from Hugging Face) that would solve it?

Thanks in advance for any help,

itaipee commented 1 week ago

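Use the option "--suppress_numerals" when you transcribe with WhisperX. It suppresses numeric and currency symbols during decoding, because wav2vec2 cannot align them, so Whisper writes numbers out as words (e.g. "sixteen") that the alignment model can handle.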
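The idea behind the option can be sketched in Python: tokens whose text contains a digit or a %, $ or £ symbol are removed from the candidates at each decoding step, so the model has to spell numbers out. The toy vocabulary, the scores, and the function names numeral_token_ids and greedy_pick are all hypothetical. This is a simplified greedy illustration, not WhisperX's or Whisper's decoding code.

```python
# Hypothetical toy vocabulary: token id -> decoded token text.
TOY_VOCAB = {0: " sixteen", 1: " 16", 2: " twenty", 3: "$", 4: " 2", 5: " dollars"}

# Characters whose presence marks a token for suppression.
NUMERAL_SYMBOLS = set("0123456789%$£")


def numeral_token_ids(vocab: dict[int, str]) -> list[int]:
    """IDs of tokens whose text contains a digit or a %, $ or £ symbol."""
    return sorted(
        i for i, text in vocab.items() if any(c in NUMERAL_SYMBOLS for c in text)
    )


def greedy_pick(logits: dict[int, float], suppressed: set[int]) -> int:
    """Return the highest-scoring token id that is not suppressed."""
    allowed = {i: s for i, s in logits.items() if i not in suppressed}
    return max(allowed, key=allowed.get)


if __name__ == "__main__":
    scores = {0: 1.2, 1: 2.5, 2: 0.1}  # made-up scores for a single decoding step
    suppressed = set(numeral_token_ids(TOY_VOCAB))
    print(numeral_token_ids(TOY_VOCAB))     # [1, 3, 4]
    print(greedy_pick(scores, set()))       # 1 (" 16")
    print(greedy_pick(scores, suppressed))  # 0 (" sixteen")
```

With the numeral tokens suppressed, the greedy choice moves from " 16" to " sixteen", which an a-z alignment dictionary can align. The trade-off is that the transcript no longer contains digits, so any numeric formatting has to be restored afterwards if you need it.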