m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License

beam_size doesn't seem to work? #524

Closed 9throok closed 1 year ago

9throok commented 1 year ago

Hi, I was working with WhisperX and found a case where the transcriptions were pretty bad. The same file works well when I run it through bare faster-whisper.

Apart from that, to improve the results I tried increasing beam_size, but it seems like it's not working properly. I suspect this for two reasons: first, the model takes the same time to transcribe with a beam size of 100 as with 5, and second, there is no change in the result.

Here is the parameter combination that I am trying: model='large-v2', align_model='WAV2VEC2_ASR_LARGE_LV60K_960H', batch_size=2, output_format='vtt', diarize=True, beam_size=100

Is there something that I am doing wrong or is that actually a bug?

jkukul commented 1 year ago

I've been reading the code of WhisperX recently and I can confirm that the beam_size parameter is actually not passed down to faster-whisper.
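
To illustrate the kind of failure described here, below is a minimal sketch that does not use the real WhisperX or faster-whisper code. `FakeDecoder`, `transcribe_dropping_options`, and `transcribe_forwarding_options` are hypothetical names made up for this example. The stand-in decoder records the options it actually receives, which makes it easy to see whether a wrapper forwards `beam_size` or silently falls back to the default.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FakeDecoder:
    """Hypothetical stand-in for an ASR backend; records the options it receives."""

    calls: List[Dict[str, Any]] = field(default_factory=list)

    def transcribe(self, audio: str, beam_size: int = 5) -> str:
        # Log what actually arrived so a dropped option is visible.
        self.calls.append({"audio": audio, "beam_size": beam_size})
        return f"decoded {audio} with beam_size={beam_size}"


def transcribe_dropping_options(decoder: FakeDecoder, audio: str, **options: Any) -> str:
    # Bug pattern: the wrapper accepts options but never forwards them,
    # so the backend always runs with its own default.
    return decoder.transcribe(audio)


def transcribe_forwarding_options(decoder: FakeDecoder, audio: str, **options: Any) -> str:
    # Fixed pattern: options are passed through to the backend call.
    return decoder.transcribe(audio, **options)


if __name__ == "__main__":
    decoder = FakeDecoder()
    print(transcribe_dropping_options(decoder, "sample.wav", beam_size=100))
    print(transcribe_forwarding_options(decoder, "sample.wav", beam_size=100))
    print(decoder.calls)
```

With the dropping wrapper, beam sizes of 5 and 100 produce identical calls to the backend, which would match both symptoms reported above: the same runtime and the same output.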

9throok commented 1 year ago

Hi @jkukul, were you able to figure out how we can pass those arguments down to the model?