Open 7k50 opened 1 year ago
large-v2 does increase the accuracy of the transcript itself, while a larger alignment model increases the accuracy of the timestamps. A larger batch size only affects the transcript, and so does the beam size. As long as the quality of the transcript is not too bad, batch size, beam size and large-v2 should have no effect on the alignment.
Personally, I have not experienced any improvement from using a better align model for forced alignment. Still, if you want the best accuracy, use a larger align model (for example WAV2VEC2_ASR_LARGE_LV60K_960H). I'd say that if you only want good timestamps, the default option is good enough. But this is up to you to decide.
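To make the options discussed here concrete, below is a minimal sketch of a helper that assembles a WhisperX command line. The helper name `build_whisperx_command` is hypothetical, and the flag names (`--model`, `--align_model`, `--batch_size`, `--beam_size`) are assumed to match the WhisperX CLI; check `whisperx --help` for your installed version. The file name and the batch size of 4 are illustrative values only, not recommendations.

```python
import shlex


def build_whisperx_command(audio_path, model="large-v2", align_model=None,
                           batch_size=None, beam_size=None):
    """Assemble an argument list for the whisperx CLI (hypothetical helper).

    Options left as None are omitted, so the tool falls back to its defaults.
    """
    cmd = ["whisperx", audio_path, "--model", model]
    if align_model is not None:
        # A larger alignment model targets timestamp accuracy, not the transcript.
        cmd += ["--align_model", align_model]
    if batch_size is not None:
        # Batch size is mainly a speed / GPU memory trade-off.
        cmd += ["--batch_size", str(batch_size)]
    if beam_size is not None:
        # A larger beam size costs more computation per decoding step.
        cmd += ["--beam_size", str(beam_size)]
    return cmd


if __name__ == "__main__":
    cmd = build_whisperx_command(
        "audio.wav",  # illustrative file name
        align_model="WAV2VEC2_ASR_LARGE_LV60K_960H",
        batch_size=4,  # illustrative value
    )
    # Print the command instead of running it, so the sketch works without WhisperX installed.
    print(shlex.join(cmd))
```

Leaving out `align_model` keeps the default alignment model, which is the "good enough" option mentioned above.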
@sorgfresser Thanks for the info! I understand that increasing batch size and beam size may speed up processing. But does increasing batch size and beam size affect the quality of Whisper transcription?
TLDR: Beam size yes, Batch size no.
Beam size surely does. The batch size is a bit more tricky: if we use batching, we can't utilize the prompt parameter in the same way OpenAI does. According to the author of this repo, it does not affect accuracy negatively. You can read the whole discussion on this in #234. Since a larger beam size simply makes decoding a bit less greedy, it will affect the transcript in a positive way but will require additional computation, so transcribing will take longer. Still, you should note that which beam size is best depends a bit on the beam size used for training (not too much, but it can have an impact).
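To illustrate what "a bit less greedy" means, here is a toy beam search over a made-up two-step probability table. This is not Whisper's decoder; the table `PROBS`, the tokens and the function `beam_search` are all hypothetical and only serve to show the trade-off: a beam size of 1 (greedy) commits to the locally best token, while a beam size of 2 keeps an alternative alive and finds a more probable sequence at the cost of more expansions.

```python
import math

# Hypothetical toy "model": next-token probabilities given the previous token.
PROBS = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.55, "y": 0.45},
    "b": {"x": 0.9, "y": 0.1},
}


def beam_search(probs, start, steps, beam_size):
    """Return (best tokens, sequence probability, number of expansions)."""
    beams = [((start,), 0.0)]  # each beam: (token tuple, cumulative log-probability)
    expansions = 0
    for _ in range(steps):
        candidates = []
        for tokens, score in beams:
            # Extend every surviving hypothesis by every possible next token.
            for tok, p in probs[tokens[-1]].items():
                candidates.append((tokens + (tok,), score + math.log(p)))
                expansions += 1
        # Keep only the beam_size highest-scoring hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    best_tokens, best_score = beams[0]
    return best_tokens[1:], math.exp(best_score), expansions


if __name__ == "__main__":
    for size in (1, 2):
        tokens, prob, cost = beam_search(PROBS, "<s>", steps=2, beam_size=size)
        print(f"beam_size={size}: tokens={tokens} prob={prob:.2f} expansions={cost}")
```

In this table, greedy decoding picks "a" first and ends at probability 0.33 after 4 expansions, while a beam of 2 also keeps "b" and reaches 0.36 after 6 expansions. The same pattern, better hypotheses for more computation, is why a larger beam size makes transcription slower.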
My aim is to get relatively good timestamp accuracy (good/adequate but doesn't have to be "perfect"), but the instructions are somewhat unclear to me. Readme.md says:
I am assuming that the above means to suggest these settings for good timestamp accuracy, so in other words, WAV2VEC2_ASR_LARGE_LV60K_960H is a good choice? Or does it mean to say that the addition of this align_model is not really that useful, but that the addition of large-v2 is?
Furthermore, what may be appropriate settings for batch_size (and possibly beam_size) if the goal is to have relatively good timestamp accuracy?