Hello,
First, I would like to thank you for putting up this amazing package.
I noticed that for forced alignment, we are using a separate wav2vec model for each language. This would be a bit problematic if I have to host 20+ models for different languages in production.
I found that torchaudio has a model that can generate character emissions for many different languages, and they built a common alignment pipeline that uses the same model across languages: https://pytorch.org/audio/stable/tutorials/forced_alignment_for_multilingual_data_tutorial.html
Have you tried this? I'm pretty new to ASR, so I don't know if there are any disadvantages to doing so.
Thank you