m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License

Is there any disadvantage to using torchaudio.pipelines.MMS_FA for forced alignment in different languages? #626

Open zmy1116 opened 11 months ago

zmy1116 commented 11 months ago

Hello,

First I would like to thank you for putting up this amazing package.

I notice that for forced alignment, an individual wav2vec2 model is used for each language. This would be a bit problematic if I have to host 20+ models for different languages in production.

I found that torchaudio has a model that can generate character emissions for many different languages, and they built a common alignment pipeline that handles different languages with that single model: https://pytorch.org/audio/stable/tutorials/forced_alignment_for_multilingual_data_tutorial.html
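
For reference, this is roughly what that pipeline looks like (a minimal sketch adapted from the linked tutorial; I'm assuming torchaudio >= 2.1, a 16 kHz-compatible audio file, and a transcript that is already romanized and lower-cased):

```python
# Minimal sketch of the multilingual forced-alignment pipeline from the
# torchaudio tutorial linked above. Exact bundle attributes may differ
# across torchaudio versions.
import torch
import torchaudio
from torchaudio.pipelines import MMS_FA as bundle

device = "cuda" if torch.cuda.is_available() else "cpu"
model = bundle.get_model().to(device)   # single multilingual wav2vec2 model
tokenizer = bundle.get_tokenizer()      # maps romanized text to token ids
aligner = bundle.get_aligner()          # CTC forced aligner


def align(audio_path, words):
    """`words` is the transcript as a list of romanized, lower-case words."""
    waveform, sample_rate = torchaudio.load(audio_path)
    if sample_rate != bundle.sample_rate:
        waveform = torchaudio.functional.resample(waveform, sample_rate, bundle.sample_rate)
    with torch.inference_mode():
        emission, _ = model(waveform.to(device))          # character-level emissions
        token_spans = aligner(emission[0], tokenizer(words))
    # One emission frame covers `ratio` audio samples; convert frames to seconds.
    ratio = waveform.size(1) / emission.size(1)
    return [
        (word,
         spans[0].start * ratio / bundle.sample_rate,
         spans[-1].end * ratio / bundle.sample_rate)
        for word, spans in zip(words, token_spans)
    ]


print(align("audio.wav", "i had that curiosity beside me at this moment".split()))
```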

Have you tried this? I'm pretty new to ASR, so I don't know whether there are any disadvantages to doing so.

Thank you

zmy1116 commented 10 months ago

OK, I guess the main problem is that you need a good romanizer; the one suggested in the torchaudio tutorial seems to be broken.
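
For context, the romanization step the tutorial relies on is to pipe the raw transcript through the uroman.pl perl script (https://github.com/isi-nlp/uroman) before tokenizing, roughly like this; the clone path below is just an assumption:

```python
# Rough sketch of the romanization step the torchaudio tutorial suggests:
# run the transcript through uroman.pl to get a Latin-script version.
# The path to the cloned uroman repo is an assumption.
import subprocess


def romanize(text: str, uroman_path: str = "uroman/bin/uroman.pl") -> str:
    """Return a Latin-script version of `text` using the uroman perl script."""
    result = subprocess.run(
        ["perl", uroman_path],
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


print(romanize("关服务高端产品仍处于供不应求的局面"))
```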