Open empz opened 1 month ago
Which model did you use? Can you tell me how to do this? I want to do it for Japanese, because none of the Japanese wav2vec2 models I found work well; the English one works best. It would be helpful if you could share how you used the multilingual one.
You can check https://github.com/MahmoudAshraf97/ctc-forced-aligner
I don't know much about ML, but I was able to use the following tutorial to do forced alignment on multilingual transcriptions. The only requirement is to romanize the transcript, which I did with the uroman package. https://pytorch.org/audio/stable/tutorials/forced_alignment_for_multilingual_data_tutorial.html
According to that tutorial, it uses a Wav2Vec2 model to do this, and I successfully aligned multiple languages. There's an extra step to map the aligned words back to the original (non-romanized) words, but that's pretty much it.
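For anyone wondering what the mapping-back step could look like, here is a rough sketch, not the tutorial's code. It assumes the transcript is already split into words, each word has been romanized on its own, and the aligner returned one (start, end) span in seconds per romanized word, in order. The names `AlignedWord`, `normalize` and `map_back` are made up for illustration, and the romanizations in the usage lines are hand-written, not actual uroman output.

```python
import re
from dataclasses import dataclass


@dataclass
class AlignedWord:
    text: str        # original (non-romanized) word
    romanized: str   # normalized romanized form that was aligned
    start: float     # start time in seconds
    end: float       # end time in seconds


def normalize(romanized: str) -> str:
    # Assumption: the aligner only sees lowercase a-z and apostrophes,
    # so everything else (punctuation, digits) is dropped.
    return re.sub(r"[^a-z']", "", romanized.lower())


def map_back(original_words, romanized_words, spans):
    """Pair each original word with the span of its romanized form.

    Words whose romanization normalizes to an empty string (e.g. pure
    punctuation) were never aligned, so they consume no span.
    """
    if len(original_words) != len(romanized_words):
        raise ValueError("need exactly one romanization per original word")

    result = []
    span_iter = iter(spans)
    for original, romanized in zip(original_words, romanized_words):
        key = normalize(romanized)
        if not key:
            continue  # nothing was aligned for this word
        try:
            start, end = next(span_iter)
        except StopIteration:
            raise ValueError("fewer spans than alignable words") from None
        result.append(AlignedWord(original, key, start, end))

    if next(span_iter, None) is not None:
        raise ValueError("more spans than alignable words")
    return result


# Illustrative input: hand-written romanizations and made-up timestamps.
words = ["東京", "に", "行く", "。"]
roman = ["toukyou", "ni", "iku", "."]
spans = [(0.0, 0.5), (0.5, 0.6), (0.6, 1.0)]
for w in map_back(words, roman, spans):
    print(w.text, w.start, w.end)
```

This only works if the word boundaries are kept identical before and after romanization. For Japanese that means segmenting the text into words first, since there are no spaces to split on.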
Thoughts?