Facebook releases SeamlessM4T (Multimodal + Multilingual)

SeamlessM4T is a foundational speech/text translation and transcription model that overcomes the limitations of previous systems with state-of-the-art results.

Website: ai.meta.com/resources/models-and-libraries/seamless-communication
Code: facebookresearch/seamless_communication
Paper: ai.meta.com/research/publications/seamless-m4t
Blog Post: ai.meta.com/blog/seamless-m4t

I know this model is for translation, but I wanted to share it in case there is anything in their approach that could help improve whisperX. I don't know much about this, but from skimming the paper, it seems they already use some of the same techniques as whisperX, such as relying on VAD and w2v 2.0 ASR (section 3.4.2 of their paper).

Feel free to close this. I just wanted to bring it to your attention in case you haven't come across it yet.

m-bain / whisperX

Facebook releases SeamlessM4T (Multimodal + Multilingual) #435