Duplicate of #23665
@NielsRogge Hi, can you tell me how far along this is and roughly how long until it will be ready for us to use? Thank you!
PR merged.
Also see:
Can I use my own dataset instead of the `mozilla-foundation/common_voice_6_1` dataset shown in the tutorial https://huggingface.co/blog/mms_adapters ? If so, how? Thanks
Sure, you just need to load your own dataset, maybe this helps: https://huggingface.co/docs/datasets/v2.13.1/en/audio_load
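A minimal sketch of what that can look like, assuming your audio files live in a local folder (the `path/to/my_dataset` location is a placeholder):

```python
from datasets import load_dataset, Audio

# "audiofolder" scans a local directory of audio files and pairs each
# file with the rows of a metadata.csv in the same directory.
dataset = load_dataset("audiofolder", data_dir="path/to/my_dataset")

# Resample to the 16 kHz rate that wav2vec 2.0 / MMS checkpoints expect.
dataset = dataset.cast_column("audio", Audio(sampling_rate=16_000))

print(dataset["train"][0]["audio"])
```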
> Sure, you just need to load your own dataset, maybe this helps: https://huggingface.co/docs/datasets/v2.13.1/en/audio_load

Thank you for your kind reply. I have gone through the suggested tutorial, but it didn't help ... I have just uploaded a demo dataset here, you can check it: https://huggingface.co/datasets/rashmi035/MKB_Hindi_2023 . As you can see, the audio is visible in the dataset viewer, but the corresponding ngram is not. Can you help me with this?
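One possible cause (an assumption, not a confirmed diagnosis): for `audiofolder`-style datasets, the viewer only shows a text column when the loader finds a `metadata.csv` next to the audio files, with a `file_name` column plus one column per extra field. A minimal sketch, with placeholder file names and a hypothetical `transcription` column:

```csv
file_name,transcription
clip_0001.wav,text of the first clip
clip_0002.wav,text of the second clip
```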
Feature request
We request a simpler and more convenient inference process for MMS-based speech recognition, just like wav2vec 2.0 in Transformers.
Motivation
We aim to encapsulate the various subroutines called by Facebook's official MMS code into a direct speech recognition model that is as easy to use as other Transformers models such as wav2vec 2.0. We also know that the Hugging Face team has been among the industry leaders in this area of work.
Your contribution
We recognize that it may not be feasible for us to directly assist the Hugging Face team with this task, but we believe such an effort would be forward-looking given the popularity of MMS in current speech recognition research. The resulting model would be ideal for quickly transcribing our meeting notes.
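For reference, now that the MMS support mentioned above has been merged, the kind of direct inference this request describes roughly looks like the sketch below. The `facebook/mms-1b-all` checkpoint, the `"hin"` language code, and the local dataset path are illustrative assumptions:

```python
import torch
from datasets import Audio, load_dataset
from transformers import AutoProcessor, Wav2Vec2ForCTC

model_id = "facebook/mms-1b-all"
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Swap the tokenizer vocabulary and the adapter weights to the target language.
processor.tokenizer.set_target_lang("hin")
model.load_adapter("hin")

# Load one 16 kHz mono example from a local audio folder (placeholder path).
ds = load_dataset("audiofolder", data_dir="path/to/my_dataset", split="train")
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
audio = ds[0]["audio"]["array"]

inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

transcription = processor.decode(torch.argmax(logits, dim=-1)[0])
print(transcription)
```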