Duplicate of #23665
@NielsRogge Hi, can you tell me how far along this is and roughly how long until it will be ready for us to use? Thank you!
PR merged.
Also see:
Can I use my own dataset instead of the `mozilla-foundation/common_voice_6_1` dataset shown in the tutorial https://huggingface.co/blog/mms_adapters ? If so, how? Thanks
Sure, you just need to load your own dataset, maybe this helps: https://huggingface.co/docs/datasets/v2.13.1/en/audio_load
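A minimal sketch of what that can look like, assuming your audio files live in a local folder (the `path/to/my_dataset` location is a placeholder):

```python
from datasets import load_dataset, Audio

# "audiofolder" scans a local directory of audio files and pairs each
# file with the rows of a metadata.csv in the same directory.
dataset = load_dataset("audiofolder", data_dir="path/to/my_dataset")

# Resample to the 16 kHz rate that wav2vec 2.0 / MMS checkpoints expect.
dataset = dataset.cast_column("audio", Audio(sampling_rate=16_000))

print(dataset["train"][0]["audio"])
```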
> Sure, you just need to load your own dataset, maybe this helps: https://huggingface.co/docs/datasets/v2.13.1/en/audio_load

Thank you for your kind reply. I have gone through the suggested tutorial, but it didn't help ... I have just uploaded a demo dataset here, you can check it: https://huggingface.co/datasets/rashmi035/MKB_Hindi_2023 . As you can see, the audio is visible in the dataset viewer, but the corresponding ngram is not. Can you help me with this?
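One possible cause (an assumption, not a confirmed diagnosis): for `audiofolder`-style datasets, the viewer only shows a text column when the loader finds a `metadata.csv` next to the audio files, with a `file_name` column plus one column per extra field. A minimal sketch, with placeholder file names and a hypothetical `transcription` column:

```csv
file_name,transcription
clip_0001.wav,text of the first clip
clip_0002.wav,text of the second clip
```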
Feature request
We request a simpler and more convenient inference process for MMS-based speech recognition, just like wav2vec 2.0 in Transformers.
Motivation
We aim to encapsulate the various subroutines called by Facebook's official MMS code into a direct speech recognition model that is as easy to use as other Transformers models such as wav2vec 2.0. We also know that the Hugging Face team has been among the industry leaders in this area of work.
Your contribution
We recognize that it may not be feasible for us to directly assist the Hugging Face team with this task, but we believe such an effort would be forward-looking given the popularity of MMS in current speech recognition research. The resulting model would be ideal for quickly transcribing our meeting notes.
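For reference, now that the MMS support mentioned above has been merged, the kind of direct inference this request describes roughly looks like the sketch below. The `facebook/mms-1b-all` checkpoint, the `"hin"` language code, and the local dataset path are illustrative assumptions:

```python
import torch
from datasets import Audio, load_dataset
from transformers import AutoProcessor, Wav2Vec2ForCTC

model_id = "facebook/mms-1b-all"
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Swap the tokenizer vocabulary and the adapter weights to the target language.
processor.tokenizer.set_target_lang("hin")
model.load_adapter("hin")

# Load one 16 kHz mono example from a local audio folder (placeholder path).
ds = load_dataset("audiofolder", data_dir="path/to/my_dataset", split="train")
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
audio = ds[0]["audio"]["array"]

inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

transcription = processor.decode(torch.argmax(logits, dim=-1)[0])
print(transcription)
```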