nomic-ai / gpt4all

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
https://nomic.ai/gpt4all
MIT License
70.07k stars 7.66k forks

Implement Massively Multilingual Speech - Meta's open speech model with speech recognition and TTS in over 1000 languages #693

Open menelic opened 1 year ago

menelic commented 1 year ago

Feature request

Please consider implementing Meta's open-source Massively Multilingual Speech (MMS), which supports speech recognition and generation in over 1000 languages with a drastically reduced error rate compared to OpenAI's Whisper. As GPT4All is the most accessible local LLM/AI installer, adding speech transcription and text-to-speech would be a real boon for many.

https://github.com/facebookresearch/fairseq/tree/main/examples/mms

https://ai.facebook.com/blog/multilingual-model-speech-recognition/

Motivation

MMS is better than OpenAI's Whisper in important ways. Its error rate is less than half:

image

At the same time, MMS can understand about 4000 languages and output speech in over 1000 languages.

This enables open, private and local use in many areas, such as voice-based interaction with GPT4All, interview transcription, language learning, and increased accessibility.

Your contribution

testing

ojaksch commented 1 year ago

What about using MaryTTS's API? It should not be that difficult to implement.
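
For reference, a rough sketch of what calling a MaryTTS server from Python might look like. It assumes a MaryTTS server is already running locally on its default port (59125) and exposes the `/process` HTTP endpoint with the query parameters shown. The helper names `build_marytts_url` and `synthesize` are hypothetical and are not part of GPT4All or MaryTTS.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_marytts_url(text, locale="en_US", host="localhost", port=59125):
    """Build the request URL for a MaryTTS server (assumed to run locally)."""
    params = {
        "INPUT_TEXT": text,      # text to be spoken
        "INPUT_TYPE": "TEXT",    # plain text input
        "OUTPUT_TYPE": "AUDIO",  # ask for synthesized audio
        "AUDIO": "WAVE_FILE",    # WAV container
        "LOCALE": locale,        # an installed voice must exist for this locale
    }
    return f"http://{host}:{port}/process?{urlencode(params)}"


def synthesize(text, out_path="reply.wav", **kwargs):
    """Fetch synthesized speech from the server and write it to out_path."""
    with urlopen(build_marytts_url(text, **kwargs), timeout=30) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

With a server running, `synthesize("Hello from GPT4All")` would write the audio to `reply.wav`. This covers text-to-speech only; speech recognition would still need a separate component.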