Trained a Bangla VITS model with phonemes.
training notebook -> https://github.com/mobassir94/comprehensive-bangla-tts/blob/main/bn_vits_tts/Bangla_phoneme_ViTS_trainer.ipynb
test/inference notebook -> https://github.com/mobassir94/comprehensive-bangla-tts/blob/main/bn_vits_tts/Bangla_phoneme_ViTS_inference.ipynb
All weight files -> https://www.kaggle.com/datasets/mobassir/comprehensive-bangla-tts
We are delighted to let you know that the Bangla TTS work of this repo is now available in the well-known Coqui 🐸TTS (Text-to-Speech) library. Please check the releases page -> https://github.com/coqui-ai/TTS/releases and this Colab demo as well -> https://github.com/mobassir94/comprehensive-bangla-tts/blob/main/Bangla_text_to_speech_(TTS).ipynb
However, to use the multilingual TTS pipeline you still need the codebase of this repository. Thanks.
With the infinite kindness, mercy and blessings of Allah, we are launching an open source Islamic book reader system today for everyone who knows or speaks Bangla and Arabic. Even though it is spoken by more than 210 million people as a first or second language, Bangla is still a low-resource language. It is also a very difficult language because of its many sounds and spelling rules. Additionally, its script is vastly different from that of English and other Latin-script languages.
The main purpose of making Comprehensive Multilingual Speech synthesis was to reach people through Bengali Hadith and Glorious Quran in the Bengali language.
Collected/scraped various important Bangla-Arabic or English-Arabic hadith, tafsir and seerah books from the internet and translated the English-Arabic text to Bangla-Arabic using a powerful Bangla neural machine translator. You will find our scraper with comprehensive documentation here: https://github.com/mnansary/hadith-srcapper
To the best of our knowledge (based on extensive Google searching, research and human validation), the Bangla VITS TTS (text-to-speech) system that we trained and used for reading various Bangla tafsir and hadith books is the highest-performing state-of-the-art (SOTA) Bangla neural voice cloning system publicly released for free to date (Thursday, December 29, 2022). It beats earlier TTS systems such as gtts, silero-tts and indic-tts by a large margin in terms of quality.
Built the first-ever multilingual book-reading pipeline that can read Bangla+Arabic code-mixed books with ease.
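Reading code-mixed text usually means deciding which voice should read which part. The sketch below shows one possible way to split a string into Bangla and Arabic runs using Unicode block ranges. It is an illustration only and not necessarily how this repository does it; the function names `script_of` and `split_by_script` and the `"bn"`/`"ar"` labels are hypothetical, and only the basic Bengali (U+0980 to U+09FF) and Arabic (U+0600 to U+06FF) blocks are checked.

```python
def script_of(ch):
    """Return "bn" or "ar" for Bengali/Arabic characters, None for anything else."""
    cp = ord(ch)
    if 0x0980 <= cp <= 0x09FF:  # Bengali Unicode block
        return "bn"
    if 0x0600 <= cp <= 0x06FF:  # basic Arabic Unicode block
        return "ar"
    return None  # spaces, digits, punctuation, Latin letters, ...


def split_by_script(text):
    """Split text into (language, segment) runs.

    Neutral characters (spaces, punctuation) stay attached to the run
    they follow; leading neutral characters join the first run.
    """
    segments = []
    current_lang = None
    buffer = []
    for ch in text:
        lang = script_of(ch)
        # A change of script closes the current run.
        if lang is not None and current_lang is not None and lang != current_lang:
            segments.append((current_lang, "".join(buffer).strip()))
            buffer = []
        if lang is not None:
            current_lang = lang
        buffer.append(ch)
    # Flush the last run; text with no Bangla or Arabic at all yields [].
    if buffer and current_lang is not None:
        segments.append((current_lang, "".join(buffer).strip()))
    return segments
```

Each `(language, segment)` pair could then be passed to the TTS model for that language, and the resulting audio clips concatenated in order.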
We read all the books or sources chapter by chapter and made audiobooks.
Converted the audiobooks to videobooks using ffmpeg.
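As a rough illustration of the audiobook-to-videobook step, the sketch below builds an ffmpeg command that pairs one audio file with a single still image. This is an assumption about how such a conversion could look, not the exact command used in this project: the still-image approach, the codec choices, and the names `build_videobook_command` and `convert_to_videobook` are all hypothetical.

```python
import subprocess


def build_videobook_command(audio_path, image_path, output_path):
    """Build an ffmpeg argument list that shows one image for the audio's duration."""
    return [
        "ffmpeg", "-y",                            # overwrite the output if it exists
        "-loop", "1", "-i", str(image_path),       # repeat the still image as video input
        "-i", str(audio_path),                     # the audiobook chapter
        "-c:v", "libx264", "-tune", "stillimage",  # H.264 tuned for static content
        "-c:a", "aac", "-b:a", "192k",             # encode the audio as AAC
        "-pix_fmt", "yuv420p",                     # pixel format most players accept
        "-shortest",                               # stop when the audio ends
        str(output_path),
    ]


def convert_to_videobook(audio_path, image_path, output_path):
    """Run the conversion; requires ffmpeg to be installed and on PATH."""
    cmd = build_videobook_command(audio_path, image_path, output_path)
    subprocess.run(cmd, check=True)
```

Building the command as a list and passing it to `subprocess.run` without a shell avoids quoting problems with file names that contain spaces or Bangla characters.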
The entire process may not be 100% accurate. The English-to-Bengali translation may contain errors in many cases, and because the books are not read by humans (which would be very time-consuming and expensive), the system sometimes makes critical pronunciation mistakes as well. We hope that these problems will be solved by subsequent improvements to this work, InSha'Allah.
We used the fantastic coqui-ai 🐸TTS toolkit for Bangla text-to-speech training, with the IITM dataset converted to LJSpeech format. We trained 4 models: glowtts (male), glowtts (female), vits (male) and vits (female). glowtts didn't perform as well as expected because of the vocoder that coqui-ai attaches to it. To improve glowtts performance, one needs to train the spectrogram model and the vocoder separately and use a more powerful vocoder, such as HiFi-GAN v2, instead. The vits male and female variants are our best models, and we used them for making most of the audiobooks. From the Comprehensive_Bangla_Text_toSpeech(TTS) demo notebook you can see that the sound quality of the vits model is almost as good as the training dataset, which can be found here: https://www.kaggle.com/datasets/mobassir/comprehensive-bangla-tts. This means that end-to-end vits can clone a human voice with high quality and that its attached vocoder does a good enough job. One way to improve its performance could be to build a robust G2P model for Bangla and use phonemes during training.
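For readers unfamiliar with the LJSpeech layout mentioned above: it is a `wavs/` folder of audio clips plus a pipe-delimited `metadata.csv` with one line per clip (`id|transcript|normalized transcript`). The sketch below writes such a file from (utterance id, transcript) pairs. It is a minimal illustration, not the conversion script used for the IITM data; the input format and the function name `write_ljspeech_metadata` are assumptions, and the raw transcript is simply reused as the normalized one.

```python
from pathlib import Path


def write_ljspeech_metadata(utterances, out_dir):
    """Write an LJSpeech-style metadata.csv for (utterance_id, transcript) pairs.

    The matching audio is expected at out_dir/wavs/<utterance_id>.wav;
    copying or resampling the audio itself is not handled here.
    """
    out_dir = Path(out_dir)
    (out_dir / "wavs").mkdir(parents=True, exist_ok=True)
    metadata_path = out_dir / "metadata.csv"
    with metadata_path.open("w", encoding="utf-8") as f:
        for utt_id, text in utterances:
            # "|" is the column separator, so remove it from the transcript,
            # then collapse runs of whitespace (including newlines).
            clean = " ".join(text.replace("|", " ").split())
            f.write(f"{utt_id}|{clean}|{clean}\n")
    return metadata_path
```

Writing the file with an explicit `utf-8` encoding matters for Bangla transcripts, since the platform default encoding may not be UTF-8.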
Each directory in this repo contains a .txt file describing what the code in that particular folder does.
For a multilingual (Bangla+Arabic) inference demo, you can check the Colab tutorial Multilingual_(ben+ara)_tts_inference_colab_demo.ipynb. A video tutorial of the API version of it is available here
Check out some of the samples generated by our system :
References :