Open PodsAreAllYouNeed opened 4 days ago
Why does the F5-TTS sample work? Everything else seems pretty bad.
With F5-TTS we can change Ichigo's system prompt a bit and make it sound more natural.
Tested on TTS Arena and added to Drive:
Commercial: ElevenLabs, FishSpeech v1.4, PlayHT2.0, PlayHT3.0mini, XTTSv2
Non-Commercial: GPT-SoVITS (MIT License), MeloTTS (MIT License; multi-lingual, multi-accent), OpenVoicev2 (MIT License), Parler-TTS and Parler-TTS Large (Apache-2.0), StyleTTS2 (MIT License)
Unknown license: VoiceCraftV2
After testing these models, F5-TTS appears to be the only open-source TTS that both pronounces "Ichigo" correctly and reads out the acronym "AI" correctly. The commercial ones have no problem with this, of course. The next question is whether F5-TTS inference will be fast enough. Will update after some testing.
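One way to frame the "fast enough" question is the real-time factor (RTF): synthesis time divided by the duration of the audio produced, where a value below 1 means faster than real time. Below is a minimal timing sketch, not the actual benchmark used here. The `real_time_factor` helper and the `fake_tts` stand-in are hypothetical names, and the 24 kHz sample rate is an assumption; a real test would pass in a wrapper around whichever TTS model is being evaluated.

```python
import time
from typing import Callable, List, Sequence


def real_time_factor(
    synthesize: Callable[[str], Sequence[float]],
    text: str,
    sample_rate: int,
    runs: int = 3,
) -> float:
    """Return best synthesis time divided by audio duration (RTF < 1 is faster than real time)."""
    best = float("inf")
    audio_seconds = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        samples = synthesize(text)
        elapsed = time.perf_counter() - start
        audio_seconds = len(samples) / sample_rate
        best = min(best, elapsed)
    if audio_seconds == 0:
        raise ValueError("synthesize returned no audio")
    return best / audio_seconds


def fake_tts(text: str) -> List[float]:
    # Hypothetical stand-in for a real model call: one second of silence
    # at an assumed 24 kHz sample rate, regardless of the input text.
    return [0.0] * 24000


TEST_SENTENCE = (
    "I'm Ichigo, a local AI created by Homebrew Research. "
    "I'm here to help answer your questions and make your life easier."
)

rtf = real_time_factor(fake_tts, TEST_SENTENCE, sample_rate=24000)
print(f"RTF: {rtf:.6f}")
```

Taking the best of several runs reduces the influence of one-off warm-up costs such as model loading or cache misses. For an interactive assistant, time to first audio chunk may matter as much as overall RTF, and this sketch does not measure it.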
We need to replace the current FishSpeech with a better TTS model.
WIP Shortlist of Possible candidates:
Test sentence: I'm Ichigo, a local AI created by Homebrew Research. I'm here to help answer your questions and make your life easier.
Samples https://drive.google.com/drive/folders/1FbR5H7rqirHDgxbjxO8Zwhxsj5y4t_mq?usp=sharing