moonstar-x / discord-tts-bot

A Text-to-Speech bot for Discord.
https://docs.moonstar-x.dev/discord-tts-bot
MIT License

Support for Other TTS Engines? #93

Open: rjDipcord opened this issue 8 months ago

rjDipcord commented 8 months ago

⚡ Describe the New Feature

Support for alternative TTS engines would be amazing, especially if users could host their own TTS engine and point the bot to it. That would resolve API request cost issues and also provide a much wider range of voices.
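For illustration, one way to let users point the bot at their own engine is a small provider interface: each engine implements the same synthesize call, and the bot picks one by name from its configuration. This is a Python sketch under that assumption only; nothing here reflects discord-tts-bot's actual code, and TTSProvider, HttpProvider, provider_from_config, and the engine/options/endpoint config keys are all hypothetical. The HTTP provider further assumes a server that accepts GET requests with a text query parameter and returns audio bytes.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Optional
from urllib.parse import urlencode
from urllib.request import urlopen


class TTSProvider(ABC):
    """Hypothetical interface that every TTS engine would implement."""

    @abstractmethod
    def synthesize(self, text: str, voice: Optional[str] = None) -> bytes:
        """Return encoded audio bytes for `text`."""


# Maps an engine name from the config to a factory that builds the provider.
PROVIDERS: Dict[str, Callable[[dict], TTSProvider]] = {}


def register(name: str):
    """Decorator that records a provider factory under `name`."""
    def decorator(factory):
        PROVIDERS[name] = factory
        return factory
    return decorator


@register("http")
class HttpProvider(TTSProvider):
    """Sends text to a user-hosted TTS server (assumed GET ?text=... API)."""

    def __init__(self, options: dict):
        self.endpoint = options["endpoint"]
        self.timeout = float(options.get("timeout", 30))

    def request_url(self, text: str, voice: Optional[str] = None) -> str:
        params = {"text": text}
        if voice:
            params["voice"] = voice  # assumed parameter name
        return f"{self.endpoint}?{urlencode(params)}"

    def synthesize(self, text: str, voice: Optional[str] = None) -> bytes:
        with urlopen(self.request_url(text, voice), timeout=self.timeout) as resp:
            return resp.read()


def provider_from_config(config: dict) -> TTSProvider:
    """Build the provider named by config["engine"], passing config["options"]."""
    name = config.get("engine")
    if name not in PROVIDERS:
        raise ValueError(f"unknown TTS engine: {name!r}")
    return PROVIDERS[name](config.get("options", {}))
```

In this design, adding a hosted engine (ElevenLabs, Polly, and so on) would mean registering another factory, while the rest of the bot keeps calling synthesize() without knowing which engine is behind it.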

moonstar-x commented 8 months ago

Hey there, do you have an example of what TTS engines could be self-hosted?

At one point I considered writing an HTTP wrapper for macOS's say command, but it would only work on a Mac.
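A wrapper along those lines could be quite small. The sketch below assumes Python's standard http.server module and macOS's say command, using its -v (voice), -o (output file) and -f - (read the text from standard input) options; the /tts route, the text and voice query parameters, port 8080, and the function names are hypothetical choices. Feeding the text on standard input keeps user input out of the argument list. Like the original idea, it only works on a Mac.

```python
import os
import subprocess
import tempfile
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List, Optional
from urllib.parse import parse_qs, urlparse


def build_say_command(output_path: str, voice: Optional[str] = None) -> List[str]:
    """Argument list for macOS `say`; the text itself is fed on stdin via `-f -`."""
    cmd = ["say", "-o", output_path, "-f", "-"]
    if voice:
        cmd[1:1] = ["-v", voice]  # insert the voice option right after "say"
    return cmd


class SayHandler(BaseHTTPRequestHandler):
    """Handles GET /tts?text=...&voice=... and returns AIFF audio."""

    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/tts":
            self.send_error(404, "unknown route")
            return
        params = parse_qs(url.query)
        text = params.get("text", [""])[0]
        if not text:
            self.send_error(400, "missing text parameter")
            return
        voice = params.get("voice", [None])[0]
        with tempfile.TemporaryDirectory() as tmp:
            out = os.path.join(tmp, "speech.aiff")
            subprocess.run(build_say_command(out, voice),
                           input=text.encode("utf-8"), check=True)
            with open(out, "rb") as f:
                audio = f.read()
        self.send_response(200)
        self.send_header("Content-Type", "audio/aiff")
        self.send_header("Content-Length", str(len(audio)))
        self.end_headers()
        self.wfile.write(audio)


def serve(port: int = 8080) -> None:
    """Bind to localhost only, so just the machine running the bot can reach it."""
    HTTPServer(("127.0.0.1", port), SayHandler).serve_forever()
```

Calling serve() on a Mac and requesting http://127.0.0.1:8080/tts?text=hello should return an AIFF file, since that is what say writes for a .aiff output path.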

I believe there are some other TTS binaries available for Linux too, but I haven't researched them enough.

rjDipcord commented 8 months ago

> Hey there, do you have an example of what TTS engines could be self-hosted?

Sure. Two examples would be ElevenLabs TTS (https://elevenlabs.io/) and FakeYou (https://fakeyou.com/), or perhaps even Amazon Polly.

As for self-hosted options, two of the most widely used are Mimic 3 (https://mycroft.ai/mimic-3/) and Coqui AI TTS (https://github.com/coqui-ai/TTS).
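Both of these projects ship an HTTP server, so the bot would mainly need to make a request and play the returned audio. The sketch below assumes the defaults as I understand them from each project's documentation: Coqui's tts-server on port 5002 and Mimic 3's mimic3-server on port 59125, both answering GET /api/tts?text=... with WAV audio, with Coqui taking speaker_id and Mimic 3 taking voice to pick a voice. Verify these against the version you install; the SELF_HOSTED_ENGINES table, tts_url, and fetch_wav are hypothetical names.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed defaults; check them against the docs for the version you run.
SELF_HOSTED_ENGINES = {
    # Coqui TTS: started with `tts-server --model_name ...`, default port 5002.
    "coqui": {"base_url": "http://localhost:5002", "voice_param": "speaker_id"},
    # Mimic 3: started with `mimic3-server`, default port 59125.
    "mimic3": {"base_url": "http://localhost:59125", "voice_param": "voice"},
}


def tts_url(engine: str, text: str, voice: Optional[str] = None,
            base_url: Optional[str] = None) -> str:
    """Build the GET /api/tts URL for a self-hosted engine."""
    preset = SELF_HOSTED_ENGINES[engine]
    params = {"text": text}
    if voice:
        params[preset["voice_param"]] = voice
    root = (base_url or preset["base_url"]).rstrip("/")
    return f"{root}/api/tts?{urlencode(params)}"


def fetch_wav(url: str, timeout: float = 60.0) -> bytes:
    """Download the synthesized audio (assumed to be WAV for both engines)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

Passing base_url lets the engine run on a separate machine with a GPU, for example tts_url("coqui", "hello", base_url="http://gpu-box:5002"), where gpu-box is a placeholder hostname; the bot's own server would then only forward text and play audio.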

moonstar-x commented 8 months ago

Hmm, I checked the links you suggested.

Correct me if I'm wrong, but ElevenLabs and FakeYou don't seem to be self-hostable, right?

I remember checking out FakeYou over 2 years ago. I tried to implement an interface for it, but I hit a rate limit after fewer than 10 TTS attempts, so I gave up. On top of that, generating the voices took an enormous amount of time, at least in the free version.

As for the Mycroft one, it sounds like it needs some specialized hardware? I didn't look at it very closely, so I may be wrong about this one.

In the case of Coqui, I came across it some time ago as well. I tried running it once on my machine, and the CPU performance was very poor. That might be down to my CPU, since I host the bot on an i3-4170 and that server has no GPU either.

I have thought about integrating a generative TTS service, but I have yet to find one that is either free to use or doesn't require specific hardware.

I wouldn't mind working to support any of these, at least the Coqui one, but I don't think I have a way to test it with my current server.

rjDipcord commented 8 months ago

> I wouldn't mind working to support any of these, at least the Coqui one, but I don't think I have a way to test it with my current server.

If you want to make it a lower priority, that's cool. I only mentioned the first two as examples of services that could be supported, albeit subscription-based ones that aren't self-hostable.

Some of us, however, do have the hardware to self-host a solution that performs well. I don't have the programming experience to contribute code, but if you need someone to test a branch, DM me.