Tenidus opened 1 month ago
Thanks a lot! I'm actually working on this, but I can't say when it will be ready as it requires some changes to both the UI and text processing (I'd like to include automatic speaker attribution via LLM).
Thank you for the prompt response! I completely understand; I can't even imagine what it involves, but it's nice to hear you are working on it.
I really appreciate all of the effort you've put into this. I greatly appreciate it all being in one package/installer, without the need to run multiple installs and conda/Python environments. I know it would have been much simpler to put this in Gradio and run it all through the browser, but you've gone to the length of making it super Windows-friendly.
Will this support multiple GPUs? I have two, and TTS didn't like that, stating I needed to run a different command, so I had to modify easy_tts_trainer.py to include: os.environ["CUDA_VISIBLE_DEVICES"] = "0"
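For anyone who runs into the same thing, the change is just a one-liner near the top of the script (a minimal sketch; the exact spot in easy_tts_trainer.py may differ, the point is that the variable is set before anything initializes CUDA):

```python
import os

# Hide the second GPU so training only ever sees device 0; this mirrors the
# one-line change described above and must run before CUDA is initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch

print(torch.cuda.device_count())  # reports 1 even with two GPUs installed
```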
I wasn't sure whether that was something else you were looking into, or whether you know of a simple fix. It would be really cool if I could either train with both GPUs, or run training on one GPU while inferring on the other at the same time. Not a huge deal, just curious.
Thank you again!
I will have to look into it; I've never done anything with multiple GPUs. I'll have to check whether Coqui TTS supports it.
Sorry for going off-topic, but would it be possible to add keyboard shortcuts for PLAY, Regenerate, Mark and Play as playlist to the script? It would be very useful for me.
There is one so far: the m key marks a sentence, and the right mouse button marks the currently playing sentence and the previous one (intended for use when listening without looking at the interface). I will add the others, sure.
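Off the top of my head, the new bindings would look roughly like this (a sketch assuming plain tkinter and placeholder handler names, not the actual functions in the script):

```python
import tkinter as tk

def bind_playback_shortcuts(root, play, regenerate, mark, play_as_playlist):
    # Single-key shortcuts; the handler names are placeholders for illustration.
    root.bind("<KeyPress-p>", lambda e: play())
    root.bind("<KeyPress-r>", lambda e: regenerate())
    root.bind("<KeyPress-m>", lambda e: mark())              # already exists today
    root.bind("<KeyPress-a>", lambda e: play_as_playlist())

if __name__ == "__main__":
    root = tk.Tk()
    bind_playback_shortcuts(
        root,
        play=lambda: print("play"),
        regenerate=lambda: print("regenerate"),
        mark=lambda: print("mark"),
        play_as_playlist=lambda: print("play as playlist"),
    )
    root.mainloop()
```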
So I figured out a little workaround for this. I generate the entire source file/text in a 'narrator' voice, then mark the lines spoken by other characters, switch the XTTS Model, Speaker Voice and RVC Voice to a different speaker's voice, and Regenerate All. I do this for each speaker, and it seems to work very well. To get the sentences properly segmented, I separate them into their own paragraphs; even if it's a single word, the text goes on its own paragraph.
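The paragraph prep could probably be scripted, too; something along these lines would do it (a rough sketch, assuming dialogue is wrapped in straight double quotes; I currently do it by hand):

```python
import re

def put_dialogue_on_own_paragraphs(text):
    # Surround every double-quoted span with blank lines so each line of
    # dialogue becomes its own paragraph, then collapse extra blank lines.
    with_breaks = re.sub(r'("[^"]+")', r'\n\n\1\n\n', text)
    return re.sub(r'\n{3,}', '\n\n', with_breaks).strip()

print(put_dialogue_on_own_paragraphs('She paused. "Run!" he shouted, and they ran.'))
```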
Good idea, but a lot of work, unfortunately... I'm still working on a robust solution, and it will probably take another two weeks or so. Thanks for the update.
First, I have to say that this is absolutely fantastic. I've tried many, many different TTS models with RVC, and this just works. Not only that, it sounds great. I used to train in XTTS-FineTune, then train in RVC, then run the TTS, import into RVC and be done; this saves me a step and simplifies things dramatically. And the ability to import epub, pdf, etc. is fantastic! So I greatly appreciate this project.
I do have a question... AllTalk supports separate 'Character' and 'Narrator' voices for TTS. If a character's speech/text is enclosed in "" it switches voices, and if it is enclosed in ** it uses a different voice. Is that something that might be integrated into this project? In addition, could it support more than two voices? I have written several stories with 3-6 characters, and I currently run them through this individually, but it would obviously save time to be able to use different character voices notated by various symbols, like AllTalk does (but it only supports two).
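Just to illustrate what I mean by multiple symbols (a hypothetical sketch of the markup idea, not how AllTalk actually implements it): each delimiter maps to its own voice, and anything outside a delimiter falls back to the narrator.

```python
import re

# Hypothetical symbol-to-voice mapping; adding symbols adds voices.
VOICE_MAP = {
    '"': "character_1",   # double quotes -> first character voice
    "*": "narrator_alt",  # asterisks -> a different voice
    "~": "character_2",   # any further symbol adds another voice
}

def split_by_voice(text, default_voice="narrator"):
    # Return (voice, segment) pairs for a marked-up string.
    pattern = re.compile(r'"[^"]*"|\*[^*]*\*|~[^~]*~')
    pairs, pos = [], 0
    for match in pattern.finditer(text):
        if match.start() > pos:  # plain narration before the marked span
            pairs.append((default_voice, text[pos:match.start()].strip()))
        chunk = match.group()
        pairs.append((VOICE_MAP[chunk[0]], chunk[1:-1].strip()))
        pos = match.end()
    if pos < len(text):
        pairs.append((default_voice, text[pos:].strip()))
    return [(voice, seg) for voice, seg in pairs if seg]

print(split_by_voice('He smiled. "Hello there," she said. *A cold wind blew.* ~Who goes?~'))
```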
Thank you again. I really just wanted to post how fantastic this project is ;)