lukaszliniewicz / Pandrator

Turn PDFs and EPUBs into audiobooks, turn subtitles or videos into dubbed videos (including translation), and more. For free. Pandrator uses local models, notably XTTS, including voice cloning (instant, RVC-enhanced, and XTTS fine-tuning), as well as LLM processing. It aspires to be a user-friendly app with a GUI, an installer, and all-in-one packages.
GNU Affero General Public License v3.0
321 stars, 24 forks

Multiple Speaker Voices #56

Open: Tenidus opened this issue 5 days ago

Tenidus commented 5 days ago

First, I have to say that this is absolutely fantastic. I've tried many, many different TTS models with RVC, and this just works. Not only that, it sounds great. I used to train in XTTS-FineTune, then train in RVC, then run the TTS, import the output into RVC, and be done; this saves me a step and simplifies things dramatically. And the ability to import EPUB, PDF, etc. is fantastic! So I greatly appreciate this project.

I do have a question... AllTalk supports 'Character' and 'Narrator' voices in TTS: if a character's speech/text is enclosed in double quotes ("") it switches to one voice, and if it is enclosed in asterisks (**) it uses a different voice. Is that something that might be integrated into this project? In addition, could it support more than two voices? I have written several stories with 3-6 characters, and I currently run each one through this separately, but it would obviously save time to be able to use different character voices marked by various symbols, as AllTalk does (although AllTalk only supports two).
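For illustration, the delimiter-based switching described above (text inside a marker goes to one voice, unmarked text to a default narrator) generalizes to any number of voices. The sketch below is hypothetical and is not Pandrator's or AllTalk's code: the function name `split_by_speaker`, the marker characters, and the voice names are all assumptions. It scans the text once and emits (voice, segment) pairs, dropping the delimiters themselves.

```python
from typing import Dict, List, Optional, Tuple


def split_by_speaker(
    text: str, markers: Dict[str, str], default_voice: str
) -> List[Tuple[str, str]]:
    """Split text into (voice, segment) pairs (hypothetical helper).

    `markers` maps a delimiter character to a voice name. Text between two
    copies of the same delimiter is assigned that voice; everything else is
    assigned `default_voice`. Delimiters are dropped from the output.
    """
    segments: List[Tuple[str, str]] = []
    buf: List[str] = []
    voice = default_voice
    closing: Optional[str] = None  # delimiter that ends the open span, if any

    def flush() -> None:
        chunk = "".join(buf).strip()
        if chunk:  # skip whitespace-only gaps between spans
            segments.append((voice, chunk))
        buf.clear()

    for ch in text:
        if closing is None and ch in markers:
            flush()  # end the unmarked run collected so far
            voice, closing = markers[ch], ch
        elif ch == closing:
            flush()  # end the marked span
            voice, closing = default_voice, None
        else:
            buf.append(ch)  # other markers inside a span are kept as text
    flush()  # an unterminated span keeps its marked voice
    return segments


# Three character voices plus a narrator (all names are illustrative).
markers = {'"': "alice", "*": "bob", "~": "carol"}
text = 'The door opened. "Who is there?" *It is me.* ~Hello!~ She smiled.'
for voice, segment in split_by_speaker(text, markers, default_voice="narrator"):
    print(f"{voice}: {segment}")
```

Each pair could then be sent to the TTS engine with the matching speaker reference. Because every marker here opens and closes with the same character, curly quotes or nested markers would need a variant that maps opening delimiters to separate closing ones.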

Thank you again. I really just wanted to post how fantastic this project is ;)

lukaszliniewicz commented 5 days ago

Thanks a lot! I'm actually working on this, but I can't say when it will be ready, as it requires some changes to both the UI and the text processing (I'd like to include automatic speaker attribution via an LLM).

Tenidus commented 5 days ago

Thank you for the prompt response! I completely understand; I can't imagine what it involves, but it's nice to hear you are working on it.

I really appreciate all of the effort you've put into this. I especially appreciate it being a single installable package, without the need to run multiple installs and conda/Python environments. I know it would have been much simpler to put this in Gradio and run it all through the browser, but you've gone to the length of making it super Windows-friendly.

Will this support multiple GPUs? I have two, and TTS didn't like it, saying I needed to run a different command, so I had to modify easy_tts_trainer.py to include: os.environ["CUDA_VISIBLE_DEVICES"] = "0"
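The workaround above works because CUDA_VISIBLE_DEVICES is read when CUDA is first initialized in a process, so it must be set before that happens; setting it before importing torch (or anything else that touches CUDA) is the safe choice. Inside the process, the selected card is then renumbered as device 0. A minimal sketch that makes the index configurable instead of hard-coding "0", assuming a hypothetical `--gpu` flag that easy_tts_trainer.py does not necessarily accept:

```python
import argparse
import os
from typing import List, Optional


def pin_gpu_from_args(argv: Optional[List[str]] = None) -> str:
    """Expose a single physical GPU to this process, chosen by a --gpu flag.

    Must run before torch (or any other CUDA-using library) initializes CUDA;
    after that, changing CUDA_VISIBLE_DEVICES has no effect in this process.
    """
    parser = argparse.ArgumentParser(add_help=False)
    # Hypothetical flag; not part of the real trainer script's interface.
    parser.add_argument("--gpu", default="0")
    # parse_known_args leaves the script's other arguments untouched.
    args, _rest = parser.parse_known_args(argv)
    os.environ["CUDA_VISIBLE_DEVICES"] = args.gpu
    return args.gpu


if __name__ == "__main__":
    pin_gpu_from_args()
    # Import CUDA-using libraries only after pinning, e.g.:
    # import torch  # on a multi-GPU machine, torch.cuda.device_count() would report 1
```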

I wasn't sure whether that was something else you were looking into, or whether you knew of a simple fix. It would be really cool if I could either train with both GPUs or run training on one GPU while running inference on the other at the same time. Not a huge deal, just curious.
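The second option (training on one GPU while inferring on the other) should not require multi-GPU support inside Coqui TTS: two separate processes, each started with a different CUDA_VISIBLE_DEVICES value, each see only one card. A minimal standard-library sketch of that approach; the script names in the comments are placeholders, not Pandrator's actual entry points. Training on both GPUs at once is a different problem that would need distributed-training support in the trainer itself, which this does not provide.

```python
import os
import subprocess
from typing import Dict, List


def env_for_gpu(gpu_index: int) -> Dict[str, str]:
    """Copy the current environment, exposing only one GPU to a child process."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def launch_on_gpu(cmd: List[str], gpu_index: int) -> subprocess.Popen:
    """Start cmd as a separate process that can only see GPU gpu_index."""
    return subprocess.Popen(cmd, env=env_for_gpu(gpu_index))


# Example usage (placeholder script names; substitute the real ones):
#   import sys
#   trainer = launch_on_gpu([sys.executable, "train_xtts.py"], gpu_index=0)
#   tts = launch_on_gpu([sys.executable, "run_inference.py"], gpu_index=1)
#   trainer.wait()
#   tts.wait()
```

Each child process sees its assigned card as device 0, so neither script needs to know which physical GPU it was given.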

Thank you again!

lukaszliniewicz commented 5 days ago

I will have to look into it; I've never done anything with multiple GPUs. I have to check whether Coqui TTS supports it.