lukaszliniewicz / Pandrator

Turn PDFs and EPUBs into audiobooks, and subtitles or videos into dubbed videos (including translation), and more. For free. Pandrator uses local models, notably XTTS, including voice cloning (instant, RVC-enhanced, XTTS fine-tuning) and LLM processing. It aspires to be a user-friendly app with a GUI, an installer, and all-in-one packages.
GNU Affero General Public License v3.0

Multiple Speaker Voices #56


Tenidus commented 1 month ago

First, I have to say that this is absolutely fantastic. I've tried many, many different TTS models with RVC, and this just works. Not only that, it sounds great. I used to train in XTTS-FineTune, then train in RVC, then run the TTS and import it into RVC to finish; this saves me a step and simplifies things dramatically. And the ability to import EPUB, PDF, etc. is fantastic! So I greatly appreciate this project.

I do have a question: AllTalk supports separate 'Character' and 'Narrator' TTS voices. If a character's speech/text is enclosed in "" it switches voices, and if it is enclosed in ** it uses a different voice. Is that something that might be integrated into this project? And could it support more than two voices? I have written several stories with 3-6 characters, and I currently run each voice through individually, but it would obviously save time to be able to assign character voices marked by various symbols, as AllTalk does (though it only supports two).
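Roughly the kind of behaviour I mean, as a minimal sketch (this is not AllTalk's or Pandrator's actual code, and the symbol-to-voice mapping is made up): split the text into segments based on the symbol that wraps each span, and send each segment to its assigned voice.

```python
import re

# Made-up mapping: each wrapping symbol selects a voice; text outside
# any delimiter falls through to the narrator voice.
VOICES = {'"': "character_voice", "*": "second_voice"}

def segment_speakers(text: str) -> list[tuple[str, str]]:
    """Return (voice, text) pairs in reading order."""
    pattern = re.compile(r'"([^"]+)"|\*([^*]+)\*')
    segments, pos = [], 0
    for m in pattern.finditer(text):
        if m.start() > pos:
            segments.append(("narrator", text[pos:m.start()].strip()))
        # The first character of the match is the symbol that wrapped it.
        segments.append((VOICES[text[m.start()]], m.group(1) or m.group(2)))
        pos = m.end()
    if pos < len(text):
        segments.append(("narrator", text[pos:].strip()))
    return [(voice, t) for voice, t in segments if t]

print(segment_speakers('He sighed. "Not again," she said. *Far away...*'))
# [('narrator', 'He sighed.'), ('character_voice', 'Not again,'),
#  ('narrator', 'she said.'), ('second_voice', 'Far away...')]
```

Supporting 3-6 voices would then just mean adding more symbol-to-voice entries and more alternatives to the pattern.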

Thank you again. I really just wanted to post how fantastic this project is ;)

lukaszliniewicz commented 1 month ago

Thanks a lot! I'm actually working on this, but I can't say when it will be ready, as it requires changes to both the UI and the text processing (I'd like to include automatic speaker attribution via an LLM).
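The rough shape of the attribution step, as a hypothetical sketch (not the final implementation; the local endpoint and model name are assumptions, shown here via the OpenAI-compatible API that local servers like Ollama expose): ask the LLM to label each line with a speaker, then map speakers to voices.

```python
import json
from openai import OpenAI

# Assumption: a local OpenAI-compatible server on Ollama's default port.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

PROMPT = (
    "For each numbered line of the passage below, output a JSON list of "
    '{"line": <number>, "speaker": "<name or narrator>"} objects. '
    "Output only the JSON.\n\n"
)

def attribute_speakers(numbered_passage: str) -> list[dict]:
    reply = client.chat.completions.create(
        model="llama3",  # assumed model name
        messages=[{"role": "user", "content": PROMPT + numbered_passage}],
    )
    # Assumes the model complied and returned bare JSON.
    return json.loads(reply.choices[0].message.content)
```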

Tenidus commented 1 month ago

Thank you for the prompt response! I completely understand and can't imagine what it involves, but it's nice to hear you are working on it.

I really appreciate all of the effort you've put into this, and that it's all in one package/installer, without the need to run multiple installs and conda/Python environments. I know it would have been much simpler to put this in Gradio and run it all through the browser, but you've gone to the lengths of making it super Windows-friendly.

Will this support multiple GPUs? I have two, and TTS didn't like that, stating I needed to run a different command, so I did have to modify easy_tts_trainer.py to include `os.environ["CUDA_VISIBLE_DEVICES"] = "0"`.
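For reference, the change amounts to this (a minimal sketch; the rest of easy_tts_trainer.py is omitted). The variable has to be set before torch, or anything else that initialises CUDA, is imported, or it may be ignored:

```python
import os

# Pin the process to the first GPU; "1" would select the second.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch  # must come after the variable is set

print(torch.cuda.device_count())  # now reports one visible device
```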

I wasn't sure if that was another thing you were looking into, or whether you know a simple fix. It would be really cool if I could either train with both GPUs, or run training on one GPU while inferring on the other at the same time. Not a huge deal, just curious.

Thank you again!

lukaszliniewicz commented 1 month ago

I will have to look into it; I've never done anything with multiple GPUs. I have to check whether Coqui TTS supports it.

JohnF51 commented 3 weeks ago

Sorry for the off-topic question, but would it be possible to add keyboard shortcuts for Play, Regenerate, Mark, and Play as Playlist to the script? It would be very useful for me.

lukaszliniewicz commented 3 weeks ago

There is one so far: the M key marks a sentence, and the right mouse button marks the currently playing sentence and the previous one (intended for use when listening without looking at the interface). I will add the others, sure.
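For the record, adding the rest is straightforward in a tkinter-style GUI; a minimal sketch (the keys and handler names here are hypothetical, not the app's actual functions):

```python
import tkinter as tk

root = tk.Tk()

# Hypothetical handlers standing in for the real ones.
def play(event=None): print("play")
def regenerate(event=None): print("regenerate")
def play_as_playlist(event=None): print("play as playlist")

# bind_all fires the shortcut regardless of which widget has focus.
root.bind_all("<p>", play)
root.bind_all("<r>", regenerate)
root.bind_all("<l>", play_as_playlist)

root.mainloop()
```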

Tenidus commented 3 weeks ago

So I figured out a little workaround for this. I generate the entire source file/text in a 'narrator' voice, then mark the lines spoken by an alternate character, switch the XTTS model, speaker voice, and RVC voice to that speaker's voice, and hit Regenerate All. I do this for each speaker, and it seems to work very well. To get the sentences properly segmented, I separate them into paragraphs: even if it's a single word, the text goes on its own paragraph.

lukaszliniewicz commented 3 weeks ago

Good idea, but a lot of work, unfortunately... I'm still working on a robust solution, and it will probably take another two weeks or so. Thanks for the update.