lukaszliniewicz / Pandrator

Turn PDFs and EPUBs into audiobooks, subtitles or videos into dubbed videos (including translation), and more. For free. Pandrator uses local models, notably XTTS, including voice-cloning (instant, RVC-enhanced, XTTS fine-tuning) and LLM processing. It aspires to be a user-friendly app with a GUI, an installer and all-in-one packages.
GNU Affero General Public License v3.0
348 stars 27 forks source link

Adding a .srt and pressing 'Generate dubbing audio' Hangs Pandrator #53

Closed luckyluca closed 1 month ago

luckyluca commented 1 month ago

When loading a .srt and pressing generate dubbing audio, Pandrator hangs without giving any error or message. I'm attaching the .srt as .txt (please rename it back to .srt) And the two installation logs

Any idea of what I might be doing wrong? Thanks L

P.s. Do you need to use the installer to launch the program? Just wondering as it's the only .exe and Pandrator.py doesn't launch by itself. MOTIVAZIONALE FITNESS.txt

pandrator_installation_log_20241016_200104.log pandrator_installation_log_20241016_215546.log

lukaszliniewicz commented 1 month ago

Thanks for the file. It works for me - audio is being generated. Could you please check the logs in Pandrator/Pandrator/logs?

You can launch Pandrator as a Python file from the terminal: python pandrator.py. The installer/launcher makes it easier since you can also launch the TTS server. Have you checked the "XTTS" box when launching Pandrator?

image

luckyluca commented 1 month ago

Thanks Lukas, I run the exact selection but it freezes all the time.

I tried several things: -launching Pandrator from the .exe -launching Pandrator from python -Downloading the full 30Gb package and running pandrator.py from there (Pandrator_XTTS_RVC_Whisper_Training_v025) -Downloading the full 30Gb package and running the exe installer and launching from there (Pandrator_XTTS_RVC_Whisper_Training_v025) -All of the above with and without xtts server running.

Every single time, no matter what, pressing the generate button hangs the interface. My guess is an interface problem with Pandrator UI.

P.s. What I'm after is generating a wav audio file from a supplied .srt file. What do I need exactly? Do I need to supply API keys? Also, updating Pandrator from the .exe interface doesn't work. Is this the right way: running python pandrator.py and also running python -m xtts_api_server (from the other folder)?

pandrator_20241017_154832.log

lukaszliniewicz commented 1 month ago

I don't know why, but it's the name of the file that is causing problems. When I change it, e.g. to "test.srt", it works fine. It's not clear to me why this happens. You only need API keyes if you want to translate subtitles. Otherwise it's not needed.

luckyluca commented 1 month ago

It worked, thank you! It's really cool!

Is there a way to burn the subtitles into the video? Also, is there a way for the audio to be as true as possible to the original .srt ?

Oh selecting voicecraft instead of XTTS freezes the same way as it used to do before.

lukaszliniewicz commented 1 month ago

You would have to install VoiceCraft first. And, to be honest, I don't recommend it. What do you mean by burning subtitles? Adding the generated audio to the video?

luckyluca commented 1 month ago

Thanks, I'll skip VoiceCraft and use XTTS then.

No, I meant to burn the .srt text over the video, but I can easily do that afterwards in Davinci Resolve :)

I ended up downloading the largest v25 package and using that as is. It works great this way, except that subdud doesn't work. Everytime that is invoked from Pandrator, Pandrator hangs and after a while I get an error like this one below: Screenshot 2024-10-18 191314

Also, is there a way to increase the resolution/quality of the voice? I'm currently using the default XTTS male voice, which is nice and natural, but it sounds low-res.

P.s. I'm going to file another issue about out of sync audio.

lukaszliniewicz commented 1 month ago

Have you installed WhisperX or downloaded the package with it? What do the logs say when it errors out like this?

As for the quality of the voice, there are two things you can do - use RVC or fine-tune XTTS. For RVC, you have to find a model you like. If you want to fine-tune XTTS, collect at least 5m of good, clear speech (from one speaker, no music, as little noise as possible) and use the tool in Pandrator. Optimally, you'd use at least 30m. If you have an audiobook you like, or can find something on Librivox, for example, you can use that.

luckyluca commented 1 month ago

yes, I downloaded the biggest zip and extracted it. And using the installer_launcher exe to launch Pandrator with xtts selected. It should be coming with whisperx and deepl already right?

I also uninstalled my local Python and any env reference to it.

With this extracted clean setup, transcribe and translate still hang andrator. Only 'generate dub' works. Here's the log.

Are you sure I don't need any other dependencies or local python to run subdud / whisperx / deepl / pandrator?

pandrator_20241019_130343.log

lukaszliniewicz commented 1 month ago

The problem was that the 10 series cards do not support float16 computation. I added a check and now int8 should be used for these cards. Please update Pandrator from the launcher and try again.

luckyluca commented 1 month ago

I added api keys for deepL and Anthropic. Selecting deepL now translates correctly (yay!). Maybe it was failing before because I have no credits on Anthropic.

So, selecting any Anthropic models hangs Pandrator. Selecting deepL works.

Is there a way to run the translate/transcribe locally? I can see a local option as well, but it hangs Pandrator.

lukaszliniewicz commented 1 month ago

Yes, you need to have Anthropic credits.

For local translation, you need to set up the Text Generation WebUI (see readme) and find a model that can perform translations for your language pair. Look at Huggingface for a 7-14b parameter instruct type gguf model that has been trained/fine-tuned on English and Italian, for example. I'm looking into adding support for madlad-400 and Alma, but it may take some time. If you need the translation to be top notch, use Anthropic Sonnet. If decent is enough, DeepL performs well.