rsxdalv / tts-generation-webui

TTS Generation Web UI (Bark, MusicGen + AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, MAGNet, StyleTTS2, MMS)
https://rsxdalv.github.io/tts-generation-webui/
MIT License
1.46k stars 160 forks

Tortoise read.py - test large text files #314

Closed chlowden closed 1 week ago

chlowden commented 1 month ago

Hello, I am trying to test how Tortoise handles reasonably large text files. I noticed that Tortoise has a command-line script, read.py, that breaks large texts into small chunks. I can see the file in the Tortoise install but not in your web UI version. Is there a way to have this function working in the web UI, please? Many thanks for such a great interface.

rsxdalv commented 1 month ago

Hi, thanks for checking in. I'm guessing you want this in the webui: https://github.com/neonbjb/tortoise-tts/blob/572bdf3d2475f1a330bb074c6addf433f887b480/tortoise/utils/text.py#L4

Are you using the React UI? It's far easier for me to add this function and make it work seamlessly there than in the Gradio UI.
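
For readers who want to pre-chunk a long text themselves, here is a minimal sketch of greedy sentence-based chunking. It is not the code from the linked Tortoise file: the function name `split_into_chunks` and the `max_length` default are hypothetical, and the sentence detection (split after `.`, `!` or `?`) is a deliberately simple assumption.

```python
import re


def split_into_chunks(text: str, max_length: int = 300) -> list[str]:
    """Greedily pack sentences into chunks of at most max_length characters.

    Hypothetical helper for illustration; not Tortoise's own splitter.
    """
    # Collapse every whitespace run (including line breaks) into one space.
    text = re.sub(r"\s+", " ", text).strip()
    if not text:
        return []
    # Split on the space that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?]) ", text)
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_length:
            current = candidate
            continue
        # The sentence does not fit: flush what we have so far.
        if current:
            chunks.append(current)
        current = ""
        # Re-add the sentence word by word, so an over-long sentence
        # is wrapped on word boundaries instead of being cut mid-word.
        for word in sentence.split(" "):
            candidate = f"{current} {word}".strip()
            if len(candidate) <= max_length or not current:
                current = candidate
            else:
                chunks.append(current)
                current = word
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_into_chunks("One. Two! Three?", max_length=10):
        print(chunk)
```

Joining the chunks with single newlines would give text that can be pasted into a UI that generates one clip per line.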

chlowden commented 1 month ago

Hello, I worked it out with the new React UI. I activated the "Split prompt by lines" button and removed any strange pagination from the text. I found that more than one consecutive line break stopped the process, while a single line break helped slightly with intonation. It took 8 hours on an RTX 3090 GPU at 100% load, running very hot and noisy, to produce 8 minutes of narration. The result is 100 times better than anything else I have found and compares favorably to a similar production by Eleven Labs. The voice does go a little strange at some points, but that can be corrected: the system produces a separate file for each line split, so regenerating a line is easier than correcting files from Eleven Labs. Thank you so much for putting this UI together. It's fantastic.
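
The cleanup described above (no consecutive line breaks, one prompt per line) can be sketched in a few lines of Python. This is only an illustration of preparing the text before pasting it in: `prepare_lines` is a hypothetical name, and how the web UI itself handles blank lines internally is not something this thread confirms.

```python
def prepare_lines(text: str) -> list[str]:
    """Return one non-empty prompt per line (hypothetical helper)."""
    # Normalize Windows and old-Mac line endings to "\n".
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    # Drop lines that are empty or whitespace-only, which is what
    # runs of consecutive line breaks turn into after splitting.
    return [line.strip() for line in normalized.split("\n") if line.strip()]


if __name__ == "__main__":
    raw = "First line.\n\n\nSecond line.\r\n  \r\nThird."
    # Re-join with single line breaks before pasting into the UI.
    print("\n".join(prepare_lines(raw)))
```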

rsxdalv commented 1 month ago

Hi, I'm happy that it worked out! As for the 8 hours: Tortoise is very sensitive to the parameters you choose, and lower-quality settings are dramatically faster. I need to verify that the presets work properly because I noticed a bug before.
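
To put the reported numbers in perspective, 8 hours of compute for 8 minutes of audio is roughly 60 seconds of GPU time per second of narration. A small sketch of that arithmetic (the helper name `real_time_factor` is made up for illustration):

```python
def real_time_factor(compute_seconds: float, audio_seconds: float) -> float:
    """Seconds of compute spent per second of generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return compute_seconds / audio_seconds


if __name__ == "__main__":
    # 8 hours of generation for 8 minutes of narration, as reported above.
    print(real_time_factor(8 * 3600, 8 * 60))  # prints 60.0
```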

chlowden commented 1 month ago

Below is the setup I used:

{"voice": "train_grace", "preset": "standard", "seed": "1715536858", "cvvp_amount": 0.0, "split_prompt": true, "num_autoregressive_samples": 256, "diffusion_iterations": 200, "temperature": 0.8, "length_penalty": 1.0, "repetition_penalty": 2.0, "top_p": 0.8, "max_mel_tokens": 500, "cond_free": true, "cond_free_k": 2, "diffusion_temperature": 1.0, "model": "Default", "name": ""}

rsxdalv commented 1 month ago

Fixed the presets: changing the preset now actually updates the values (https://github.com/rsxdalv/tts-generation-webui/pull/315). This won't speed up your previous attempt, but selecting a lower preset will now work properly, increasing speed at the cost of quality.
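
As a rough illustration of what "changing the preset updates the values" means, the sketch below overlays a preset's values onto a settings dictionary like the one posted earlier. It is not the code from the pull request. The `PRESETS` table is an assumption based on my recollection of upstream Tortoise's preset definitions and should be checked against the installed version; only the "standard" values (256 samples, 200 iterations) match what appears in this thread. `apply_preset` is a hypothetical name.

```python
# ASSUMPTION: values recalled from upstream Tortoise; verify before relying on them.
PRESETS = {
    "ultra_fast": {"num_autoregressive_samples": 16, "diffusion_iterations": 30, "cond_free": False},
    "fast": {"num_autoregressive_samples": 96, "diffusion_iterations": 80},
    "standard": {"num_autoregressive_samples": 256, "diffusion_iterations": 200},
    "high_quality": {"num_autoregressive_samples": 256, "diffusion_iterations": 400},
}


def apply_preset(settings: dict, preset: str) -> dict:
    """Return a copy of settings with the preset's values overlaid."""
    if preset not in PRESETS:
        raise KeyError(f"unknown preset: {preset}")
    # Later keys win, so the preset overrides the matching settings
    # while unrelated keys such as temperature are kept as they were.
    return {**settings, **PRESETS[preset], "preset": preset}


if __name__ == "__main__":
    current = {
        "preset": "standard",
        "num_autoregressive_samples": 256,
        "diffusion_iterations": 200,
        "temperature": 0.8,
        "cond_free": True,
    }
    print(apply_preset(current, "fast"))
```

Returning a new dictionary rather than mutating the input keeps the previous settings available, which is convenient when comparing runs.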

chlowden commented 1 week ago

Thank you very much

rsxdalv commented 1 week ago

Added a button that automatically splits text into chunks separated by newlines. It might be a bit hit or miss, but hopefully it's useful: https://github.com/rsxdalv/tts-generation-webui/pull/322