rsxdalv / tts-generation-webui

TTS Generation Web UI (Bark, MusicGen + AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, MAGNet, StyleTTS2, MMS)
https://rsxdalv.github.io/tts-generation-webui/
MIT License
1.61k stars 173 forks

Tortoise Voice Cloning Error- No Checkpoints Downloading #182

Open Benzene82 opened 11 months ago

Benzene82 commented 11 months ago

I was hoping to get Tortoise TTS voice cloning working. Text to speech works fine in Bark and Tortoise, but I get the error below when I try to 'Apply Model Settings'. Sending generated audio to RVC also opens the /checkpoints folder. I was sifting through GitHub looking for a source or a checkpoint. I see TTS doesn't really work like Stable Diffusion. I generated a few sentences, as you suggested in other issue posts, and those models downloaded, but the TortoiseTTS model folder only has the .gitkeep file. I watched all the videos in the (code), but those relate to RVC. Thank you for your time. Here is the error log:

Running on local URL: http://0.0.0.0:7860/
Traceback (most recent call last):
  File "V:\ZTorTTSWebUI\installer_files\env\lib\site-packages\gradio\routes.py", line 437, in run_predict
    output = await app.get_blocks().process_api(
  File "V:\ZTorTTSWebUI\installer_files\env\lib\site-packages\gradio\blocks.py", line 1352, in process_api
    result = await self.call_function(
  File "V:\ZTorTTSWebUI\installer_files\env\lib\site-packages\gradio\blocks.py", line 1077, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "V:\ZTorTTSWebUI\installer_files\env\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "V:\ZTorTTSWebUI\installer_files\env\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "V:\ZTorTTSWebUI\installer_files\env\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "V:\ZTorTTSWebUI\tts-generation-webui\src\tortoise\gen_tortoise.py", line 49, in switch_model
    get_tts(
  File "V:\ZTorTTSWebUI\tts-generation-webui\src\tortoise\gen_tortoise.py", line 84, in get_tts
    MODEL = TextToSpeech(
  File "V:\ZTorTTSWebUI\installer_files\env\lib\site-packages\tortoise\api.py", line 233, in __init__
    self.tokenizer = VoiceBpeTokenizer(
  File "V:\ZTorTTSWebUI\installer_files\env\lib\site-packages\tortoise\utils\tokenizer.py", line 174, in __init__
    self.tokenizer = Tokenizer.from_file(
Exception: stream did not contain valid UTF-8

rsxdalv commented 11 months ago

Unlike Stable Diffusion, where the "main" model is a single, separate file, Tortoise and others use multiple models working together. That's why it's an entire folder into which you copy all of the models.

For me the place where they are downloaded is here: C:\Users\<user>\.cache\tortoise\models

and the files are:

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a----         30-Apr-23   7:15 PM     1716988501 autoregressive.pth
-a----         30-Apr-23   7:17 PM      975620731 clvp2.pth
-a----         30-Apr-23   7:16 PM     1169472627 diffusion_decoder.pth
-a----         30-Apr-23   8:50 PM       25193729 rlg_auto.pth
-a----         30-Apr-23   8:50 PM      100715777 rlg_diffuser.pth
-a----         30-Apr-23   7:17 PM      391384715 vocoder.pth
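
To compare a machine against the listing above, a small check along these lines might help. It is only a sketch: the file names come from the listing, the default directory is an assumption based on the path mentioned above (resolved through the home folder), and `missing_models` is a hypothetical helper, not part of the web UI.

```python
from pathlib import Path
from typing import List

# File names taken from the directory listing above.
EXPECTED_MODELS = [
    "autoregressive.pth",
    "clvp2.pth",
    "diffusion_decoder.pth",
    "rlg_auto.pth",
    "rlg_diffuser.pth",
    "vocoder.pth",
]


def missing_models(models_dir: Path) -> List[str]:
    """Return the expected file names that are absent or empty in models_dir."""
    missing = []
    for name in EXPECTED_MODELS:
        path = models_dir / name
        # A zero-byte file usually means an interrupted download.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed default cache location; adjust if yours differs.
    default_dir = Path.home() / ".cache" / "tortoise" / "models"
    print(f"Checking {default_dir}")
    for name in missing_models(default_dir):
        print(f"missing or empty: {name}")
```

If nothing is printed after the "Checking" line, all six files are present and non-empty. The check does not verify the file contents.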

I tested it, and if I leave the tokenizer empty, it works. If the tokenizer file could be at fault, could you share it?
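
The traceback ends in Tokenizer.from_file with "stream did not contain valid UTF-8", so one possible cause is that the file selected as the tokenizer is not a UTF-8 text file at all. A rough way to check a file before pointing the UI at it is sketched below. It assumes the tokenizer is expected to be a JSON file, and `check_tokenizer_file` is a hypothetical helper, not something the web UI or Tortoise provides.

```python
import json
from pathlib import Path


def check_tokenizer_file(path: Path) -> str:
    """Classify a file as 'ok', 'missing', 'not-utf8', or 'not-json'."""
    if not path.is_file():
        return "missing"  # absent, or not a regular file
    raw = path.read_bytes()
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Binary files such as .pth checkpoints typically end up here.
        return "not-utf8"
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return "not-json"
    return "ok"
```

Calling `check_tokenizer_file(Path("path/to/tokenizer.json"))` (the path is a placeholder) returns one of the four strings. A result of 'ok' only means the file is well-formed JSON in UTF-8, not that it is a valid tokenizer for Tortoise.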

Benzene82 commented 11 months ago

Everything you listed is the same on my end. I'll admit I don't know how to use the program for what I want to do. I can generate Text to Speech in Bark and Tortoise. I can Clone a voice in Bark Voice Clone with below average results. I read and watched videos with much better Cloning using Tortoise. I don't really know what a Tokenizer is on the Tortoise TTS tab. If I understood how to use that tab, I think it has all the parameters and functionality that I am looking for. I'm trying Eleven Labs but I really don't want to be locked into their character limits and pricing. I appreciate your time and support. Thanks.


rsxdalv commented 11 months ago

OK, if you do English voice clones, you may be fine with the default tokenizer and can ignore it for now.

Then, when you train a voice clone, you might get an autoregressive.pth file. If you copy this file into a new folder under the Tortoise models folder, you will be able to use it. You will also need to copy the other .pth files I listed above into that same folder.
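
As a rough sketch of that copy step: the function below copies every .pth file from the downloaded models folder into a new folder and then replaces autoregressive.pth with the trained one. `build_model_folder` and all three paths are hypothetical and passed in by the caller; the exact folder the web UI reads custom models from is not assumed here.

```python
import shutil
from pathlib import Path


def build_model_folder(base_dir: Path, trained_ar: Path, new_dir: Path) -> None:
    """Copy all .pth files from base_dir into new_dir, then overwrite
    autoregressive.pth there with the trained checkpoint."""
    new_dir.mkdir(parents=True, exist_ok=True)
    # Copy the stock models first (clvp2, diffusion_decoder, vocoder, ...).
    for src in base_dir.glob("*.pth"):
        shutil.copy2(src, new_dir / src.name)
    # The trained checkpoint replaces the stock autoregressive model.
    shutil.copy2(trained_ar, new_dir / "autoregressive.pth")
```

The stock files are copied rather than moved, so the original download folder stays usable as the default model set. Note that this duplicates several gigabytes per custom voice.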

It seems like the video you watched was very specific to MRQ. But the actual model can be used without MRQ.