rsxdalv / tts-generation-webui

TTS Generation Web UI (Bark, MusicGen + AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, MAGNet, StyleTTS2, MMS, Stable Audio, Mars5, F5-TTS, ParlerTTS)
https://rsxdalv.github.io/tts-generation-webui/
MIT License
1.82k stars, 199 forks

HF-Cache and Model Problems #353

Closed. Sp4wnf3rk3l closed this issue 3 months ago

Sp4wnf3rk3l commented 3 months ago

Hey Mate :D

Here I am with 2 new issues.

  1. Every time I load a model and generate with CVVP for the first time, it errors out and says cvvp.pth is missing and the download failed. When I try again it works, and cvvp.pth is downloaded into a blobs folder inside the model folder. This happens every time I load a model, even though the file was downloaded before. I've set the HF cache folder, but that doesn't change anything.

  2. When I try other models with Tortoise TTS, the pronunciation doesn't change. For example, I use this German model:

https://huggingface.co/AOLCDROM/Tortoise-TTS-de

I set the tokenizer, use basic cleaners, and choose a voice that comes with it, but Tortoise is clearly trying to speak English :D I experienced the same thing with a Melina model from Elden Ring. It seemed that only the voice was taken into account, not the model. I noticed, however, that the console shows autoregressive.pth still being loaded even when the other chosen model is used in generation... but I don't know if it is lying to me :D

Hope I don't bother you by now.

rsxdalv commented 3 months ago

Thanks for linking the model, I will try to recreate this. Just to ask - did you load the model with the button "apply and load model" after selecting it from the drop-down?

Sp4wnf3rk3l commented 3 months ago

Thanks for the quick reply.

Yes, I did.

rsxdalv commented 3 months ago

Got it. Could you also please send in the failed audio and the prompt for me to test?

Sp4wnf3rk3l commented 3 months ago

Ha :D Now (after just a restart) things are totally broken and I can't start up the app. I guess I have to reinstall completely, but nonetheless I've got the old output for you.

This is what Tortoise creates with the German dataset and the prompt "Ich weiss nicht was ich machen soll" (English: "I don't know what I should do"): https://vocaroo.com/18sYNHZdFUZ4

This is what it sounds like if I type in something in English: https://vocaroo.com/14E2pdYotg7u

And this is what Bark creates, if you need an example that comes quite close: https://vocaroo.com/1mHeOH6HBsOZ

rsxdalv commented 3 months ago

Can you show the errors that you are getting now? Maybe I can make it more robust and fix it.

Sp4wnf3rk3l commented 3 months ago

Sure :D ... but just so you know... I've got the RVC-WebUI and the AI-Voice-Cloning WebUI installed, maybe that's part of the problem.

Screenshot 2024-07-28 083117: https://github.com/user-attachments/assets/958a7089-3153-44a6-bfe2-5d035b8dbab7

rsxdalv commented 3 months ago

It seems like an issue with the config.

Are you using something other than 0.0.0.0 or 127.0.0.1?

Also, perhaps a different port than 80 might work. Windows does this unique thing where it "lets" two servers use the same port but does not actually let them do it. So maybe it's trying to connect to a completely different server. That's roughly what I'm seeing from these errors.

rsxdalv commented 3 months ago

To clarify: you can also change it with Notepad in config.json. If you break your config, you can just rename or delete it and it will be recreated.
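
The rename step can also be scripted so the old file is kept as a backup. This is only a minimal sketch: the helper name `set_aside_config` is hypothetical, and the location of config.json depends on where the app was installed, so the path has to be passed in.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def set_aside_config(config_path: Path) -> Optional[Path]:
    """Rename a broken config file so the app can recreate a fresh one.

    Returns the backup path, or None if there was nothing to rename.
    (Hypothetical helper, not part of tts-generation-webui.)
    """
    if not config_path.exists():
        return None
    # Timestamp the backup so repeated runs never overwrite an older copy.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    backup = config_path.with_name(f"{config_path.name}.{stamp}.bak")
    config_path.rename(backup)
    return backup
```

Calling `set_aside_config(Path("config.json"))` from the install folder (an assumed location) leaves no config.json behind, so the next start should write a new default one, as described above.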

Sp4wnf3rk3l commented 3 months ago

Well that was in fact an easy fix, I'm glad you are there^^

Nope, I'm not using a different address on purpose, but the AI-Voice-Cloning WebUI uses the same one. If you start that one first and your UI afterwards, yours gets assigned to localhost:7861 instead of the normal 7860. I guess that tampered with the config in some way?!
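
A clash like this can be checked before launching either UI. The following is a rough sketch, assuming both servers bind to 127.0.0.1; the helper names `is_port_free` and `first_free_port` are made up for illustration and are not part of either project.

```python
import socket


def is_port_free(host: str, port: int) -> bool:
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        # On Windows, ask for exclusive use so a port that another server
        # already holds is reported as taken rather than silently shared.
        if hasattr(socket, "SO_EXCLUSIVEADDRUSE"):
            probe.setsockopt(socket.SOL_SOCKET, socket.SO_EXCLUSIVEADDRUSE, 1)
        try:
            probe.bind((host, port))
        except OSError:
            return False
        return True


def first_free_port(host: str, start: int, attempts: int = 20) -> int:
    """Return the first free port in the range start .. start + attempts - 1."""
    for port in range(start, start + attempts):
        if is_port_free(host, port):
            return port
    raise RuntimeError(f"no free port found in {start}..{start + attempts - 1}")
```

With the other WebUI already running on 7860, `first_free_port("127.0.0.1", 7860)` would be expected to skip that port and return the next free one, which matches the jump to 7861 described above.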

rsxdalv commented 3 months ago

That's a little wild. I'm thinking about using a different port intentionally to avoid this. Also, if you have questions like this, feel free to join the Discord.

Sp4wnf3rk3l commented 3 months ago

As I constantly have questions... I will definitely join!

By the way... here is a log of the issues I mentioned at first:

Screenshot 2024-07-28 091520

The CVVP model then gets downloaded into every single model folder... so I have it about 20 times in there:

Screenshot 2024-07-28 091744
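
To see how many repeated copies have piled up, the models folder can be scanned for byte-identical files. This is a minimal sketch using only the standard library; the function names are hypothetical, the folder to scan is an assumption about the local install, and nothing here is part of tts-generation-webui or Tortoise.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large checkpoints are not read into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path, min_bytes: int = 1) -> List[List[Path]]:
    """Group regular files under root that have identical contents."""
    by_size: Dict[int, List[Path]] = defaultdict(list)
    for path in root.rglob("*"):
        # Skip symlinks so a link and the file it points to are not counted twice.
        if path.is_file() and not path.is_symlink():
            size = path.stat().st_size
            if size >= min_bytes:
                by_size[size].append(path)

    groups: List[List[Path]] = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash: Dict[str, List[Path]] = defaultdict(list)
        for path in same_size:
            by_hash[sha256_of(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Each returned group lists paths with the same contents, so one group with about 20 entries would confirm that the same CVVP file was stored once per model folder. Hashing large checkpoints takes a while, so `min_bytes` can be raised to skip small files.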

rsxdalv commented 3 months ago

Ok, decision time - the original author of tortoise said that CVVP does not seem to matter and should not be used (same as setting to 0): here

I feel like people might be confused if I remove it, but I can leave it to be always 0 and add a comment. Or I can make an entire workaround to have it work again, though it does not sound productive.

Have you seen any benefit from the CVVP model? I'm also wondering if CVVP is multilingual.

Sp4wnf3rk3l commented 3 months ago

I personally did not notice a difference when using the default model, tbh... but I've read somewhere that CVVP can help, especially with other languages (though I've also seen the statement you linked). I don't know if that's true, because Tortoise won't speak my native language anyway :D

Guess you could leave a deprecation warning for a version or two, before you finally remove it^^

rsxdalv commented 3 months ago

The issue has been largely resolved: it was due to the model's quality. CVVP will be removed, as the original creators of Tortoise do not support it anymore. Tortoise needs better autoregressive.pth selection, since unnecessary models are currently being downloaded and the community doesn't use the models this way.